Report Generation


Advancing ESG Intelligence: An Expert-level Agent and Comprehensive Benchmark for Sustainable Finance

Jan 13, 2026

Route, Retrieve, Reflect, Repair: Self-Improving Agentic Framework for Visual Detection and Linguistic Reasoning in Medical Imaging

Jan 13, 2026

MirrorBench: An Extensible Framework to Evaluate User-Proxy Agents for Human-Likeness

Jan 13, 2026

DeepResearch Bench II: Diagnosing Deep Research Agents via Rubrics from Expert Report

Jan 13, 2026

GI-Bench: A Panoramic Benchmark Revealing the Knowledge-Experience Dissociation of Multimodal Large Language Models in Gastrointestinal Endoscopy Against Clinical Standards

Jan 13, 2026

When KV Cache Reuse Fails in Multi-Agent Systems: Cross-Candidate Interaction is Crucial for LLM Judges

Jan 13, 2026

Integrating Machine-Generated Short Descriptions into the Wikipedia Android App: A Pilot Deployment of Descartes

Jan 12, 2026

Get away with less: Need of source side data curation to build parallel corpus for low resource Machine Translation

Jan 13, 2026

Explaining Generalization of AI-Generated Text Detectors Through Linguistic Analysis

Jan 12, 2026

VLM-CAD: VLM-Optimized Collaborative Agent Design Workflow for Analog Circuit Sizing

Jan 12, 2026