Qianqian Xie

DR$^{3}$-Eval: Towards Realistic and Reproducible Deep Research Evaluation

Apr 16, 2026 (via arXiv)

SiMing-Bench: Evaluating Procedural Correctness from Continuous Interactions in Clinical Skill Videos

Apr 10, 2026 (via arXiv)

TaxPraBen: A Scalable Benchmark for Structured Evaluation of LLMs in Chinese Real-World Tax Practice

Apr 10, 2026 (via arXiv)

Appear2Meaning: A Cross-Cultural Benchmark for Structured Cultural Metadata Inference from Images

Apr 08, 2026 (via arXiv)

Credibility Governance: A Social Mechanism for Collective Self-Correction under Weak Truth Signals

Mar 03, 2026 (via arXiv)

EHRNavigator: A Multi-Agent System for Patient-Level Clinical Question Answering over Heterogeneous Electronic Health Records

Jan 15, 2026 (via arXiv)

RAAR: Retrieval Augmented Agentic Reasoning for Cross-Domain Misinformation Detection

Jan 08, 2026 (via arXiv)

MisSpans: Fine-Grained False Span Identification in Cross-Domain Fake News

Jan 08, 2026 (via arXiv)

MentraSuite: Post-Training Large Language Models for Mental Health Reasoning and Assessment

Dec 16, 2025 (via arXiv)

Human or LLM as Standardized Patients? A Comparative Study for Medical Education

Nov 12, 2025 (via arXiv)