Sixiong Xie

SGR-Bench: Benchmarking Search Agents on State-Gated Retrieval

May 21, 2026 (via arXiv)

MindLoom: Composing Thought Modes for Frontier-Level Reasoning Data Synthesis

May 20, 2026 (via arXiv)

DeepWeb-Bench: A Deep Research Benchmark Demanding Massive Cross-Source Evidence and Long-Horizon Derivation

May 20, 2026 (via arXiv)

ViDR: Grounding Multimodal Deep Research Reports in Source Visual Evidence

May 13, 2026 (via arXiv)

M3-BENCH: Process-Aware Evaluation of LLM Agents Social Behaviors in Mixed-Motive Games

Jan 13, 2026 (via arXiv)