Xinyu Fang

Beyond Mode Collapse: Distribution Matching for Diverse Reasoning

May 19, 2026

WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation

May 11, 2026

On the Role of Language Representations in Auto-Bidding: Findings and Implications

May 07, 2026

Introspective Diffusion Language Models

Apr 13, 2026

Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale

Mar 26, 2026

Visual-ERM: Reward Modeling for Visual Equivalence

Mar 13, 2026

PhysicsMind: Sim and Real Mechanics Benchmarking for Physical Reasoning and Prediction in Foundational VLMs and World Models

Jan 22, 2026

ATLAS: A High-Difficulty, Multidisciplinary Benchmark for Frontier Scientific Reasoning

Nov 18, 2025

OPT-BENCH: Evaluating LLM Agent on Large-Scale Search Spaces Optimization Problems

Jun 12, 2025

Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM

Mar 19, 2025