Lin Qiu

Paul G. Allen School of Computer Science & Engineering, University of Washington, United States

Asuka-Bench: Benchmarking Code Agents on Underspecified User Intent and Multi-Round Refinement

Jun 04, 2026
Via arXiv

DUEL: Adversarial Self-Play for Multimodal Reasoning

May 24, 2026
Via arXiv

LongCat-Next: Lexicalizing Modalities as Discrete Tokens

Mar 29, 2026
Via arXiv

AMemGym: Interactive Memory Benchmarking for Assistants in Long-Horizon Conversations

Mar 02, 2026
Via arXiv

Beyond Instrumental and Substitutive Paradigms: Introducing Machine Culture as an Emergent Phenomenon in Large Language Models

Jan 23, 2026
Via arXiv

LongCat-Flash-Thinking-2601 Technical Report

Jan 23, 2026
Via arXiv

CATArena: Evaluation of LLM Agents through Iterative Tournament Competitions

Oct 30, 2025
Via arXiv

Instance-level Randomization: Toward More Stable LLM Evaluations

Sep 16, 2025
Via arXiv

OIBench: Benchmarking Strong Reasoning Models with Olympiad in Informatics

Jun 12, 2025
Via arXiv

Audio Turing Test: Benchmarking the Human-likeness of Large Language Model-based Text-to-Speech Systems in Chinese

May 16, 2025
Via arXiv