Quan Shi

$τ$-Knowledge: Evaluating Conversational Agents over Unstructured Knowledge

Mar 04, 2026

Hallucination as a Computational Boundary: A Hierarchy of Inevitability and the Oracle Escape

Aug 10, 2025

Multi-Agent Synergy-Driven Iterative Visual Narrative Synthesis

Jul 17, 2025

When Models Know More Than They Can Explain: Quantifying Knowledge Transfer in Human-AI Collaboration

Jun 05, 2025

LoKI: Low-damage Knowledge Implanting of Large Language Models

May 28, 2025

IMPersona: Evaluating Individual Level LM Impersonation

Apr 08, 2025

Atom of Thoughts for Markov LLM Test-Time Scaling

Feb 17, 2025

BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval

Jul 16, 2024

A Multimodal Dangerous State Recognition and Early Warning System for Elderly with Intermittent Dementia

May 30, 2024

Can Language Models Solve Olympiad Programming?

Apr 16, 2024