Tao Gui

Memory in the Age of AI Agents

Dec 15, 2025

AgentPRM: Process Reward Models for LLM Agents via Step-Wise Promise and Progress

Nov 11, 2025

Counteracting Matthew Effect in Self-Improvement of LVLMs through Head-Tail Re-balancing

Oct 30, 2025

From Scores to Preferences: Redefining MOS Benchmarking for Speech Quality Reward Modeling

Oct 01, 2025

MDAR: A Multi-scene Dynamic Audio Reasoning Benchmark

Sep 26, 2025

AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning

Sep 10, 2025

LLMEval-3: A Large-Scale Longitudinal Study on Robust and Fair Evaluation of Large Language Models

Aug 07, 2025

Speech-Language Models with Decoupled Tokenizers and Multi-Token Prediction

Jun 14, 2025

LLMEval-Med: A Real-world Clinical Benchmark for Medical LLMs with Physician Validation

Jun 04, 2025

Compression Hacking: A Supplementary Perspective on Informatics Metric of Language Models from Geometric Distortion

May 23, 2025