Zuxin Liu

LoCoBench-Agent: An Interactive Benchmark for LLM Agents in Long-Context Software Engineering

Nov 17, 2025, via arXiv

Grounded Test-Time Adaptation for LLM Agents

Nov 06, 2025, via arXiv

ToolLibGen: Scalable Automatic Tool Creation and Aggregation for LLM Reasoning

Oct 09, 2025, via arXiv

LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering

Sep 11, 2025, via arXiv

UserBench: An Interactive Gym Environment for User-Centric Agents

Jul 29, 2025, via arXiv

MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning

May 30, 2025, via arXiv

Behavior Injection: Preparing Language Models for Reinforcement Learning

May 25, 2025, via arXiv

APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay

Apr 08, 2025, via arXiv

ActionStudio: A Lightweight Framework for Data and Training of Large Action Models

Mar 31, 2025, via arXiv

PersonaBench: Evaluating AI Models on Understanding Personal Information through Accessing (Synthetic) Private User Data

Feb 28, 2025, via arXiv