Caiming Xiong

Salesforce AI Research

Grounded Test-Time Adaptation for LLM Agents

Nov 06, 2025

Reasoning Curriculum: Bootstrapping Broad LLM Reasoning from Math

Oct 30, 2025

LiveResearchBench: A Live Benchmark for User-Centric Deep Research in the Wild

Oct 16, 2025

ToolLibGen: Scalable Automatic Tool Creation and Aggregation for LLM Reasoning

Oct 09, 2025

WALT: Web Agents that Learn Tools

Oct 01, 2025

GUI-KV: Efficient GUI Agents via KV Cache with Spatio-Temporal Awareness

Oct 01, 2025

SCUBA: Salesforce Computer Use Benchmark

Sep 30, 2025

LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering

Sep 11, 2025

Strefer: Empowering Video LLMs with Space-Time Referring and Reasoning via Synthetic Instruction Data

Sep 03, 2025

MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers

Aug 20, 2025