Picture for Felix Bai

Felix Bai

COMPASS: A Multi-Turn Benchmark for Tool-Mediated Planning & Preference Optimization

Add code
Oct 08, 2025
Figure 1 for COMPASS: A Multi-Turn Benchmark for Tool-Mediated Planning & Preference Optimization
Figure 2 for COMPASS: A Multi-Turn Benchmark for Tool-Mediated Planning & Preference Optimization
Figure 3 for COMPASS: A Multi-Turn Benchmark for Tool-Mediated Planning & Preference Optimization
Figure 4 for COMPASS: A Multi-Turn Benchmark for Tool-Mediated Planning & Preference Optimization
Viaarxiv icon

MR. Judge: Multimodal Reasoner as a Judge

Add code
May 19, 2025
Viaarxiv icon

ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities

Add code
Aug 08, 2024
Figure 1 for ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities
Figure 2 for ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities
Figure 3 for ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities
Figure 4 for ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities
Viaarxiv icon

Apple Intelligence Foundation Language Models

Add code
Jul 29, 2024
Figure 1 for Apple Intelligence Foundation Language Models
Figure 2 for Apple Intelligence Foundation Language Models
Figure 3 for Apple Intelligence Foundation Language Models
Figure 4 for Apple Intelligence Foundation Language Models
Viaarxiv icon