Benfeng Xu

DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents

Jun 13, 2025

From Real to Synthetic: Synthesizing Millions of Diversified and Complicated User Instructions with Attributed Grounding

Jun 04, 2025

Rationales Are Not Silver Bullets: Measuring the Impact of Rationales on Model Performance and Reliability

May 30, 2025

MIRROR: Multi-agent Intra- and Inter-Reflection for Optimized Reasoning in Tool Learning

May 27, 2025

Training LLM-Based Agents with Synthetic Self-Reflected Trajectories and Partial Masking

May 26, 2025

Automated Creativity Evaluation for Large Language Models: A Reference-Based Approach

Apr 22, 2025

Benchmarking Large Language Models on Controllable Generation under Diversified Instructions

Jan 01, 2024

On the Calibration of Large Language Models and Alignment

Nov 22, 2023

Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning

Nov 14, 2023

Qwen Technical Report

Sep 28, 2023