Tengyu Xu

Boosting LLM Reasoning via Spontaneous Self-Correction

Jun 07, 2025
Via arXiv

LlamaRL: A Distributed Asynchronous Reinforcement Learning Framework for Efficient Large-scale LLM Training

May 29, 2025
Via arXiv

Reinforcement Learning from User Feedback

May 20, 2025
Via arXiv

Learning Auxiliary Tasks Improves Reference-Free Hallucination Detection in Open-Domain Long-Form Generation

May 18, 2025
Via arXiv

Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization

Jan 31, 2025
Via arXiv

HyperZero: A Customized End-to-End Auto-Tuning System for Recommendation with Hourly Feedback

Jan 30, 2025
Via arXiv

Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback

Jan 18, 2025
Via arXiv

Multi-IF: Benchmarking LLMs on Multi-Turn and Multilingual Instructions Following

Oct 21, 2024
Via arXiv

The Perfect Blend: Redefining RLHF with Mixture of Judges

Sep 30, 2024
Via arXiv

Provably Efficient Offline Reinforcement Learning with Trajectory-Wise Reward

Jun 13, 2022
Via arXiv