Yuntian Deng

TokDrift: When LLM Speaks in Subwords but Code Speaks in Grammar

Oct 16, 2025

Interactive Training: Feedback-Driven Neural Network Optimization

Oct 02, 2025

From Chat Logs to Collective Insights: Aggregative Question Answering

May 29, 2025

Learn to Reason Efficiently with Adaptive Length-based Reward Shaping

May 21, 2025

The Leaderboard Illusion

Apr 29, 2025

WildVis: Open Source Visualizer for Million-Scale Chat Logs in the Wild

Sep 05, 2024

WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries

Jul 24, 2024

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Jun 12, 2024

WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild

Jun 07, 2024

MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures

Jun 03, 2024