Yuntian Deng

From Chat Logs to Collective Insights: Aggregative Question Answering

May 29, 2025
Via arXiv

Learn to Reason Efficiently with Adaptive Length-based Reward Shaping

May 21, 2025
Via arXiv

The Leaderboard Illusion

Apr 29, 2025
Via arXiv

WildVis: Open Source Visualizer for Million-Scale Chat Logs in the Wild

Sep 05, 2024
Via arXiv

WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries

Jul 24, 2024
Via arXiv

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Jun 12, 2024
Via arXiv

WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild

Jun 07, 2024
Via arXiv

MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures

Jun 03, 2024
Via arXiv

From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step

May 23, 2024
Via arXiv

WildChat: 1M ChatGPT Interaction Logs in the Wild

May 02, 2024
Via arXiv