Wei-Lin Chiang

Music Arena: Live Evaluation for Text-to-Music

Jul 28, 2025
Via arXiv

Search Arena: Analyzing Search-Augmented LLMs

Jun 05, 2025
Via arXiv

Prompt-to-Leaderboard

Feb 20, 2025
Via arXiv

Exploring and Mitigating Adversarial Manipulation of Voting-Based Leaderboards

Jan 13, 2025
Via arXiv

VisionArena: 230K Real World User-VLM Conversations with Preference Labels

Dec 11, 2024
Via arXiv

SkyServe: Serving AI Models across Regions and Clouds with Spot Instances

Nov 03, 2024
Via arXiv

How to Evaluate Reward Models for RLHF

Oct 18, 2024
Via arXiv

RouteLLM: Learning to Route LLMs with Preference Data

Jun 26, 2024
Via arXiv

From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline

Jun 17, 2024
Via arXiv

OR-Bench: An Over-Refusal Benchmark for Large Language Models

May 31, 2024
Via arXiv