Tianle Li

From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline

Jun 17, 2024
Via arXiv

GenAI Arena: An Open Evaluation Platform for Generative Models

Jun 06, 2024
Via arXiv

MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

Jun 04, 2024
Via arXiv

Long-context LLMs Struggle with Long In-context Learning

Apr 04, 2024
Via arXiv

Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference

Mar 07, 2024
Via arXiv

SWAG: Storytelling With Action Guidance

Feb 05, 2024
Via arXiv

ImagenHub: Standardizing the evaluation of conditional image generation models

Oct 17, 2023
Via arXiv

LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset

Sep 30, 2023
Via arXiv

DreamEdit: Subject-driven Image Editing

Jun 22, 2023
Via arXiv

Few-shot In-context Learning for Knowledge Base Question Answering

May 04, 2023
Via arXiv