Bill Yuchen Lin

Shammie

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Jun 12, 2024
Via arXiv

The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models

Jun 09, 2024
Via arXiv

WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild

Jun 07, 2024
Via arXiv

Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models

May 02, 2024
Via arXiv

VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?

Apr 09, 2024
Via arXiv

RewardBench: Evaluating Reward Models for Language Modeling

Mar 20, 2024
Via arXiv

Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents

Mar 04, 2024
Via arXiv

OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement

Feb 28, 2024
Via arXiv

SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding

Feb 24, 2024
Via arXiv

Selective "Selective Prediction": Reducing Unnecessary Abstention in Vision-Language Reasoning

Feb 23, 2024
Via arXiv