Xuanyu Lei

A Cause-Effect Look at Alleviating Hallucination of Knowledge-grounded Dialogue Generation

Apr 04, 2024
Via arXiv

Scaffolding Coordinates to Promote Vision-Language Coordination in Large Multi-Modal Models

Feb 19, 2024
Via arXiv

AlignBench: Benchmarking Chinese Alignment of Large Language Models

Dec 05, 2023
Via arXiv

CritiqueLLM: Scaling LLM-as-Critic for Effective and Explainable Evaluation of Large Language Model Generation

Nov 30, 2023
Via arXiv

SafetyBench: Evaluating the Safety of Large Language Models with Multiple Choice Questions

Sep 13, 2023
Via arXiv

AgentBench: Evaluating LLMs as Agents

Aug 07, 2023
Via arXiv