Linhao Yu

CMoralEval: A Moral Evaluation Benchmark for Chinese Large Language Models

Aug 19, 2024

LFED: A Literary Fiction Evaluation Dataset for Large Language Models

May 16, 2024

OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and Safety

Mar 18, 2024

Identifying Multiple Personalities in Large Language Models with External Evaluation

Feb 22, 2024

Evaluating Large Language Models: A Comprehensive Survey

Oct 31, 2023

M3KE: A Massive Multi-Level Multi-Subject Knowledge Evaluation Benchmark for Chinese Large Language Models

May 21, 2023