Richeng Xuan

FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation

Jun 10, 2025

Memorization or Reasoning? Exploring the Idiom Understanding of LLMs

May 22, 2025

HalluDial: A Large-Scale Benchmark for Automatic Dialogue-Level Hallucination Evaluation

Jun 11, 2024

Bi-DCSpell: A Bi-directional Detector-Corrector Interactive Framework for Chinese Spelling Check

Jun 04, 2024

CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning

Jan 26, 2024