Richeng Xuan

Trust the uncertain teacher: distilling dark knowledge via calibrated uncertainty

Feb 13, 2026 (via arXiv)

QuantEval: A Benchmark for Financial Quantitative Tasks in Large Language Models

Jan 13, 2026 (via arXiv)

LaoBench: A Large-Scale Multidimensional Lao Benchmark for Large Language Models

Nov 14, 2025 (via arXiv)

FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation

Jun 10, 2025 (via arXiv)

Memorization or Reasoning? Exploring the Idiom Understanding of LLMs

May 22, 2025 (via arXiv)

HalluDial: A Large-Scale Benchmark for Automatic Dialogue-Level Hallucination Evaluation

Jun 11, 2024 (via arXiv)

Bi-DCSpell: A Bi-directional Detector-Corrector Interactive Framework for Chinese Spelling Check

Jun 04, 2024 (via arXiv)

CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning

Jan 26, 2024 (via arXiv)