Moxin Li

MTR-Bench: A Comprehensive Benchmark for Multi-Turn Reasoning Evaluation

May 26, 2025
Via arXiv

Assistant-Guided Mitigation of Teacher Preference Bias in LLM-as-a-Judge

May 25, 2025
Via arXiv

Self-Improvement Towards Pareto Optimality: Mitigating Preference Conflicts in Multi-Objective Alignment

Feb 20, 2025
Via arXiv

HellaSwag-Pro: A Large-Scale Bilingual Benchmark for Evaluating the Robustness of LLMs in Commonsense Reasoning

Feb 17, 2025
Via arXiv

Knowledge Boundary of Large Language Models: A Survey

Dec 17, 2024
Via arXiv

Dual-Phase Accelerated Prompt Optimization

Jun 19, 2024
Via arXiv

Counterfactual Debating with Preset Stances for Hallucination Elimination of LLMs

Jun 17, 2024
Via arXiv

Evaluating Mathematical Reasoning of Large Language Models: A Focus on Error Identification and Correction

Jun 02, 2024
Via arXiv

Think Twice Before Assure: Confidence Estimation for Large Language Models through Reflection on Multiple Answers

Mar 15, 2024
Via arXiv

Gotcha! Don't trick me with unanswerable questions! Self-aligning Large Language Models for Responding to Unknown Questions

Feb 23, 2024
Via arXiv