Muyun Yang

Long-form RewardBench: Evaluating Reward Models for Long-form Generation

Mar 13, 2026
Via arXiv

Toward Robust LLM-Based Judges: Taxonomic Bias Evaluation and Debiasing Optimization

Mar 09, 2026
Via arXiv

Beyond Token-Level Policy Gradients for Complex Reasoning with Large Language Models

Feb 16, 2026
Via arXiv

Thinking with Comics: Enhancing Multimodal Reasoning through Structured Visual Storytelling

Feb 03, 2026
Via arXiv

RM-Distiller: Exploiting Generative LLM for Reward Model Distillation

Jan 20, 2026
Via arXiv

Reasoning Model Is Superior LLM-Judge, Yet Suffers from Biases

Jan 07, 2026
Via arXiv

DiVA: Fine-grained Factuality Verification with Agentic-Discriminative Verifier

Jan 07, 2026
Via arXiv

Lost in Benchmarks? Rethinking Large Language Model Benchmarking with Item Response Theory

May 21, 2025
Via arXiv

Memory-augmented Query Reconstruction for LLM-based Knowledge Graph Reasoning

Mar 07, 2025
Via arXiv

Evaluating o1-Like LLMs: Unlocking Reasoning for Translation through Comprehensive Analysis

Feb 17, 2025
Via arXiv