Shibo Hong

CoDiQ: Test-Time Scaling for Controllable Difficult Question Generation

Feb 02, 2026

FRAbench and GenEval: Scaling Fine-Grained Aspect Evaluation across Tasks, Modalities

May 19, 2025

Two Minds Better Than One: Collaborative Reward Modeling for LLM Alignment

May 19, 2025

Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks

Apr 26, 2025