Chenhao Tan

Personalized Benchmarking: Evaluating LLMs by Individual Preferences

Apr 21, 2026
Via arXiv

Automatically Generating Hard Math Problems from Hypothesis-Driven Error Analysis

Apr 06, 2026
Via arXiv

Executable Counterfactuals: Improving LLMs' Causal Reasoning Through Code

Oct 02, 2025
Via arXiv

From Feedback to Checklists: Grounded Evaluation of AI-Generated Clinical Notes

Jul 23, 2025
Via arXiv

AbsenceBench: Language Models Can't Tell What's Missing

Jun 13, 2025
Via arXiv

The Curious Language Model: Strategic Test-Time Information Acquisition

Jun 10, 2025
Via arXiv

CLEAR: A Clinically-Grounded Tabular Framework for Radiology Report Evaluation

May 22, 2025
Via arXiv

Concept Incongruence: An Exploration of Time and Death in Role Playing

May 20, 2025
Via arXiv

HyPerAlign: Hypotheses-driven Personalized Alignment

Apr 29, 2025
Via arXiv

HypoBench: Towards Systematic and Principled Benchmarking for Hypothesis Generation

Apr 15, 2025
Via arXiv