Chenhao Tan

Executable Counterfactuals: Improving LLMs' Causal Reasoning Through Code

Oct 02, 2025
Via arXiv

From Feedback to Checklists: Grounded Evaluation of AI-Generated Clinical Notes

Jul 23, 2025
Via arXiv

AbsenceBench: Language Models Can't Tell What's Missing

Jun 13, 2025
Via arXiv

The Curious Language Model: Strategic Test-Time Information Acquisition

Jun 10, 2025
Via arXiv

CLEAR: A Clinically-Grounded Tabular Framework for Radiology Report Evaluation

May 22, 2025
Via arXiv

Concept Incongruence: An Exploration of Time and Death in Role Playing

May 20, 2025
Via arXiv

HyPerAlign: Hypotheses-driven Personalized Alignment

Apr 29, 2025
Via arXiv

HypoBench: Towards Systematic and Principled Benchmarking for Hypothesis Generation

Apr 15, 2025
Via arXiv

HypoEval: Hypothesis-Guided Evaluation for Natural Language Generation

Apr 09, 2025
Via arXiv

On the Effectiveness and Generalization of Race Representations for Debiasing High-Stakes Decisions

Apr 07, 2025
Via arXiv