Shan Chen

KScope: A Framework for Characterizing the Knowledge Status of Language Models

Jun 09, 2025
Via arXiv

When Models Reason in Your Language: Controlling Thinking Trace Language Comes at the Cost of Accuracy

May 28, 2025
Via arXiv

MedBrowseComp: Benchmarking Medical Deep Research and Computer Use

May 20, 2025
Via arXiv

Sparse Autoencoder Features for Classifications and Transferability

Feb 17, 2025
Via arXiv

The use of large language models to enhance cancer clinical trial educational materials

Dec 02, 2024
Via arXiv

ClinicalBench: Can LLMs Beat Traditional ML Models in Clinical Prediction?

Nov 10, 2024
Via arXiv

Position Paper On Diagnostic Uncertainty Estimation from Large Language Models: Next-Word Probability Is Not Pre-test Probability

Nov 07, 2024
Via arXiv

Mapping Bias in Vision Language Models: Signposts, Pitfalls, and the Road Ahead

Oct 17, 2024
Via arXiv

WorldMedQA-V: a multilingual, multimodal medical examination dataset for multimodal language models evaluation

Oct 16, 2024
Via arXiv

Wait, but Tylenol is Acetaminophen... Investigating and Improving Language Models' Ability to Resist Requests for Misinformation

Sep 30, 2024
Via arXiv