Daniel Tawfik

Scalably Enhancing the Clinical Validity of a Task Benchmark with Physician Oversight

Dec 22, 2025
Via arXiv