Tomas Ruiz

Rethinking Ground Truth: A Case Study on Human Label Variation in MLLM Benchmarking

Mar 20, 2026

FlashSampling: Fast and Memory-Efficient Exact Sampling

Mar 16, 2026

BoN Appetit Team at LeWiDi-2025: Best-of-N Test-time Scaling Can Not Stomach Annotation Disagreements (Yet)

Oct 14, 2025