Michael Saxon

THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models

Apr 17, 2025

Benchmarks as Microscopes: A Call for Model Metrology

Jul 22, 2024

VSP: Assessing the dual challenges of perception and reasoning in spatial planning tasks for VLMs

Jul 02, 2024

Losing Visual Needles in Image Haystacks: Vision Language Models are Easily Distracted in Short and Long Contexts

Jun 24, 2024

TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation

Jun 12, 2024

Who Evaluates the Evaluations? Objectively Scoring Text-to-Image Prompt Coherence Metrics with T2IScoreScore (TS2)

Apr 05, 2024

Lost in Translation? Translation Errors and Challenges for Fair Assessment of Text-to-Image Models on Multilingual Concepts

Mar 17, 2024

Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies

Aug 06, 2023

Multilingual Conceptual Coverage in Text-to-Image Models

Jun 02, 2023

Let's Think Frame by Frame: Evaluating Video Chain of Thought with Video Infilling and Prediction

May 23, 2023