Michael Saxon

VSP: Assessing the dual challenges of perception and reasoning in spatial planning tasks for VLMs

Jul 02, 2024
Via arXiv

Losing Visual Needles in Image Haystacks: Vision Language Models are Easily Distracted in Short and Long Contexts

Jun 24, 2024
Via arXiv

TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation

Jun 12, 2024
Via arXiv

Who Evaluates the Evaluations? Objectively Scoring Text-to-Image Prompt Coherence Metrics with T2IScoreScore (TS2)

Apr 05, 2024
Via arXiv

Lost in Translation? Translation Errors and Challenges for Fair Assessment of Text-to-Image Models on Multilingual Concepts

Mar 17, 2024
Via arXiv

Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies

Aug 06, 2023
Via arXiv

Multilingual Conceptual Coverage in Text-to-Image Models

Jun 02, 2023
Via arXiv

Let's Think Frame by Frame: Evaluating Video Chain of Thought with Video Infilling and Prediction

May 23, 2023
Via arXiv

Data Augmentation for Diverse Voice Conversion in Noisy Environments

May 18, 2023
Via arXiv

Visual Chain of Thought: Bridging Logical Gaps with Multimodal Infillings

May 03, 2023
Via arXiv