Yutong Bai

AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

May 30, 2025

REOrdering Patches Improves Vision Models

May 29, 2025

"I Know It When I See It": Mood Spaces for Connecting and Expressing Visual Concepts

Apr 21, 2025

Vector Quantized Feature Fields for Fast 3D Semantic Lifting

Mar 09, 2025

AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?

Dec 03, 2024

Analyzing The Language of Visual Tokens

Nov 07, 2024

Evaluating Multiview Object Consistency in Humans and Image Models

Sep 10, 2024

KiVA: Kid-inspired Visual Analogies for Testing Large Multimodal Models

Jul 25, 2024

LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning

Jun 17, 2024

Finding Visual Task Vectors

Apr 08, 2024