Michael S. Ryoo

LLaRA: Supercharging Robot Learning Data for Vision-Language Policy

Jun 28, 2024
Via arXiv

Too Many Frames, Not All Useful: Efficient Strategies for Long-Form Video QA

Jun 17, 2024
Via arXiv

Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs

Apr 11, 2024
Via arXiv

Understanding Long Videos in One Multimodal Language Model Pass

Mar 25, 2024
Via arXiv

Language Repository for Long Video Understanding

Mar 21, 2024
Via arXiv

Diffusion Illusions: Hiding Images in Plain Sight

Dec 06, 2023
Via arXiv

Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities

Nov 13, 2023
Via arXiv

AAN: Attributes-Aware Network for Temporal Action Detection

Sep 01, 2023
Via arXiv

Crossway Diffusion: Improving Diffusion-based Visuomotor Policy via Self-supervised Learning

Jul 04, 2023
Via arXiv

Energy-Based Models for Cross-Modal Localization using Convolutional Transformers

Jun 06, 2023
Via arXiv