Kevin Lin

IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation

Jul 15, 2024
Via arXiv

MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

Jun 12, 2024
Via arXiv

Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation

Jun 11, 2024
Via arXiv

Consistency Policy: Accelerated Visuomotor Policies via Consistency Distillation

May 13, 2024
Via arXiv

List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs

Apr 25, 2024
Via arXiv

DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

Mar 19, 2024
Via arXiv

Diffusion and Multi-Domain Adaptation Methods for Eosinophil Segmentation

Mar 17, 2024
Via arXiv

COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training

Jan 01, 2024
Via arXiv

MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning

Nov 29, 2023
Via arXiv

GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation

Nov 13, 2023
Via arXiv