Limin Wang

EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World

Mar 24, 2024

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding

Mar 22, 2024

Contextual AD Narration with Interleaved Multimodal Sequence

Mar 19, 2024

Spatiotemporal Predictive Pre-training for Robotic Motor Control

Mar 14, 2024

Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding

Mar 14, 2024

VideoMamba: State Space Model for Efficient Video Understanding

Mar 12, 2024

StableDrag: Stable Dragging for Point-based Image Editing

Mar 07, 2024

Data-efficient Event Camera Pre-training via Disentangled Masked Modeling

Mar 01, 2024

From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities

Jan 29, 2024

Fully Sparse 3D Panoptic Occupancy Prediction

Dec 29, 2023