Yongming Rao

Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model

Aug 01, 2024
Via arXiv

Efficient Inference of Vision Instruction-Following Models with Elastic Cache

Jul 25, 2024
Via arXiv

X-3D: Explicit 3D Structure Modeling for Point Cloud Recognition

Apr 23, 2024
Via arXiv

Chain-of-Spot: Interactive Reasoning Improves Large Vision-Language Models

Mar 21, 2024
Via arXiv

Generative Multimodal Models are In-Context Learners

Dec 20, 2023
Via arXiv

Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior

Dec 11, 2023
Via arXiv

TCOVIS: Temporally Consistent Online Video Instance Segmentation

Sep 21, 2023
Via arXiv

Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models

Jul 27, 2023
Via arXiv

Unleashing Text-to-Image Diffusion Models for Visual Perception

Mar 03, 2023
Via arXiv

UniPC: A Unified Predictor-Corrector Framework for Fast Sampling of Diffusion Models

Feb 12, 2023
Via arXiv