Ming-Hsuan Yang

VideoPrism: A Foundational Visual Encoder for Video Understanding

Feb 20, 2024

Training Class-Imbalanced Diffusion Model Via Overlap Optimization

Feb 16, 2024

GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting

Feb 11, 2024

PromptRR: Diffusion Models as Prompt Generators for Single Image Reflection Removal

Feb 04, 2024

Generalizable Entity Grounding via Assistance of Large Language Model

Feb 04, 2024

RAP-SAM: Towards Real-Time All-Purpose Segment Anything

Jan 18, 2024

Video-GroundingDINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding

Dec 31, 2023

VideoPoet: A Large Language Model for Zero-Shot Video Generation

Dec 21, 2023

VidToMe: Video Token Merging for Zero-Shot Video Editing

Dec 19, 2023

DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes

Dec 13, 2023