Huchuan Lu

Towards Physically Plausible Video Generation via VLM Planning

Mar 30, 2025

Mono2Stereo: A Benchmark and Empirical Study for Stereo Conversion

Mar 28, 2025

IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification

Mar 13, 2025

CineMaster: A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation

Feb 12, 2025

EVEv2: Improved Baselines for Encoder-Free Vision-Language Models

Feb 10, 2025

KARST: Multi-Kernel Kronecker Adaptation with Re-Scaling Transmission for Visual Classification

Feb 10, 2025

The Devil is in Temporal Token: High Quality Video Reasoning Segmentation

Jan 15, 2025

AVS-Mamba: Exploring Temporal and Multi-modal Mamba for Audio-Visual Segmentation

Jan 14, 2025

Learning Motion and Temporal Cues for Unsupervised Video Object Segmentation

Jan 14, 2025

3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding

Jan 14, 2025