Yan Lu

Uncertainty-Aware Deep Video Compression with Ensembles

Mar 28, 2024

RelationVLM: Making Large Vision-Language Models Understand Visual Relations

Mar 19, 2024

Neural Video Compression with Feature Modulation

Feb 29, 2024

Slot-VLM: SlowFast Slots for Video-Language Modeling

Feb 20, 2024

Diffusion Model with Cross Attention as an Inductive Bias for Disentanglement

Feb 15, 2024

Masked Audio Modeling with CLAP and Multi-Objective Learning

Jan 29, 2024

Retrieval-based Video Language Model for Efficient Long Video Question Answering

Dec 08, 2023

GUPNet++: Geometry Uncertainty Propagation Network for Monocular 3D Object Detection

Oct 24, 2023

Low-latency Speech Enhancement via Speech Token Generation

Oct 20, 2023

Reinforced UI Instruction Grounding: Towards a Generic UI Task Automation API

Oct 07, 2023