Yan Lu

A General Theory for Compositional Generalization

May 20, 2024

Text Grouping Adapter: Adapting Pre-trained Text Detector for Layout Analysis

May 13, 2024

Uncertainty-Aware Deep Video Compression with Ensembles

Mar 28, 2024

RelationVLM: Making Large Vision-Language Models Understand Visual Relations

Mar 19, 2024

Neural Video Compression with Feature Modulation

Feb 29, 2024

Slot-VLM: SlowFast Slots for Video-Language Modeling

Feb 20, 2024

Diffusion Model with Cross Attention as an Inductive Bias for Disentanglement

Feb 15, 2024

Masked Audio Modeling with CLAP and Multi-Objective Learning

Jan 29, 2024

Retrieval-based Video Language Model for Efficient Long Video Question Answering

Dec 08, 2023

GUPNet++: Geometry Uncertainty Propagation Network for Monocular 3D Object Detection

Oct 24, 2023