Yan Lu

RelationVLM: Making Large Vision-Language Models Understand Visual Relations

Mar 19, 2024
Zhipeng Huang, Zhizheng Zhang, Zheng-Jun Zha, Yan Lu, Baining Guo

Neural Video Compression with Feature Modulation

Feb 29, 2024
Jiahao Li, Bin Li, Yan Lu

Slot-VLM: SlowFast Slots for Video-Language Modeling

Feb 20, 2024
Jiaqi Xu, Cuiling Lan, Wenxuan Xie, Xuejin Chen, Yan Lu

Diffusion Model with Cross Attention as an Inductive Bias for Disentanglement

Feb 15, 2024
Tao Yang, Cuiling Lan, Yan Lu, Nanning Zheng

Masked Audio Modeling with CLAP and Multi-Objective Learning

Jan 29, 2024
Yifei Xin, Xiulian Peng, Yan Lu

Retrieval-based Video Language Model for Efficient Long Video Question Answering

Dec 08, 2023
Jiaqi Xu, Cuiling Lan, Wenxuan Xie, Xuejin Chen, Yan Lu

GUPNet++: Geometry Uncertainty Propagation Network for Monocular 3D Object Detection

Oct 24, 2023
Yan Lu, Xinzhu Ma, Lei Yang, Tianzhu Zhang, Yating Liu, Qi Chu, Tong He, Yonghui Li, Wanli Ouyang

Low-latency Speech Enhancement via Speech Token Generation

Oct 20, 2023
Huaying Xue, Xiulian Peng, Yan Lu

Reinforced UI Instruction Grounding: Towards a Generic UI Task Automation API

Oct 07, 2023
Zhizheng Zhang, Wenxuan Xie, Xiaoyi Zhang, Yan Lu
