Picture for Haoyu Lu

Haoyu Lu

Towards Event-oriented Long Video Understanding

Add code
Jun 20, 2024
Figure 1 for Towards Event-oriented Long Video Understanding
Figure 2 for Towards Event-oriented Long Video Understanding
Figure 3 for Towards Event-oriented Long Video Understanding
Figure 4 for Towards Event-oriented Long Video Understanding
Viaarxiv icon

Needle In A Video Haystack: A Scalable Synthetic Framework for Benchmarking Video MLLMs

Add code
Jun 13, 2024
Viaarxiv icon

DeepSeek-VL: Towards Real-World Vision-Language Understanding

Add code
Mar 11, 2024
Figure 1 for DeepSeek-VL: Towards Real-World Vision-Language Understanding
Figure 2 for DeepSeek-VL: Towards Real-World Vision-Language Understanding
Figure 3 for DeepSeek-VL: Towards Real-World Vision-Language Understanding
Figure 4 for DeepSeek-VL: Towards Real-World Vision-Language Understanding
Viaarxiv icon

DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

Add code
Jan 05, 2024
Figure 1 for DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Figure 2 for DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Figure 3 for DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Figure 4 for DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Viaarxiv icon

speech and noise dual-stream spectrogram refine network with speech distortion loss for robust speech recognition

Add code
May 30, 2023
Figure 1 for speech and noise dual-stream spectrogram refine network with speech distortion loss for robust speech recognition
Figure 2 for speech and noise dual-stream spectrogram refine network with speech distortion loss for robust speech recognition
Figure 3 for speech and noise dual-stream spectrogram refine network with speech distortion loss for robust speech recognition
Figure 4 for speech and noise dual-stream spectrogram refine network with speech distortion loss for robust speech recognition
Viaarxiv icon

VDT: An Empirical Study on Video Diffusion with Transformers

Add code
May 22, 2023
Figure 1 for VDT: An Empirical Study on Video Diffusion with Transformers
Figure 2 for VDT: An Empirical Study on Video Diffusion with Transformers
Figure 3 for VDT: An Empirical Study on Video Diffusion with Transformers
Figure 4 for VDT: An Empirical Study on Video Diffusion with Transformers
Viaarxiv icon

UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling

Add code
Feb 13, 2023
Figure 1 for UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling
Figure 2 for UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling
Figure 3 for UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling
Figure 4 for UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling
Viaarxiv icon

Monolingual Recognizers Fusion for Code-switching Speech Recognition

Add code
Nov 02, 2022
Figure 1 for Monolingual Recognizers Fusion for Code-switching Speech Recognition
Figure 2 for Monolingual Recognizers Fusion for Code-switching Speech Recognition
Figure 3 for Monolingual Recognizers Fusion for Code-switching Speech Recognition
Figure 4 for Monolingual Recognizers Fusion for Code-switching Speech Recognition
Viaarxiv icon

LGDN: Language-Guided Denoising Network for Video-Language Modeling

Add code
Oct 03, 2022
Figure 1 for LGDN: Language-Guided Denoising Network for Video-Language Modeling
Figure 2 for LGDN: Language-Guided Denoising Network for Video-Language Modeling
Figure 3 for LGDN: Language-Guided Denoising Network for Video-Language Modeling
Figure 4 for LGDN: Language-Guided Denoising Network for Video-Language Modeling
Viaarxiv icon

Multimodal foundation models are better simulators of the human brain

Add code
Aug 17, 2022
Figure 1 for Multimodal foundation models are better simulators of the human brain
Figure 2 for Multimodal foundation models are better simulators of the human brain
Figure 3 for Multimodal foundation models are better simulators of the human brain
Figure 4 for Multimodal foundation models are better simulators of the human brain
Viaarxiv icon