Haoyu Lu

HyperVL: An Efficient and Dynamic Multimodal Large Language Model for Edge Devices

Dec 16, 2025

Physics-Constrained Diffusion Reconstruction with Posterior Correction for Quantitative and Fast PET Imaging

Aug 20, 2025

Kimi-VL Technical Report

Apr 10, 2025

Efficient Motion-Aware Video MLLM

Mar 17, 2025

R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization

Mar 13, 2025

Beyond Filtering: Adaptive Image-Text Quality Enhancement for MLLM Pretraining

Oct 21, 2024

Exploring the Design Space of Visual Context Representation in Video MLLMs

Oct 17, 2024

Towards Event-oriented Long Video Understanding

Jun 20, 2024

Needle In A Video Haystack: A Scalable Synthetic Framework for Benchmarking Video MLLMs

Jun 13, 2024

DeepSeek-VL: Towards Real-World Vision-Language Understanding

Mar 11, 2024