Minghang Zheng

Scalable Object Relation Encoding for Better 3D Spatial Reasoning in Large Language Models

Mar 25, 2026
Via arXiv

Hierarchical Event Memory for Accurate and Low-latency Online Video Temporal Grounding

Aug 06, 2025
Via arXiv

Training-free Video Temporal Grounding using Large-scale Pre-trained Models

Aug 29, 2024
Via arXiv

ResVG: Enhancing Relation and Semantic Understanding in Multiple Instances for Visual Grounding

Aug 29, 2024
Via arXiv

Diff-BGM: A Diffusion Model for Video Background Music Generation

May 20, 2024
Via arXiv

Team PKU-WICT-MIPL PIC Makeup Temporal Video Grounding Challenge 2022 Technical Report

Jul 06, 2022
Via arXiv

Fast Convergence of DETR with Spatially Modulated Co-Attention

Aug 05, 2021
Via arXiv

End-to-End Object Detection with Adaptive Clustering Transformer

Nov 18, 2020
Via arXiv