Haoji Zhang

UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning

May 20, 2025
Via arXiv

Uni-AdaFocus: Spatial-temporal Dynamic Computation for Video Recognition

Dec 15, 2024
Via arXiv

Ponder & Press: Advancing Visual GUI Agent towards General Computer Control

Dec 02, 2024
Via arXiv

Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation

Nov 24, 2024
Via arXiv

Hierarchical Memory for Long Video QA

Jun 30, 2024
Via arXiv

Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams

Jun 12, 2024
Via arXiv

PREIM3D: 3D Consistent Precise Image Attribute Editing from a Single Image

Apr 20, 2023
Via arXiv