Shaogang Gong

CycleCap: Improving VLMs Captioning Performance via Self-Supervised Cycle Consistency Fine-Tuning

Mar 18, 2026

LatSearch: Latent Reward-Guided Search for Faster Inference-Time Scaling in Video Diffusion

Mar 15, 2026

GraphThinker: Reinforcing Video Reasoning with Event Graph Thinking

Feb 19, 2026

OpenUS: A Fully Open-Source Foundation Model for Ultrasound Image Analysis via Self-Adaptive Masked Contrastive Learning

Nov 14, 2025

Uncertainty-quantified Rollout Policy Adaptation for Unlabelled Cross-domain Temporal Grounding

Aug 08, 2025

ViSMaP: Unsupervised Hour-long Video Summarisation by Meta-Prompting

Apr 22, 2025

ViMo: A Generative Visual GUI World Model for App Agent

Apr 15, 2025

Multi-modal Multi-platform Person Re-Identification: Benchmark and Method

Mar 21, 2025

Temporal Score Analysis for Understanding and Correcting Diffusion Artifacts

Mar 20, 2025

V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning

Mar 14, 2025