Humphrey Shi

SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning

Dec 15, 2025

T2I-Copilot: A Training-Free Multi-Agent Text-to-Image System for Enhanced Prompt Interpretation and Interactive Generation

Jul 28, 2025

Distilling Normalizing Flows

Jun 26, 2025

Person Recognition at Altitude and Range: Fusion of Face, Body Shape and Gait

May 07, 2025

Generalized Neighborhood Attention: Multi-dimensional Sparse Attention at the Speed of Light

Apr 23, 2025

RAGME: Retrieval Augmented Video Generation for Enhanced Motion Realism

Apr 09, 2025

Slow-Fast Architecture for Video Multi-Modal Large Language Models

Apr 02, 2025

Safe Vision-Language Models via Unsafe Weights Manipulation

Mar 14, 2025

FlexVAR: Flexible Visual Autoregressive Modeling without Residual Prediction

Feb 27, 2025

CLIP-GS: Unifying Vision-Language Representation with 3D Gaussian Splatting

Dec 26, 2024