Humphrey Shi

DuoGen: Towards General Purpose Interleaved Multimodal Generation

Feb 03, 2026

VibeTensor: System Software for Deep Learning, Fully Generated by AI Agents

Jan 21, 2026

SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning

Dec 15, 2025

T2I-Copilot: A Training-Free Multi-Agent Text-to-Image System for Enhanced Prompt Interpretation and Interactive Generation

Jul 28, 2025

Distilling Normalizing Flows

Jun 26, 2025

Person Recognition at Altitude and Range: Fusion of Face, Body Shape and Gait

May 07, 2025

Generalized Neighborhood Attention: Multi-dimensional Sparse Attention at the Speed of Light

Apr 23, 2025

RAGME: Retrieval Augmented Video Generation for Enhanced Motion Realism

Apr 09, 2025

Slow-Fast Architecture for Video Multi-Modal Large Language Models

Apr 02, 2025

Safe Vision-Language Models via Unsafe Weights Manipulation

Mar 14, 2025