Song Han

University of Connecticut

EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos

Jul 16, 2025

Scaling RL to Long Videos

Jul 10, 2025

Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation

Jul 02, 2025

Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation

Jun 24, 2025

Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding

May 28, 2025

Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation

May 24, 2025

Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models

Apr 10, 2025

CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models

Mar 27, 2025

Scaling Vision Pre-Training to 4K Resolution

Mar 25, 2025

XAttention: Block Sparse Attention with Antidiagonal Scoring

Mar 20, 2025