Sifei Liu

Temporal Prompting Matters: Rethinking Referring Video Object Segmentation

Oct 08, 2025
Via arXiv

3D Aware Region Prompted Vision Language Model

Sep 16, 2025
Via arXiv

EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos

Jul 16, 2025
Via arXiv

Scaling RL to Long Videos

Jul 10, 2025
Via arXiv

Describe Anything: Detailed Localized Image and Video Captioning

Apr 22, 2025
Via arXiv

Scaling Vision Pre-Training to 4K Resolution

Mar 25, 2025
Via arXiv

M3: 3D-Spatial MultiModal Memory

Mar 20, 2025
Via arXiv

Synthesizing Consistent Novel Views via 3D Epipolar Attention without Re-Training

Feb 25, 2025
Via arXiv

Parallel Sequence Modeling via Generalized Spatial Propagation Network

Jan 21, 2025
Via arXiv

Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks

Jan 14, 2025
Via arXiv