Jiashi Feng

NUS

The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer

Apr 14, 2025
Via arXiv

Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding

Apr 14, 2025
Via arXiv

Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model

Apr 11, 2025
Via arXiv

GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation

Apr 11, 2025
Via arXiv

4th PVUW MeViS 3rd Place Report: Sa2VA

Apr 01, 2025
Via arXiv

Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation

Mar 20, 2025
Via arXiv

MagicArticulate: Make Your 3D Models Articulation-Ready

Feb 18, 2025
Via arXiv

Video Depth Anything: Consistent Depth Estimation for Super-Long Videos

Jan 21, 2025
Via arXiv

VideoWorld: Exploring Knowledge Learning from Unlabeled Videos

Jan 16, 2025
Via arXiv

Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

Jan 07, 2025
Via arXiv