Hongxu Yin

Celine

EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos

Jul 16, 2025

Scaling RL to Long Videos

Jul 10, 2025

CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training

Apr 17, 2025

Scaling Vision Pre-Training to 4K Resolution

Mar 25, 2025

Token-Efficient Long Video Understanding for Multimodal LLMs

Mar 06, 2025

WorldModelBench: Judging Video Generation Models As World Models

Feb 28, 2025

Advancing Weight and Channel Sparsification with Enhanced Saliency

Feb 05, 2025

NVILA: Efficient Frontier Visual Language Models

Dec 05, 2024

NaVILA: Legged Robot Vision-Language-Action Model for Navigation

Dec 05, 2024

VILA-M3: Enhancing Vision-Language Models with Medical Expert Knowledge

Nov 19, 2024