Picture for Hongxu Yin

Hongxu Yin

MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models

Add code
Sep 26, 2024
Viaarxiv icon

VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation

Add code
Sep 06, 2024
Figure 1 for VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
Figure 2 for VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
Figure 3 for VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
Figure 4 for VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
Viaarxiv icon

Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

Add code
Aug 28, 2024
Figure 1 for Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Figure 2 for Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Figure 3 for Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Figure 4 for Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Viaarxiv icon

LongVILA: Scaling Long-Context Visual Language Models for Long Videos

Add code
Aug 21, 2024
Figure 1 for LongVILA: Scaling Long-Context Visual Language Models for Long Videos
Figure 2 for LongVILA: Scaling Long-Context Visual Language Models for Long Videos
Figure 3 for LongVILA: Scaling Long-Context Visual Language Models for Long Videos
Figure 4 for LongVILA: Scaling Long-Context Visual Language Models for Long Videos
Viaarxiv icon

$VILA^2$: VILA Augmented VILA

Add code
Jul 24, 2024
Viaarxiv icon

Flextron: Many-in-One Flexible Large Language Model

Add code
Jun 11, 2024
Viaarxiv icon

Step Out and Seek Around: On Warm-Start Training with Incremental Data

Add code
Jun 06, 2024
Viaarxiv icon

SpatialRGPT: Grounded Spatial Reasoning in Vision Language Model

Add code
Jun 03, 2024
Viaarxiv icon

X-VILA: Cross-Modality Alignment for Large Language Model

Add code
May 29, 2024
Figure 1 for X-VILA: Cross-Modality Alignment for Large Language Model
Figure 2 for X-VILA: Cross-Modality Alignment for Large Language Model
Figure 3 for X-VILA: Cross-Modality Alignment for Large Language Model
Figure 4 for X-VILA: Cross-Modality Alignment for Large Language Model
Viaarxiv icon

LITA: Language Instructed Temporal-Localization Assistant

Add code
Mar 27, 2024
Viaarxiv icon