Yanghao Li

Accelerating Byzantine-Robust Distributed Learning with Compressed Communication via Double Momentum and Variance Reduction

Mar 16, 2026

Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation

Mar 13, 2026

Imagination Helps Visual Reasoning, But Not Yet in Latent Space

Feb 26, 2026

MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer

Sep 19, 2025

MiniCPM4: Ultra-Efficient LLMs on End Devices

Jun 09, 2025

Improve Vision Language Model Chain-of-thought Reasoning

Oct 21, 2024

Distance between Relevant Information Pieces Causes Bias in Long-Context LLMs

Oct 18, 2024

MM-Ego: Towards Building Egocentric Multimodal LLMs

Oct 09, 2024

EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing

Oct 02, 2024

MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning

Sep 30, 2024