Zhiwei Hao

LightMoE: Reducing Mixture-of-Experts Redundancy through Expert Replacing

Mar 13, 2026

Streaming-dLLM: Accelerating Diffusion LLMs via Suffix Pruning and Dynamic Decoding

Jan 27, 2026

Low-Precision Training of Large Language Models: Methods, Challenges, and Opportunities

May 02, 2025

ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning

Oct 23, 2024

GhostNetV3: Exploring the Training Strategies for Compact Models

Apr 17, 2024

SAM-DiffSR: Structure-Modulated Diffusion Model for Image Super-Resolution

Feb 27, 2024

Data-efficient Large Vision Models through Sequential Autoregression

Feb 07, 2024

One-for-All: Bridge the Gap Between Heterogeneous Architectures in Knowledge Distillation

Oct 30, 2023

DeViT: Decomposing Vision Transformers for Collaborative Inference in Edge Devices

Sep 10, 2023

VanillaKD: Revisit the Power of Vanilla Knowledge Distillation from Small Scale to Large Scale

May 25, 2023