
Title: AME: Aligned Manifold Entropy for Robust Vision-Language Distillation

Aug 12, 2025

Guiming Cao, Yuming Ou



Abstract: Knowledge distillation is a long-established technique for knowledge transfer, and has regained attention with the recent emergence of large vision-language models (VLMs). However, vision-language knowledge distillation often requires sufficient training data to achieve robust generalization on samples with ambiguous or boundary-adjacent representations, which are associated with high predictive uncertainty. Critically, collecting such large-scale, task-specific data for training is often impractical in real-world scenarios. To address this major challenge arising from the entanglement of uncertainty and cross-modal feature representation, we propose Aligned Manifold Entropy for Robust Vision-Language Distillation (AME), aiming to achieve robust generalization under real-world conditions. AME applies entropy minimization over a reconfigured shared manifold, where multi-modal data (i.e., image and text) are bridged through a pair of projection functions, which is conducive to structural compression of cross-modal feature representations. This enables robust knowledge distillation under low-data regimes, while requiring no architectural modifications to the backbone. As a result, it can serve as a plug-and-play module compatible with a wide range of vision-language distillation frameworks. Notably, our theoretical analysis reveals that integrating knowledge distillation with entropy minimization over the shared manifold leads to a tighter generalization error bound. Extensive experiments across diverse distillation architectures and training settings demonstrate that AME consistently facilitates robust knowledge distillation, resulting in superior generalization performance across a wide spectrum of downstream tasks.
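The abstract does not give the exact formulation, so the following is only a rough sketch of what "entropy minimization over a shared manifold" could look like, not the authors' implementation. It assumes linear projections (`w_img`, `w_txt`), cosine-similarity logits, and a softmax temperature of 0.07; all function names and parameters here are hypothetical and chosen purely for illustration.

```python
import math

def project(x, w):
    """Linear projection: vector x (d_in) times matrix w (d_in x d_out)."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(len(w[0]))]

def l2_normalize(v, eps=1e-12):
    """Scale v to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / (norm + eps) for a in v]

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p + eps) for p in probs)

def shared_space_entropy(image_feats, text_feats, w_img, w_txt, temperature=0.07):
    """Mean predictive entropy of images over text prototypes.

    Assumption: both modalities are mapped into one shared space by a pair
    of linear projections, and class probabilities come from a softmax over
    cosine similarities. The paper's actual projections and loss may differ.
    """
    z_txt = [l2_normalize(project(t, w_txt)) for t in text_feats]
    total = 0.0
    for x in image_feats:
        z_img = l2_normalize(project(x, w_img))
        logits = [sum(a * b for a, b in zip(z_img, zt)) / temperature for zt in z_txt]
        total += entropy(softmax(logits))
    return total / len(image_feats)

# Toy usage with identity projections and two text prototypes.
identity = [[1.0, 0.0], [0.0, 1.0]]
prototypes = [[1.0, 0.0], [0.0, 1.0]]
confident = shared_space_entropy([[1.0, 0.0]], prototypes, identity, identity)
ambiguous = shared_space_entropy([[1.0, 1.0]], prototypes, identity, identity)
```

In this toy setup the boundary-adjacent image (equidistant from both prototypes) yields a much higher entropy than the well-aligned one. In a distillation setting, a term like this would presumably be added to the usual distillation loss and minimized with respect to the projection parameters, which is consistent with the abstract's claim that the backbone itself needs no modification.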
