Yulong Mao

Multi-Aspect Knowledge Distillation for Language Model with Low-rank Factorization

Apr 03, 2026

Zihe Liu, Yulong Mao, Jinan Xu, Xinrui Peng, Kaiyu Huang

Abstract: Knowledge distillation is an effective technique for compressing pre-trained language models. However, existing methods focus only on the knowledge distribution among layers, which may lose fine-grained information during the alignment process. To address this issue, we introduce the Multi-aspect Knowledge Distillation (MaKD) method, which mimics the self-attention and feed-forward modules in greater depth to capture rich language knowledge from different aspects. Experimental results demonstrate that MaKD achieves competitive performance compared with various strong baselines under the same storage parameter budget. Our method also performs well when distilling auto-regressive architecture models.

DoRA: Enhancing Parameter-Efficient Fine-Tuning with Dynamic Rank Distribution

May 28, 2024

Yulong Mao, Kaiyu Huang, Changhao Guan, Ganglin Bao, Fengran Mo, Jinan Xu

Abstract: Fine-tuning large-scale pre-trained models is inherently a resource-intensive task. While it can enhance the capabilities of the model, it also incurs substantial computational costs, posing challenges to the practical application of downstream tasks. Existing parameter-efficient fine-tuning (PEFT) methods such as Low-Rank Adaptation (LoRA) rely on a bypass framework that ignores the differing parameter budget requirements across weight matrices, which may lead to suboptimal fine-tuning outcomes. To address this issue, we introduce the Dynamic Low-Rank Adaptation (DoRA) method. DoRA decomposes high-rank LoRA layers into structured single-rank components, allowing the parameter budget to be pruned dynamically during training according to each component's importance to the specific task, which makes the most of the limited parameter budget. Experimental results demonstrate that DoRA achieves competitive performance compared with LoRA and full model fine-tuning, and outperforms various strong baselines under the same storage parameter budget. Our code is available at https://github.com/MIkumikumi0116/DoRA

* Accepted by the main conference of ACL 2024
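
The decomposition described in the DoRA abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the per-component scalars `c`, and the importance score (the Frobenius norm of each rank-one component) are assumptions made for illustration. The sketch only shows that a rank-r LoRA update is a sum of r single-rank components, and that dropping the least important components reduces the rank actually used.

```python
import numpy as np

def make_components(d_out, d_in, rank, seed=0):
    """Create a LoRA-style update as `rank` single-rank components.

    Component i is c[i] * outer(B[:, i], A[i, :]). The scalars c are a
    hypothetical gating term added for this sketch.
    """
    rng = np.random.default_rng(seed)
    B = rng.normal(size=(d_out, rank))
    A = rng.normal(size=(rank, d_in))
    c = np.ones(rank)
    return B, A, c

def delta_weight(B, A, c):
    """Sum of the single-rank components: sum_i c[i] * b_i a_i^T."""
    return (B * c) @ A

def component_importance(B, A, c):
    """Assumed importance score: Frobenius norm of each component.

    For a rank-one matrix, ||c_i b_i a_i^T||_F = |c_i| * ||b_i|| * ||a_i||.
    """
    return np.abs(c) * np.linalg.norm(B, axis=0) * np.linalg.norm(A, axis=1)

def prune_to_budget(c, scores, budget):
    """Zero the scalars of all but the `budget` highest-scoring components."""
    keep = np.argsort(scores)[::-1][:budget]
    mask = np.zeros_like(c)
    mask[keep] = 1.0
    return c * mask

B, A, c = make_components(d_out=6, d_in=5, rank=4)
scores = component_importance(B, A, c)
c_pruned = prune_to_budget(c, scores, budget=2)
# The pruned update uses at most `budget` single-rank components.
print(np.linalg.matrix_rank(delta_weight(B, A, c_pruned)))
```

In this sketch pruning happens once, after the fact; the abstract describes pruning that takes place dynamically during training, which the sketch does not attempt to reproduce.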
