Feiye Huo

AFA-LoRA: Enabling Non-Linear Adaptations in LoRA with Activation Function Annealing

Dec 27, 2025

Jiacheng Li, Jianchao Tan, Zhidong Yang, Feiye Huo, Yerui Sun, Yuchen Xie, Xunliang Cai

Abstract: Low-Rank Adaptation (LoRA) is a widely adopted parameter-efficient fine-tuning (PEFT) method. However, its linear adaptation process limits its expressive power. This means there is a gap between the expressive power of linear training and non-linear training. To bridge this gap, we propose AFA-LoRA, a novel training strategy that brings non-linear expressivity to LoRA while maintaining its seamless mergeability. Our key innovation is an annealed activation function that transitions from a non-linear to a linear transformation during training, allowing the adapter to initially adopt stronger representational capabilities before converging to a mergeable linear form. We implement our method on supervised fine-tuning, reinforcement learning, and speculative decoding. The results show that AFA-LoRA reduces the performance gap between LoRA and full-parameter training. This work enables a more powerful and practical paradigm of parameter-efficient adaptation.
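The annealing idea in the abstract can be illustrated with a small sketch. The abstract does not give the exact formula, so everything here is an assumption made for illustration: the blend `(1 - alpha) * gelu(z) + alpha * z`, the choice of GELU, the linear schedule, and all function names are hypothetical and may differ from what the paper actually uses. The point of the sketch is only that once the blend weight reaches 1, the activation is the identity and the adapter can be folded into the base weight.

```python
import math

def gelu(z):
    # Tanh approximation of GELU; the choice of non-linearity is an assumption.
    return 0.5 * z * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (z + 0.044715 * z ** 3)))

def annealed_activation(z, alpha):
    # Hypothetical blend: alpha = 0 is fully non-linear, alpha = 1 is the identity.
    return (1.0 - alpha) * gelu(z) + alpha * z

def linear_schedule(step, total_steps):
    # Hypothetical schedule: alpha rises linearly from 0 to 1, then stays at 1.
    return min(1.0, max(0.0, step / total_steps))

def matvec(M, v):
    # Plain matrix-vector product on nested lists.
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(W, A, B, x, alpha):
    # y = W x + B * act(A x), with the activation applied between A and B.
    base = matvec(W, x)
    hidden = [annealed_activation(h, alpha) for h in matvec(A, x)]
    delta = matvec(B, hidden)
    return [b + d for b, d in zip(base, delta)]

def merge(W, A, B):
    # Standard LoRA merge, W + B A; exact only when the activation is the identity.
    rank = len(A)
    return [
        [W[i][j] + sum(B[i][k] * A[k][j] for k in range(rank)) for j in range(len(W[0]))]
        for i in range(len(W))
    ]
```

With `alpha = linear_schedule(step, total_steps)` equal to 1 at the end of training, `lora_forward(W, A, B, x, 1.0)` matches `matvec(merge(W, A, B), x)`, which is the mergeability property the abstract refers to. For any `alpha < 1` the two generally differ.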

C2T: A Classifier-Based Tree Construction Method in Speculative Decoding

Feb 19, 2025

Feiye Huo, Jianchao Tan, Kefeng Zhang, Xunliang Cai, Shengli Sun

Abstract: The growing scale of Large Language Models (LLMs) has exacerbated inference latency and computational costs. Speculative decoding methods, which aim to mitigate these issues, often face inefficiencies in the construction of token trees and the verification of candidate tokens. Existing strategies, including chain mode, static tree, and dynamic tree approaches, have limitations in accurately preparing candidate token trees for verification. We propose a novel method named C2T that adopts a lightweight classifier to generate and prune token trees dynamically. Our classifier considers additional feature variables beyond the commonly used joint probability to predict the confidence score for each draft token to determine whether it is the candidate token for verification. This method outperforms state-of-the-art (SOTA) methods such as EAGLE-2 on multiple benchmarks, by reducing the total number of candidate tokens by 25% while maintaining or even improving the acceptance length.
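A rough sketch of classifier-based pruning, as described in the abstract, might look like the following. The abstract does not list the extra features or the classifier architecture, so the feature set (joint probability, entropy, depth), the logistic form, the weights, the threshold, and all names here are hypothetical placeholders rather than the authors' design.

```python
import math
from dataclasses import dataclass, field

@dataclass
class DraftNode:
    token: str
    joint_prob: float   # product of draft probabilities along the path to this node
    entropy: float      # assumed extra feature: entropy of the draft distribution
    depth: int          # assumed extra feature: depth of the node in the tree
    children: list = field(default_factory=list)

def confidence(node, weights, bias):
    # Logistic score over the three features; stands in for the lightweight classifier.
    features = (node.joint_prob, node.entropy, float(node.depth))
    logit = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-logit))

def prune(node, weights, bias, threshold=0.5):
    # Return a copy of the tree that keeps only children scoring at or above
    # the threshold. The node passed in is kept without being scored, and a
    # rejected child is dropped together with its whole subtree.
    kept = [
        prune(child, weights, bias, threshold)
        for child in node.children
        if confidence(child, weights, bias) >= threshold
    ]
    return DraftNode(node.token, node.joint_prob, node.entropy, node.depth, kept)

def count_nodes(node):
    # Number of nodes in the tree, including the root.
    return 1 + sum(count_nodes(child) for child in node.children)
```

Comparing `count_nodes` before and after `prune` gives the reduction in candidate tokens sent to verification for one tree. In a real system the weights would come from a trained classifier rather than being set by hand.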
