Department of CSE, IIT Bhilai, India
Abstract: Large Language Models (LLMs) perform well on many NLP tasks, but fully fine-tuning them is computationally expensive and memory-intensive. Parameter-Efficient Fine-Tuning (PEFT) methods such as LoRA reduce this cost by adding small low-rank updates to frozen model weights. However, these methods restrict training to a limited subspace, which can sometimes degrade performance. For Small Language Models (SLMs), where efficiency gains matter even more, we introduce AdaGradSelect, an adaptive method that selects which transformer blocks to update based on their gradients. Early observations showed that updating only the transformer blocks with the highest gradient norms can achieve performance close to full fine-tuning. Building on this insight, AdaGradSelect adaptively chooses which blocks to train, combining Dirichlet-based sampling, whose concentration depends on how frequently each block was updated in the past, with an epsilon-greedy exploration strategy. This lets the method explore different blocks early in training and gradually focus on the most important ones in later epochs. Experiments show that AdaGradSelect trains about 12% faster and uses 35% less GPU memory while delivering performance very close to full fine-tuning. On the GSM8K dataset, it outperforms LoRA (rank 256) by about 3% on average across models such as Qwen2.5-0.5B, LLaMA3.2-1B, and Phi4-mini-3.8B, and it achieves comparable accuracy on the MATH dataset. Overall, AdaGradSelect provides a more effective and resource-efficient alternative to traditional fine-tuning methods.
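
To make the selection procedure concrete, the sketch below shows one way the described Dirichlet-plus-epsilon-greedy block selection could look in code. The function names, the epsilon schedule, and the +1 concentration offset are illustrative assumptions, not the authors' implementation.

import numpy as np

def select_blocks(update_counts, k, epsilon, rng=None):
    """Pick k transformer blocks to unfreeze for the next round.

    update_counts : per-block count of how often each block was trained so far.
    k             : number of blocks to update this round.
    epsilon       : exploration probability (decayed across epochs).
    """
    rng = rng if rng is not None else np.random.default_rng()
    n_blocks = len(update_counts)
    if rng.random() < epsilon:
        # Exploration: choose blocks uniformly at random.
        return set(rng.choice(n_blocks, size=k, replace=False).tolist())
    # Exploitation: Dirichlet sampling whose concentration grows with each
    # block's historical selection frequency (assumed +1 smoothing).
    alpha = 1.0 + np.asarray(update_counts, dtype=float)
    probs = rng.dirichlet(alpha)
    return set(rng.choice(n_blocks, size=k, replace=False, p=probs).tolist())

def freeze_all_but(blocks, chosen):
    # Assumes PyTorch-style transformer blocks: only the chosen blocks receive
    # gradient updates, all others stay frozen.
    for i, block in enumerate(blocks):
        for p in block.parameters():
            p.requires_grad = i in chosen

In this sketch, decaying epsilon from a high value toward zero across epochs reproduces the explore-then-focus behaviour described in the abstract.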




Abstract: Deep learning models have achieved great success in automating skin lesion diagnosis. However, the ethnic disparity in these models' predictions needs to be addressed before deploying them. We introduce a novel approach, PatchAlign, to enhance skin condition image classification accuracy and fairness by aligning with clinical text representations of skin conditions. PatchAlign uses Graph Optimal Transport (GOT) Loss as a regularizer to perform cross-domain alignment. The representations obtained are robust and generalize well across skin tones, even with limited training samples. To reduce the effect of noise and artifacts in clinical dermatology images, we propose a learnable Masked Graph Optimal Transport for cross-domain alignment that further improves fairness metrics. We compare our model to the state-of-the-art FairDisCo on two skin lesion datasets with different skin types: Fitzpatrick17k and Diverse Dermatology Images (DDI). PatchAlign enhances the accuracy of skin condition image classification by 2.8% (in-domain) and 6.2% (out-domain) on Fitzpatrick17k, and 4.2% (in-domain) on DDI compared to FairDisCo. Additionally, it consistently improves the fairness of true positive rates across skin tones. The source code for the implementation is available at the following GitHub repository: https://github.com/aayushmanace/PatchAlign24, enabling easy reproduction and further experimentation.
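
As a rough illustration of the alignment idea, the snippet below adds a plain entropic-Sinkhorn optimal-transport term between image-patch embeddings and clinical-text embeddings as a regularizer on top of the classification loss. This is a simplified stand-in, not the paper's Graph Optimal Transport or its learnable masking; all names and the weight lam are assumptions.

import torch
import torch.nn.functional as F

def sinkhorn_ot_cost(x, y, eps=0.1, n_iters=50):
    """Entropic OT cost between patch embeddings x (n, d) and text embeddings y (m, d)."""
    # Cosine cost matrix between the two sets of embeddings.
    cost = 1.0 - F.normalize(x, dim=-1) @ F.normalize(y, dim=-1).T
    n, m = cost.shape
    mu = torch.full((n,), 1.0 / n, device=x.device)   # uniform mass on patches
    nu = torch.full((m,), 1.0 / m, device=x.device)   # uniform mass on text tokens
    K = torch.exp(-cost / eps)                        # Gibbs kernel
    u = torch.ones_like(mu)
    for _ in range(n_iters):                          # Sinkhorn iterations
        v = nu / (K.T @ u + 1e-8)
        u = mu / (K @ v + 1e-8)
    transport = u.unsqueeze(1) * K * v.unsqueeze(0)   # coupling matrix (n, m)
    return (transport * cost).sum()

def total_loss(logits, labels, patch_emb, text_emb, lam=0.1):
    # Cross-entropy over skin-condition classes plus the OT alignment
    # regularizer between visual patches and clinical text representations.
    return F.cross_entropy(logits, labels) + lam * sinkhorn_ot_cost(patch_emb, text_emb)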




Abstract: Several distributed frameworks have been developed to scale Graph Neural Networks (GNNs) to billion-size graphs. On several benchmarks, we observe that the graph partitions generated by these frameworks have heterogeneous data distributions and class imbalance, which hurt convergence and result in lower performance than centralized implementations. We holistically address these challenges and develop techniques that reduce training time and improve accuracy. We develop an Edge-Weighted partitioning technique that improves the micro-average F1 score (accuracy) by minimizing the total entropy. Furthermore, we add an asynchronous personalization phase that adapts each compute host's model to its local data distribution. We also design a class-balanced sampler that considerably speeds up convergence. We implemented our algorithms on the DistDGL framework and observed that our training techniques scale much better than the existing training approach, achieving a 2-3x speedup in training time and a 4% average improvement in micro-F1 score on 5 large graph benchmarks compared to the standard baselines.
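
For concreteness, here is a minimal sketch of the kind of class-balanced seed-node sampler described above, assuming seed nodes for each mini-batch are drawn with probability inversely proportional to their class frequency on the local partition. Names and details are illustrative, not the DistDGL implementation.

import numpy as np

def class_balanced_seeds(labels, batch_size, rng=None):
    """labels: 1-D array of class labels for the train nodes on this partition."""
    rng = rng if rng is not None else np.random.default_rng()
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    inv_freq = {c: 1.0 / n for c, n in zip(classes, counts)}
    weights = np.array([inv_freq[c] for c in labels])
    probs = weights / weights.sum()
    # Draw a mini-batch of seed nodes; nodes of rare classes are up-weighted,
    # so every class is well represented in the local mini-batches.
    return rng.choice(len(labels), size=batch_size, replace=False, p=probs)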