Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Sheng-Jun Huang

Efficient Code LLM Training via Distribution-Consistent and Diversity-Aware Data Selection

Jul 03, 2025

Weijie Lyu, Sheng-Jun Huang, Xuan Xia

Abstract:Recent advancements in large language models (LLMs) have significantly improved code generation and program comprehension, accelerating the evolution of software engineering. Current methods primarily enhance model performance by leveraging vast amounts of data, focusing on data quantity while often overlooking data quality, thereby reducing training efficiency. To address this, we introduce an approach that utilizes a parametric model for code data selection, aimed at improving both training efficiency and model performance. Our method optimizes the parametric model to ensure distribution consistency and diversity within the selected subset, guaranteeing high-quality data. Experimental results demonstrate that using only 10K samples, our method achieves gains of 2.4% (HumanEval) and 2.3% (MBPP) over 92K full-sampled baseline, outperforming other sampling approaches in both performance and efficiency. This underscores that our method effectively boosts model performance while significantly reducing computational costs.

Via

Access Paper or Ask Questions

Learning to Trust Bellman Updates: Selective State-Adaptive Regularization for Offline RL

May 26, 2025

Qin-Wen Luo, Ming-Kun Xie, Ye-Wen Wang, Sheng-Jun Huang

Abstract:Offline reinforcement learning (RL) aims to learn an effective policy from a static dataset. To alleviate extrapolation errors, existing studies often uniformly regularize the value function or policy updates across all states. However, due to substantial variations in data quality, the fixed regularization strength often leads to a dilemma: Weak regularization strength fails to address extrapolation errors and value overestimation, while strong regularization strength shifts policy learning toward behavior cloning, impeding potential performance enabled by Bellman updates. To address this issue, we propose the selective state-adaptive regularization method for offline RL. Specifically, we introduce state-adaptive regularization coefficients to trust state-level Bellman-driven results, while selectively applying regularization on high-quality actions, aiming to avoid performance degradation caused by tight constraints on low-quality actions. By establishing a connection between the representative value regularization method, CQL, and explicit policy constraint methods, we effectively extend selective state-adaptive regularization to these two mainstream offline RL approaches. Extensive experiments demonstrate that the proposed method significantly outperforms the state-of-the-art approaches in both offline and offline-to-online settings on the D4RL benchmark.

* Accepted to ICML 2025

Via

Access Paper or Ask Questions

Data-efficient LLM Fine-tuning for Code Generation

Apr 17, 2025

Weijie Lv, Xuan Xia, Sheng-Jun Huang

Abstract:Large language models (LLMs) have demonstrated significant potential in code generation tasks. However, there remains a performance gap between open-source and closed-source models. To address this gap, existing approaches typically generate large amounts of synthetic data for fine-tuning, which often leads to inefficient training. In this work, we propose a data selection strategy in order to improve the effectiveness and efficiency of training for code-based LLMs. By prioritizing data complexity and ensuring that the sampled subset aligns with the distribution of the original dataset, our sampling strategy effectively selects high-quality data. Additionally, we optimize the tokenization process through a "dynamic pack" technique, which minimizes padding tokens and reduces computational resource consumption. Experimental results show that when training on 40% of the OSS-Instruct dataset, the DeepSeek-Coder-Base-6.7B model achieves an average performance of 66.9%, surpassing the 66.1% performance with the full dataset. Moreover, training time is reduced from 47 minutes to 34 minutes, and the peak GPU memory decreases from 61.47 GB to 42.72 GB during a single epoch. Similar improvements are observed with the CodeLlama-Python-7B model on the Evol-Instruct dataset. By optimizing both data selection and tokenization, our approach not only improves model performance but also improves training efficiency.

* arXiv admin note: text overlap with arXiv:2408.02193

Via

Access Paper or Ask Questions

Rethinking Epistemic and Aleatoric Uncertainty for Active Open-Set Annotation: An Energy-Based Approach

Feb 27, 2025

Chen-Chen Zong, Sheng-Jun Huang

Figure 1 for Rethinking Epistemic and Aleatoric Uncertainty for Active Open-Set Annotation: An Energy-Based Approach

Figure 2 for Rethinking Epistemic and Aleatoric Uncertainty for Active Open-Set Annotation: An Energy-Based Approach

Figure 3 for Rethinking Epistemic and Aleatoric Uncertainty for Active Open-Set Annotation: An Energy-Based Approach

Figure 4 for Rethinking Epistemic and Aleatoric Uncertainty for Active Open-Set Annotation: An Energy-Based Approach

Abstract:Active learning (AL), which iteratively queries the most informative examples from a large pool of unlabeled candidates for model training, faces significant challenges in the presence of open-set classes. Existing methods either prioritize query examples likely to belong to known classes, indicating low epistemic uncertainty (EU), or focus on querying those with highly uncertain predictions, reflecting high aleatoric uncertainty (AU). However, they both yield suboptimal performance, as low EU corresponds to limited useful information, and closed-set AU metrics for unknown class examples are less meaningful. In this paper, we propose an Energy-based Active Open-set Annotation (EAOA) framework, which effectively integrates EU and AU to achieve superior performance. EAOA features a $(C+1)$-class detector and a target classifier, incorporating an energy-based EU measure and a margin-based energy loss designed for the detector, alongside an energy-based AU measure for the target classifier. Another crucial component is the target-driven adaptive sampling strategy. It first forms a smaller candidate set with low EU scores to ensure closed-set properties, making AU metrics meaningful. Subsequently, examples with high AU scores are queried to form the final query set, with the candidate set size adjusted adaptively. Extensive experiments show that EAOA achieves state-of-the-art performance while maintaining high query precision and low training overhead. The code is available at https://github.com/chenchenzong/EAOA.

* Accepted to CVPR 2025

Via

Access Paper or Ask Questions

StructSR: Refuse Spurious Details in Real-World Image Super-Resolution

Jan 16, 2025

Yachao Li, Dong Liang, Tianyu Ding, Sheng-Jun Huang

Figure 1 for StructSR: Refuse Spurious Details in Real-World Image Super-Resolution

Figure 2 for StructSR: Refuse Spurious Details in Real-World Image Super-Resolution

Figure 3 for StructSR: Refuse Spurious Details in Real-World Image Super-Resolution

Figure 4 for StructSR: Refuse Spurious Details in Real-World Image Super-Resolution

Abstract:Diffusion-based models have shown great promise in real-world image super-resolution (Real-ISR), but often generate content with structural errors and spurious texture details due to the empirical priors and illusions of these models. To address this issue, we introduce StructSR, a simple, effective, and plug-and-play method that enhances structural fidelity and suppresses spurious details for diffusion-based Real-ISR. StructSR operates without the need for additional fine-tuning, external model priors, or high-level semantic knowledge. At its core is the Structure-Aware Screening (SAS) mechanism, which identifies the image with the highest structural similarity to the low-resolution (LR) input in the early inference stage, allowing us to leverage it as a historical structure knowledge to suppress the generation of spurious details. By intervening in the diffusion inference process, StructSR seamlessly integrates with existing diffusion-based Real-ISR models. Our experimental results demonstrate that StructSR significantly improves the fidelity of structure and texture, improving the PSNR and SSIM metrics by an average of 5.27% and 9.36% on a synthetic dataset (DIV2K-Val) and 4.13% and 8.64% on two real-world datasets (RealSR and DRealSR) when integrated with four state-of-the-art diffusion-based Real-ISR methods.

Via

Access Paper or Ask Questions

Optimistic Critic Reconstruction and Constrained Fine-Tuning for General Offline-to-Online RL

Dec 25, 2024

Qin-Wen Luo, Ming-Kun Xie, Ye-Wen Wang, Sheng-Jun Huang

Figure 1 for Optimistic Critic Reconstruction and Constrained Fine-Tuning for General Offline-to-Online RL

Figure 2 for Optimistic Critic Reconstruction and Constrained Fine-Tuning for General Offline-to-Online RL

Figure 3 for Optimistic Critic Reconstruction and Constrained Fine-Tuning for General Offline-to-Online RL

Figure 4 for Optimistic Critic Reconstruction and Constrained Fine-Tuning for General Offline-to-Online RL

Abstract:Offline-to-online (O2O) reinforcement learning (RL) provides an effective means of leveraging an offline pre-trained policy as initialization to improve performance rapidly with limited online interactions. Recent studies often design fine-tuning strategies for a specific offline RL method and cannot perform general O2O learning from any offline method. To deal with this problem, we disclose that there are evaluation and improvement mismatches between the offline dataset and the online environment, which hinders the direct application of pre-trained policies to online fine-tuning. In this paper, we propose to handle these two mismatches simultaneously, which aims to achieve general O2O learning from any offline method to any online method. Before online fine-tuning, we re-evaluate the pessimistic critic trained on the offline dataset in an optimistic way and then calibrate the misaligned critic with the reliable offline actor to avoid erroneous update. After obtaining an optimistic and and aligned critic, we perform constrained fine-tuning to combat distribution shift during online learning. We show empirically that the proposed method can achieve stable and efficient performance improvement on multiple simulated tasks when compared to the state-of-the-art methods.

* Accepted to Neurips 2024

Via

Access Paper or Ask Questions

Context-Based Semantic-Aware Alignment for Semi-Supervised Multi-Label Learning

Dec 25, 2024

Heng-Bo Fan, Ming-Kun Xie, Jia-Hao Xiao, Sheng-Jun Huang

Figure 1 for Context-Based Semantic-Aware Alignment for Semi-Supervised Multi-Label Learning

Figure 2 for Context-Based Semantic-Aware Alignment for Semi-Supervised Multi-Label Learning

Figure 3 for Context-Based Semantic-Aware Alignment for Semi-Supervised Multi-Label Learning

Figure 4 for Context-Based Semantic-Aware Alignment for Semi-Supervised Multi-Label Learning

Abstract:Due to the lack of extensive precisely-annotated multi-label data in real word, semi-supervised multi-label learning (SSMLL) has gradually gained attention. Abundant knowledge embedded in vision-language models (VLMs) pre-trained on large-scale image-text pairs could alleviate the challenge of limited labeled data under SSMLL setting.Despite existing methods based on fine-tuning VLMs have achieved advances in weakly-supervised multi-label learning, they failed to fully leverage the information from labeled data to enhance the learning of unlabeled data. In this paper, we propose a context-based semantic-aware alignment method to solve the SSMLL problem by leveraging the knowledge of VLMs. To address the challenge of handling multiple semantics within an image, we introduce a novel framework design to extract label-specific image features. This design allows us to achieve a more compact alignment between text features and label-specific image features, leading the model to generate high-quality pseudo-labels. To incorporate the model with comprehensive understanding of image, we design a semi-supervised context identification auxiliary task to enhance the feature representation by capturing co-occurrence information. Extensive experiments on multiple benchmark datasets demonstrate the effectiveness of our proposed method.

Via

Access Paper or Ask Questions

Dual-Head Knowledge Distillation: Enhancing Logits Utilization with an Auxiliary Head

Nov 13, 2024

Penghui Yang, Chen-Chen Zong, Sheng-Jun Huang, Lei Feng, Bo An

Figure 1 for Dual-Head Knowledge Distillation: Enhancing Logits Utilization with an Auxiliary Head

Figure 2 for Dual-Head Knowledge Distillation: Enhancing Logits Utilization with an Auxiliary Head

Figure 3 for Dual-Head Knowledge Distillation: Enhancing Logits Utilization with an Auxiliary Head

Figure 4 for Dual-Head Knowledge Distillation: Enhancing Logits Utilization with an Auxiliary Head

Abstract:Traditional knowledge distillation focuses on aligning the student's predicted probabilities with both ground-truth labels and the teacher's predicted probabilities. However, the transition to predicted probabilities from logits would obscure certain indispensable information. To address this issue, it is intuitive to additionally introduce a logit-level loss function as a supplement to the widely used probability-level loss function, for exploiting the latent information of logits. Unfortunately, we empirically find that the amalgamation of the newly introduced logit-level loss and the previous probability-level loss will lead to performance degeneration, even trailing behind the performance of employing either loss in isolation. We attribute this phenomenon to the collapse of the classification head, which is verified by our theoretical analysis based on the neural collapse theory. Specifically, the gradients of the two loss functions exhibit contradictions in the linear classifier yet display no such conflict within the backbone. Drawing from the theoretical analysis, we propose a novel method called dual-head knowledge distillation, which partitions the linear classifier into two classification heads responsible for different losses, thereby preserving the beneficial effects of both losses on the backbone while eliminating adverse influences on the classification head. Extensive experiments validate that our method can effectively exploit the information inside the logits and achieve superior performance against state-of-the-art counterparts.

* Preprint

Via

Access Paper or Ask Questions

Dirichlet-Based Coarse-to-Fine Example Selection For Open-Set Annotation

Sep 26, 2024

Ye-Wen Wang, Chen-Chen Zong, Ming-Kun Xie, Sheng-Jun Huang

Figure 1 for Dirichlet-Based Coarse-to-Fine Example Selection For Open-Set Annotation

Figure 2 for Dirichlet-Based Coarse-to-Fine Example Selection For Open-Set Annotation

Figure 3 for Dirichlet-Based Coarse-to-Fine Example Selection For Open-Set Annotation

Figure 4 for Dirichlet-Based Coarse-to-Fine Example Selection For Open-Set Annotation

Abstract:Active learning (AL) has achieved great success by selecting the most valuable examples from unlabeled data. However, they usually deteriorate in real scenarios where open-set noise gets involved, which is studied as open-set annotation (OSA). In this paper, we owe the deterioration to the unreliable predictions arising from softmax-based translation invariance and propose a Dirichlet-based Coarse-to-Fine Example Selection (DCFS) strategy accordingly. Our method introduces simplex-based evidential deep learning (EDL) to break translation invariance and distinguish known and unknown classes by considering evidence-based data and distribution uncertainty simultaneously. Furthermore, hard known-class examples are identified by model discrepancy generated from two classifier heads, where we amplify and alleviate the model discrepancy respectively for unknown and known classes. Finally, we combine the discrepancy with uncertainties to form a two-stage strategy, selecting the most informative examples from known classes. Extensive experiments on various openness ratio datasets demonstrate that DCFS achieves state-of-art performance.

Via

Access Paper or Ask Questions

CodeACT: Code Adaptive Compute-efficient Tuning Framework for Code LLMs

Aug 05, 2024

Weijie Lv, Xuan Xia, Sheng-Jun Huang

Figure 1 for CodeACT: Code Adaptive Compute-efficient Tuning Framework for Code LLMs

Figure 2 for CodeACT: Code Adaptive Compute-efficient Tuning Framework for Code LLMs

Figure 3 for CodeACT: Code Adaptive Compute-efficient Tuning Framework for Code LLMs

Figure 4 for CodeACT: Code Adaptive Compute-efficient Tuning Framework for Code LLMs

Abstract:Large language models (LLMs) have shown great potential in code-related tasks, yet open-source models lag behind their closed-source counterparts. To bridge this performance gap, existing methods generate vast amounts of synthetic data for fine-tuning, leading to inefficiencies in training. Motivated by the need for more effective and efficient training, we propose the Code Adaptive Compute-efficient Tuning (CodeACT) framework. CodeACT introduces the Complexity and Diversity Aware Sampling (CDAS) method to select high-quality training data based on complexity and diversity, and the Dynamic Pack padding strategy to reduce computational resource usage by minimizing padding tokens during training. Experimental results demonstrate that CodeACT-DeepSeek-Coder-6.7B, fine-tuned on only 40% of the EVOL-Instruct data, achieves an 8.6% performance increase on HumanEval, reduces training time by 78%, and decreases peak GPU memory usage by 27%. These findings underscore CodeACT's ability to enhance the performance and efficiency of open-source models. By optimizing both the data selection and training processes, CodeACT offers a comprehensive approach to improving the capabilities of open-source LLMs while significantly reducing computational requirements, addressing the dual challenges of data quality and training efficiency, and paving the way for more resource-efficient and performant models.

Via

Access Paper or Ask Questions