Son Nguyen

BRIDGE: Budget-aware Reasoning via Intermediate Distillation with Guided Examples

Dec 23, 2025

Xuan-An Le, Minh-Nam Tran, Son Nguyen

Abstract: Distilling knowledge from large proprietary models (e.g., GPT-4) to tiny deployable models (less than 1B parameters) faces a critical capacity-budget trap: the 1000x capacity gap between teachers and students prevents effective direct transfer, while API costs prohibit extensive data collection. We introduce BRIDGE (Budget-Aware Reasoning via Intermediate Distillation), a two-phase framework that resolves these constraints through strategic intermediation and budget asymmetry. In Phase 1, a mid-sized Teacher Assistant (TA; e.g., about 7B) learns from the black-box teacher on a strictly limited subset of data (e.g., 3-5%), selected via a zero-API-cost pipeline that balances entropic difficulty and semantic diversity using only local TA inference. In Phase 2, we exploit this asymmetry (teacher queries are expensive, whereas TA inference is free) to amplify supervision: the refined TA generates synthetic rationales for the full dataset to train the tiny student. Crucially, we apply an instruction-tuning curriculum to establish behavioral alignment in the tiny student before transferring reasoning. Our theoretical analysis shows that BRIDGE yields tighter generalization bounds than direct distillation when data is abundant. Experiments across medical, legal, and financial benchmarks demonstrate consistent improvements: BRIDGE delivers student performance gains of 28-41%, closing the capability gap with proprietary teachers by 12-16% while using 10x fewer teacher queries. Notably, BRIDGE defies the conventional cost-performance frontier, surpassing direct distillation baselines that use 100% of the budget while consuming only 5% of the resources.
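The Phase 1 selection step can be illustrated with a small sketch. The abstract does not give the scoring rule, so everything here is an assumption for illustration: difficulty is taken as the Shannon entropy of the TA's predictive distribution, diversity as the cosine distance to the nearest already-selected example, and the two are combined greedily with a hypothetical `diversity_weight`. It is not the authors' pipeline.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def cosine_distance(a, b):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def select_subset(probs, embeddings, budget, diversity_weight=1.0):
    """Greedily pick `budget` indices, trading off difficulty and diversity.

    probs[i]      -- local model's predictive distribution for example i (assumed input)
    embeddings[i] -- semantic embedding of example i (assumed input)
    """
    difficulty = [entropy(p) for p in probs]
    selected = []
    remaining = set(range(len(probs)))
    while remaining and len(selected) < budget:
        def score(i):
            if not selected:
                return difficulty[i]
            # Novelty: distance to the closest example already chosen.
            novelty = min(cosine_distance(embeddings[i], embeddings[j])
                          for j in selected)
            return difficulty[i] + diversity_weight * novelty
        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

Only local quantities (the TA's own probabilities and embeddings) enter the score, which is what makes a selection step like this free of teacher API calls.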

Improving Adaptive Moment Optimization via Preconditioner Diagonalization

Feb 11, 2025

Son Nguyen, Bo Liu, Lizhang Chen, Qiang Liu

Abstract: Modern adaptive optimization methods, such as Adam and its variants, have emerged as the most widely used tools in deep learning over recent years. These algorithms offer automatic mechanisms for dynamically adjusting the update step based on estimates of gradient statistics. Compared to traditional algorithms like Stochastic Gradient Descent, these adaptive methods are typically more robust to model scale and hyperparameter tuning. However, the gradient statistics employed by these methods often do not leverage sufficient gradient covariance information, leading to suboptimal updates in certain directions of the parameter space and potentially slower convergence. In this work, we keep track of such covariance statistics in the form of a structured preconditioner matrix. Unlike other works, our approach does not apply direct approximations to estimate this matrix. We instead implement an invertible transformation that maps the preconditioner matrix into a new space where it becomes approximately diagonal. This enables a diagonal approximation of the preconditioner matrix in the transformed space, offering several computational advantages. Empirical results show that our approach can substantially enhance the convergence speed of modern adaptive optimizers. Notably, for large language models like LLaMA, we can achieve a speedup of 2x compared to the baseline Adam. Additionally, our method can be integrated with memory-efficient optimizers like Adafactor to manage computational overhead.
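To see why a change of basis can make a preconditioner diagonal, consider the following short derivation. It assumes, purely for illustration, that the tracked statistic is the gradient second-moment matrix $L = \mathbb{E}[G G^\top]$ and that the transformation is the orthogonal rotation given by its eigenvectors; the abstract does not state which transformation the paper actually uses.

```latex
% Assumption: G is a gradient matrix, L its (left) second-moment statistic.
\begin{align}
  L &= \mathbb{E}\!\left[G G^\top\right] = Q \Lambda Q^\top,
      \qquad Q^\top Q = I, \\
  \tilde{G} &= Q^\top G
      \;\Longrightarrow\;
      \mathbb{E}\!\left[\tilde{G}\tilde{G}^\top\right]
      = Q^\top L\, Q = \Lambda .
\end{align}
```

In the rotated coordinates the statistic is exactly the diagonal matrix $\Lambda$, so a diagonal (Adam-style) scaling loses nothing there. In practice $Q$ would only be estimated, which is presumably why the abstract says "approximately diagonal".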

* 19 pages, 13 figures

Generating Critical Scenarios for Testing Automated Driving Systems

Dec 03, 2024

Trung-Hieu Nguyen, Truong-Giang Vuong, Hong-Nam Duong, Son Nguyen, Hieu Dinh Vo, Toshiaki Aoki, Thu-Trang Nguyen

Abstract: Autonomous vehicles (AVs) have demonstrated significant potential in revolutionizing transportation, yet ensuring their safety and reliability remains a critical challenge, especially when exposed to dynamic and unpredictable environments. Real-world testing of an Autonomous Driving System (ADS) is both expensive and risky, making simulation-based testing a preferred approach. In this paper, we propose AVASTRA, a Reinforcement Learning (RL)-based approach to generate realistic critical scenarios for testing ADSs in simulation environments. To capture the complexity of driving scenarios, AVASTRA comprehensively represents the environment by both the internal states of an ADS under test (e.g., the status of the ADS's core components, speed, or acceleration) and the external states of the surrounding factors in the simulation environment (e.g., weather, traffic flow, or road condition). AVASTRA trains the RL agent to effectively configure the simulation environment that places the AV in dangerous situations and potentially leads it to collisions. We introduce a diverse set of actions that allows the RL agent to systematically configure both environmental conditions and traffic participants. Additionally, based on established safety requirements, we enforce heuristic constraints to ensure the realism and relevance of the generated test scenarios. AVASTRA is evaluated on two popular simulation maps with four different road configurations. Our results show AVASTRA's ability to outperform the state-of-the-art approach by generating 30% to 115% more collision scenarios. Compared to the baseline based on Random Search, AVASTRA achieves up to 275% better performance. These results highlight the effectiveness of AVASTRA in enhancing the safety testing of AVs through realistic, comprehensive critical scenario generation.

Mixed Effects Deep Learning for the interpretable analysis of single cell RNA sequencing data by quantifying and visualizing batch effects

Nov 13, 2024

Aixa X. Andrade, Son Nguyen, Albert Montillo

Abstract: Single-cell RNA sequencing (scRNA-seq) data are often confounded by technical or biological batch effects. Existing deep learning models mitigate these effects but often discard batch-specific information, potentially losing valuable biological insights. We propose a Mixed Effects Deep Learning (MEDL) autoencoder framework that separately models batch-invariant (fixed effects) and batch-specific (random effects) components. By decoupling batch-invariant biological states from batch variations, our framework integrates both into predictive models. Our approach also generates 2D visualizations of how the same cell appears across batches, enhancing interpretability. Retaining both fixed and random effect latent spaces improves classification accuracy. We applied our framework to three datasets spanning the cardiovascular system (Healthy Heart), Autism Spectrum Disorder (ASD), and Acute Myeloid Leukemia (AML). With 147 batches in the Healthy Heart dataset, far exceeding typical numbers, we tested our framework's ability to handle many batches. In the ASD dataset, our approach captured donor heterogeneity between autistic and healthy individuals. In the AML dataset, it distinguished donor heterogeneity despite missing cell types and diseased donors exhibiting both healthy and malignant cells. These results highlight our framework's ability to characterize fixed and random effects, enhance batch effect visualization, and improve prediction accuracy across diverse datasets.

* Main manuscript: 29 pages, including 10 figures and 8 tables. Supplemental material: 17 pages

An Empirical Study on Capability of Large Language Models in Understanding Code Semantics

Jul 04, 2024

Thu-Trang Nguyen, Thanh Trong Vu, Hieu Dinh Vo, Son Nguyen

Abstract: Large Language Models for Code (code LLMs) have demonstrated remarkable performance across various software engineering (SE) tasks, increasing the application of code LLMs in software development. Despite the success of code LLMs, there remain significant concerns about the actual capabilities and reliability of these models, namely "whether these models really learn the semantics of code from the training data and leverage the learned knowledge to perform the SE tasks". In this paper, we introduce EMPICA, a comprehensive framework designed to systematically and empirically evaluate the capabilities of code LLMs in understanding code semantics. Specifically, EMPICA systematically introduces controlled modifications/transformations into the input code and examines the models' responses. Generally, code LLMs must be robust to semantically equivalent code inputs and be sensitive to non-equivalent ones for all SE tasks. Specifically, for every SE task, given an input code snippet c and its semantically equivalent variants, code LLMs must robustly produce consistent/equivalent outputs, while they are expected to generate different outputs for c and its semantically non-equivalent variants. Our experimental results on three representative code understanding tasks, including code summarization, method name prediction, and output prediction, reveal that the robustness and sensitivity of the state-of-the-art code LLMs to code transformations vary significantly across tasks and transformation operators. In addition, the code LLMs exhibit better robustness to the semantic-preserving transformations than sensitivity to the semantic non-preserving transformations. These results highlight a need to enhance the models' capabilities of understanding code semantics, especially the sensitivity property.
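The robustness and sensitivity properties described above can be expressed as two simple ratios. The sketch below is an assumption about how such scores might be computed, not EMPICA's actual metric definitions: `model` is any hypothetical callable from source text to an output, and outputs are compared with plain equality.

```python
def robustness(model, code, equivalent_variants):
    """Fraction of semantically equivalent variants whose output matches the original's."""
    base = model(code)
    unchanged = sum(model(v) == base for v in equivalent_variants)
    return unchanged / len(equivalent_variants)

def sensitivity(model, code, non_equivalent_variants):
    """Fraction of semantically non-equivalent variants whose output differs from the original's."""
    base = model(code)
    changed = sum(model(v) != base for v in non_equivalent_variants)
    return changed / len(non_equivalent_variants)

def toy_model(src):
    """Stand-in for a code LLM: ignores whitespace, otherwise echoes the tokens."""
    return "".join(src.split())

# "a+b" (reformatted) and "b + a" (operands swapped) are equivalent to "a + b";
# the toy model survives reformatting but not the operand swap.
print(robustness(toy_model, "a + b", ["a+b", "b + a"]))
print(sensitivity(toy_model, "a + b", ["a - b"]))
```

For tasks with free-text outputs such as code summarization, exact equality would be too strict, and a task-specific equivalence check would have to replace `==`.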

Venomancer: Towards Imperceptible and Target-on-Demand Backdoor Attacks in Federated Learning

Jul 03, 2024

Son Nguyen, Thinh Nguyen, Khoa Doan, Kok-Seng Wong

Abstract: Federated Learning (FL) is a distributed machine learning approach that maintains data privacy by training on decentralized data sources. Similar to centralized machine learning, FL is also susceptible to backdoor attacks. Most backdoor attacks in FL assume a predefined target class and require control over a large number of clients or knowledge of benign clients' information. Furthermore, they are not imperceptible and are easily detected by human inspection due to clear artifacts left on the poison data. To overcome these challenges, we propose Venomancer, an effective backdoor attack that is imperceptible and allows target-on-demand. Specifically, imperceptibility is achieved by using a visual loss function to make the poison data visually indistinguishable from the original data. The target-on-demand property allows the attacker to choose arbitrary target classes via conditional adversarial training. Additionally, experiments showed that the method is robust against state-of-the-art defenses such as Norm Clipping, Weak DP, Krum, and Multi-Krum. The source code is available at https://anonymous.4open.science/r/Venomancer-3426.

H-Fac: Memory-Efficient Optimization with Factorized Hamiltonian Descent

Jun 17, 2024

Son Nguyen, Lizhang Chen, Bo Liu, Qiang Liu

Abstract: In this study, we introduce a novel adaptive optimizer, H-Fac, which incorporates a factorized approach to momentum and scaling parameters. Our algorithm demonstrates competitive performance on both ResNets and Vision Transformers, while achieving sublinear memory costs through the use of rank-1 parameterizations for moment estimators. We develop our algorithms based on principles derived from Hamiltonian dynamics, providing robust theoretical underpinnings. These optimization algorithms are designed to be both straightforward and adaptable, facilitating easy implementation in diverse settings.
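The memory saving from a rank-1 parameterization can be shown with a minimal sketch. The abstract does not spell out H-Fac's update rule, so this uses the well-known Adafactor-style factorization as a stand-in: for an m x n non-negative statistic, only the m row sums and n column sums are stored, and the full matrix is approximated by their normalized outer product.

```python
def factorize(matrix):
    """Compress an m x n non-negative matrix into m row sums and n column sums."""
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(col) for col in zip(*matrix)]
    return row_sums, col_sums

def reconstruct(row_sums, col_sums):
    """Rank-1 estimate V[i][j] ~ row_sums[i] * col_sums[j] / total.

    Assumes the total is non-zero (at least one positive entry).
    """
    total = sum(row_sums)
    return [[r * c / total for c in col_sums] for r in row_sums]

# A matrix that is already rank-1 is recovered exactly.
second_moment = [[3, 4, 5],
                 [6, 8, 10]]
rows, cols = factorize(second_moment)
print(reconstruct(rows, cols))
```

Storage drops from m * n numbers to m + n, which is sublinear in the number of parameters of the layer. For matrices that are not rank-1 the reconstruction is only an approximation.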

* 21 pages, 4 figures

Instance Segmentation under Occlusions via Location-aware Copy-Paste Data Augmentation

Oct 27, 2023

Son Nguyen, Mikel Lainsa, Hung Dao, Daeyoung Kim, Giang Nguyen

Abstract: Occlusion is a long-standing problem in computer vision, particularly in instance segmentation. ACM MMSports 2023 DeepSportRadar has introduced a dataset that focuses on segmenting human subjects within a basketball context and a specialized evaluation metric for occlusion scenarios. Given the modest size of the dataset and the highly deformable nature of the objects to be segmented, this challenge demands the application of robust data augmentation techniques and wisely chosen deep learning architectures. Our work (ranked 1st in the competition) first proposes a novel data augmentation technique, capable of generating more training samples with a wider distribution. Then, we adopt a new architecture, the Hybrid Task Cascade (HTC) framework with CBNetV2 as the backbone and a MaskIoU head, to improve segmentation performance. Furthermore, we employ a Stochastic Weight Averaging (SWA) training strategy to improve the model's generalization. As a result, we achieve a remarkable occlusion score (OM) of 0.533 on the challenge dataset, securing the top-1 position on the leaderboard. Source code is available at https://github.com/nguyendinhson-kaist/MMSports23-Seg-AutoID.

Fairness-enhancing mixed effects deep learning improves fairness on in- and out-of-distribution clustered (non-iid) data

Oct 04, 2023

Adam Wang, Son Nguyen, Albert Montillo

Abstract: Traditional deep learning (DL) suffers from two core problems. Firstly, it assumes training samples are independent and identically distributed. However, numerous real-world datasets group samples by shared measurements (e.g., study participants or cells), violating this assumption. In these scenarios, DL can show compromised performance, limited generalization, and interpretability issues, coupled with cluster confounding causing Type 1 and 2 errors. Secondly, models are typically trained for overall accuracy, often neglecting underrepresented groups and introducing biases in crucial areas like loan approvals or determining health insurance rates; such biases can significantly impact one's quality of life. To address both of these challenges simultaneously, we present a mixed effects deep learning (MEDL) framework. MEDL separately quantifies cluster-invariant fixed effects (FE) and cluster-specific random effects (RE) through the introduction of: 1) a cluster adversary which encourages the learning of cluster-invariant FE, 2) a Bayesian neural network which quantifies the RE, and 3) a mixing function combining the FE and RE into a mixed-effect (ME) prediction. We marry this MEDL with adversarial debiasing, which promotes equality-of-odds fairness across FE, RE, and ME predictions for fairness-sensitive variables. We evaluated our approach using three datasets: two from census/finance focusing on income classification and one from healthcare predicting hospitalization duration, a regression task. Our framework notably enhances fairness across all sensitive variables, increasing fairness by up to 82% for age, 43% for race, 86% for sex, and 27% for marital status. Besides promoting fairness, our method maintains the robust performance and clarity of MEDL. It's versatile, suitable for various dataset types and tasks, making it broadly applicable. Our GitHub repository houses the implementation.
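The equality-of-odds criterion mentioned above can be made concrete with a small sketch. This is an illustrative measurement for a binary classifier, not the paper's evaluation code: it computes the true-positive and false-positive rates per group and reports the largest spread. The function names are hypothetical.

```python
def _rates(y_true, y_pred):
    """Return (TPR, FPR) for binary 0/1 labels; assumes both classes are present."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    positives = sum(t == 1 for t in y_true)
    negatives = len(y_true) - positives
    return tp / positives, fp / negatives

def equalized_odds_gap(y_true, y_pred, groups):
    """Largest between-group difference in TPR or FPR (0.0 means equalized odds holds)."""
    by_group = {}
    for t, p, g in zip(y_true, y_pred, groups):
        labels, preds = by_group.setdefault(g, ([], []))
        labels.append(t)
        preds.append(p)
    tprs, fprs = zip(*(_rates(labels, preds) for labels, preds in by_group.values()))
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))
```

A debiasing method of the kind described would aim to push this gap toward zero for each sensitive variable while keeping accuracy high.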

ARIST: An Effective API Argument Recommendation Approach

Jun 11, 2023

Son Nguyen, Cuong Tran Manh, Kien T. Tran, Tan M. Nguyen, Thu-Trang Nguyen, Kien-Tuan Ngo, Hieu Dinh Vo

Abstract: Learning and remembering how to use APIs is difficult. Several techniques have been proposed to assist developers in using APIs. Most existing techniques focus on recommending the right API methods to call, but very few techniques focus on recommending API arguments. In this paper, we propose ARIST, a novel automated argument recommendation approach which suggests arguments by predicting developers' expectations when they define and use API methods. To implement this idea in the recommendation process, ARIST combines program analysis (PA), language models (LMs), and several features specialized for the recommendation task which consider the functionality of formal parameters and the positional information of code elements (e.g., variables or method calls) in the given context. In ARIST, the LMs and the recommending features are used to suggest the promising candidates identified by PA. Meanwhile, PA navigates the LMs and the features working on the set of the valid candidates which satisfy syntax, accessibility, and type-compatibility constraints defined by the programming language in use. Our evaluation on a large dataset of real-world projects shows that ARIST improves the state-of-the-art approach by 19% and 18% in top-1 precision and recall for recommending arguments of frequently-used libraries. For the general argument recommendation task, i.e., recommending arguments for every method call, ARIST outperforms the baseline approaches by up to 125% in top-1 accuracy. Moreover, for newly-encountered projects, ARIST achieves more than 60% top-3 accuracy when evaluating on a larger dataset. For working/maintaining projects, with a personalized LM to capture developers' coding practice, ARIST can productively rank the expected arguments at the top-1 position in 7/10 requests.
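The division of labor described above, where program analysis restricts the candidate set and a language model ranks what remains, can be sketched as a filter-then-rank pipeline. This is a simplified illustration under stated assumptions, not ARIST's implementation: `Candidate`, `recommend`, and `lm_score` are hypothetical names, type compatibility is reduced to exact type-name equality, and the extra recommendation features are omitted.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    expr: str         # source text of the candidate argument
    type_name: str    # static type, as a program analysis would report it
    accessible: bool  # whether the element is visible at the call site

def recommend(candidates, expected_type, lm_score, top_k=3):
    """Keep only valid candidates, then rank them by a language-model score."""
    # Step 1 (program analysis): enforce accessibility and type constraints.
    valid = [c for c in candidates
             if c.accessible and c.type_name == expected_type]
    # Step 2 (language model): order the surviving candidates by likelihood.
    ranked = sorted(valid, key=lambda c: lm_score(c.expr), reverse=True)
    return [c.expr for c in ranked[:top_k]]
```

Filtering first means the language model can never surface an argument that would fail to compile, however likely it looks from the surrounding text.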
