Md Rafiqul Islam Rabin

Unlearning Trojans in Large Language Models: A Comparison Between Natural Language and Source Code

Aug 22, 2024

Mahdi Kazemi, Aftab Hussain, Md Rafiqul Islam Rabin, Mohammad Amin Alipour, Sen Lin

Abstract: This work investigates the application of Machine Unlearning (MU) for mitigating the impact of trojans embedded in conventional large language models of natural language (Text-LLMs) and large language models of code (Code-LLMs). We propose a novel unlearning approach, LYA, that leverages both gradient ascent and elastic weight consolidation, a Fisher Information Matrix (FIM) based regularization technique, to unlearn trojans from poisoned models. We compare the effectiveness of LYA against conventional techniques like fine-tuning, retraining, and vanilla gradient ascent. The subject models we investigate are BERT and CodeBERT, for sentiment analysis and code defect detection tasks, respectively. Our findings demonstrate that the combination of gradient ascent and FIM-based regularization, as done in LYA, outperforms existing methods in removing the trojan's influence from the poisoned model while preserving its original functionality. To the best of our knowledge, this is the first work that compares and contrasts MU of trojans in LLMs in the NL and coding domains.
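
To make the combination of gradient ascent and elastic weight consolidation (EWC) concrete, here is a minimal sketch of a single parameter update. It assumes the textbook EWC penalty (λ/2)·Σᵢ Fᵢ(θᵢ − θ*ᵢ)² with a diagonal Fisher estimate and a flat list of parameters. The function names, the toy numbers, and the exact way the two terms are summed are our assumptions and not necessarily LYA's actual objective.

```python
from typing import List

Vector = List[float]


def ewc_penalty(theta: Vector, theta_star: Vector, fisher: Vector, lam: float) -> float:
    """EWC regularizer: (lam / 2) * sum_i F_i * (theta_i - theta_star_i) ** 2."""
    return 0.5 * lam * sum(f * (t - s) ** 2 for t, s, f in zip(theta, theta_star, fisher))


def unlearning_step(theta: Vector, grad_forget: Vector, theta_star: Vector,
                    fisher: Vector, lr: float, lam: float) -> Vector:
    """One descent step on J(theta) = -L_forget(theta) + ewc_penalty(theta).

    grad_forget is the gradient of the loss on trojaned (forget) samples, so
    adding it performs gradient *ascent* on that loss. The EWC gradient
    lam * F_i * (theta_i - theta_star_i) pulls parameters that the diagonal
    Fisher marks as important back toward the original weights theta_star.
    (Assumed formulation for illustration only.)
    """
    return [t + lr * g - lr * lam * f * (t - s)
            for t, g, s, f in zip(theta, grad_forget, theta_star, fisher)]


if __name__ == "__main__":
    theta_star = [1.0, 2.0]   # hypothetical weights of the poisoned model
    fisher = [10.0, 0.1]      # hypothetical diagonal Fisher estimate
    theta = list(theta_star)
    for _ in range(3):
        grad = [0.5, 0.5]     # hypothetical constant forget-loss gradient
        theta = unlearning_step(theta, grad, theta_star, fisher, lr=0.1, lam=1.0)
    # The high-Fisher coordinate drifts less from theta_star than the low-Fisher one.
    print(theta)
```

The design point the sketch illustrates: pure gradient ascent would push every weight away from the poisoned optimum, while the Fisher-weighted term lets weights that matter for the original task resist that push.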

Harnessing the Power of LLMs: Automating Unit Test Generation for High-Performance Computing

Jul 06, 2024

Rabimba Karanjai, Aftab Hussain, Md Rafiqul Islam Rabin, Lei Xu, Weidong Shi, Mohammad Amin Alipour

Abstract: Unit testing is crucial in software engineering for ensuring quality. However, it is not widely used in parallel and high-performance computing software, particularly scientific applications, due to their smaller, diverse user base and complex logic. These factors make unit testing challenging and expensive, as it requires specialized knowledge and existing automated tools are often ineffective. To address this, we propose an automated method for generating unit tests for such software that accounts for their unique features, such as complex logic and parallel processing. Recently, large language models (LLMs) have shown promise in coding and testing. We explored the capabilities of Davinci (text-davinci-002) and ChatGPT (gpt-3.5-turbo) in creating unit tests for C++ parallel programs. Our results show that LLMs can generate mostly correct and comprehensive unit tests, although they have some limitations, such as repetitive assertions and blank test cases.

Trojans in Large Language Models of Code: A Critical Review through a Trigger-Based Taxonomy

May 05, 2024

Aftab Hussain, Md Rafiqul Islam Rabin, Toufique Ahmed, Bowen Xu, Premkumar Devanbu, Mohammad Amin Alipour

Abstract: Large language models (LLMs) have provided many exciting new capabilities in software development. However, the opaque nature of these models makes them difficult to reason about and inspect. Their opacity gives rise to potential security risks, as adversaries can train and deploy compromised models to disrupt the software development process in the victims' organization. This work presents an overview of current state-of-the-art trojan attacks on large language models of code, with a focus on triggers (the main design point of trojans), with the aid of a novel unifying trigger taxonomy framework. We also aim to provide a uniform definition of the fundamental concepts in the area of trojans in Code LLMs. Finally, we draw implications for trigger design from findings on how code models learn.

* arXiv admin note: substantial text overlap with arXiv:2305.03803

On Trojan Signatures in Large Language Models of Code

Mar 07, 2024

Aftab Hussain, Md Rafiqul Islam Rabin, Mohammad Amin Alipour

Abstract: Trojan signatures, as described by Fields et al. (2021), are noticeable differences between the distribution of the trojaned class parameters (weights) and that of the non-trojaned class parameters of a trojaned model, which can be used to detect the trojaned model. Fields et al. (2021) found trojan signatures in computer vision classification tasks with image models such as ResNet, WideResNet, DenseNet, and VGG. In this paper, we investigate such signatures in the classifier layer parameters of large language models of source code. Our results suggest that trojan signatures do not generalize to LLMs of code. We found that trojaned code models are stubborn, even when the models were poisoned under more explicit settings (fine-tuned with pre-trained weights frozen). We analyzed nine trojaned models for two binary classification tasks: clone and defect detection. To the best of our knowledge, this is the first work to examine weight-based trojan signature revelation techniques for large language models of code and, furthermore, to demonstrate that detecting trojans only from the weights in such models is a hard problem.
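
As an illustration of what a weight-based signature check looks like, here is a simplified sketch that summarizes each class's row of the final classifier weight matrix and reports the class whose mean weight is most shifted relative to the other classes. The choice of statistic (the mean), the function names, and the toy weights are our assumptions rather than Fields et al.'s exact procedure; the abstract reports that such weight-only signals did not generalize to code models.

```python
from statistics import mean, pstdev
from typing import List, Tuple

Matrix = List[List[float]]  # one row of classifier weights per output class


def class_weight_stats(weights: Matrix) -> List[Tuple[float, float]]:
    """(mean, population std dev) of each class's weight row."""
    return [(mean(row), pstdev(row)) for row in weights]


def most_shifted_class(weights: Matrix) -> Tuple[int, float]:
    """Return (class index, gap) for the class whose mean weight exceeds the
    average of the other classes' means by the largest margin.
    Requires at least two classes."""
    if len(weights) < 2:
        raise ValueError("need at least two classes")
    means = [m for m, _ in class_weight_stats(weights)]
    gaps = [m - mean(means[:i] + means[i + 1:]) for i, m in enumerate(means)]
    best = max(range(len(gaps)), key=gaps.__getitem__)
    return best, gaps[best]


if __name__ == "__main__":
    # Hypothetical 2-class classifier head (e.g., defect vs. non-defect).
    head = [[0.1, -0.1, 0.0], [0.5, 0.4, 0.6]]
    print(class_weight_stats(head))
    print(most_shifted_class(head))
```

A large gap would only be a hint, not a verdict: a clean model can also have asymmetric class rows, which is one reason weight-only detection is hard.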

* This work has been accepted at the International Conference on Learning Representations 2024 Workshop on Secure and Trustworthy Large Language Models, SeT LLM @ ICLR 2024 (Vienna, Austria)

Quality and Trust in LLM-generated Code

Feb 09, 2024

Claudio Spiess, David Gros, Kunal Suresh Pai, Michael Pradel, Md Rafiqul Islam Rabin, Amin Alipour, Susmit Jha, Prem Devanbu, Toufique Ahmed

Abstract: Machine learning models are widely used but can also often be wrong. Users would benefit from a reliable indication of whether a given output from a given model should be trusted, so that a rational decision can be made whether to use the output or not. For example, outputs can be associated with a confidence measure; if this confidence measure is strongly associated with likelihood of correctness, then the model is said to be well-calibrated. In this case, high-confidence outputs could be safely accepted, and low-confidence outputs rejected. Calibration has so far been studied in non-generative (e.g., classification) settings, especially in software engineering. However, generated code can quite often be wrong: developers need to know when they should, e.g., directly use, use after careful review, or discard model-generated code; thus, calibration is vital in generative settings. However, the notion of correctness of generated code is non-trivial, and thus so is calibration. In this paper, we make several contributions. We develop a framework for evaluating the calibration of code-generating models. We consider several tasks, correctness criteria, datasets, and approaches, and find that, by and large, generative code models are not well-calibrated out of the box. We then show how calibration can be improved using standard methods such as Platt scaling. Our contributions will lead to better-calibrated decision-making in the current use of code generated by language models, and offer a framework for future research to further improve calibration methods for generative models in software engineering.
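
To make the calibration vocabulary concrete, here is a minimal sketch of Platt scaling (fitting p = σ(a·s + b) so that a raw confidence score s maps to a probability of correctness) and of expected calibration error (ECE) with equal-width bins. It uses plain batch gradient descent and omits Platt's original target smoothing; the learning rate, epoch count, bin count, toy data, and function names are illustrative assumptions, not the paper's setup.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_platt(scores: Sequence[float], labels: Sequence[int],
              lr: float = 0.1, epochs: int = 2000) -> Tuple[float, float]:
    """Fit p = sigmoid(a * s + b) by batch gradient descent on mean log loss.
    labels[i] is 1 if output i was judged correct, else 0."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y   # derivative of log loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def expected_calibration_error(probs: Sequence[float], labels: Sequence[int],
                               n_bins: int = 10) -> float:
    """Weighted mean |confidence - accuracy| over equal-width probability bins."""
    total = len(probs)
    ece = 0.0
    for k in range(n_bins):
        lo, hi = k / n_bins, (k + 1) / n_bins
        # The last bin is closed on the right so that p == 1.0 is counted.
        idx = [i for i, p in enumerate(probs)
               if lo <= p < hi or (k == n_bins - 1 and p == 1.0)]
        if not idx:
            continue
        conf = sum(probs[i] for i in idx) / len(idx)
        acc = sum(labels[i] for i in idx) / len(idx)
        ece += len(idx) / total * abs(conf - acc)
    return ece


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]   # hypothetical raw confidences
    labels = [1, 1, 1, 0, 1, 0, 0, 0]                    # hypothetical correctness
    a, b = fit_platt(scores, labels)
    calibrated = [sigmoid(a * s + b) for s in scores]
    print(expected_calibration_error(scores, labels),
          expected_calibration_error(calibrated, labels))
```

In practice the scaling parameters should be fitted on a held-out split and ECE measured on a separate one; fitting and evaluating on the same eight points, as in the demo, only shows the mechanics.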

A Study of Variable-Role-based Feature Enrichment in Neural Models of Code

Mar 12, 2023

Aftab Hussain, Md Rafiqul Islam Rabin, Bowen Xu, David Lo, Mohammad Amin Alipour

Abstract: Although deep neural models substantially reduce the overhead of feature engineering, the features readily available in the inputs might significantly impact training cost and the performance of the models. In this paper, we explore the impact of an unsupervised feature enrichment approach based on variable roles on the performance of neural models of code. The notion of variable roles (as introduced in the works of Sajaniemi et al. [Refs. 1,2]) has been found to help students' programming abilities. In this paper, we investigate whether this notion also improves the performance of neural models of code. To the best of our knowledge, this is the first work to investigate how Sajaniemi et al.'s concept of variable roles can affect neural models of code. In particular, we enrich a source code dataset by adding the role of individual variables in the dataset programs, and use it to study the impact of variable role enrichment on training the Code2Seq model. In addition, we shed light on some challenges and opportunities in feature enrichment for neural code intelligence models.
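
The sketch below shows, under heavy simplification, what a variable-role annotation pass over a dataset might look like, using Python sources for illustration. The rules are our own crude approximations of a few of Sajaniemi et al.'s roles: a variable updated only by `x += constant` is a stepper, one accumulated with `x += expression` is a gatherer, one assigned exactly once is a fixed value, and anything else falls back to most-recent holder. The paper's actual role-detection procedure and the Code2Seq input format are not reproduced here.

```python
import ast
from collections import defaultdict
from typing import Dict


def guess_variable_roles(source: str) -> Dict[str, str]:
    """Heuristically assign a few Sajaniemi-style roles to simple variables.

    Only plain `name = ...` assignments and `name op= ...` augmented
    assignments are inspected; loop targets, attributes, and tuple
    unpacking are ignored in this illustrative sketch.
    """
    assign_counts: Dict[str, int] = defaultdict(int)
    aug_kinds: Dict[str, set] = defaultdict(set)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    assign_counts[target.id] += 1
        elif isinstance(node, ast.AugAssign) and isinstance(node.target, ast.Name):
            kind = "constant" if isinstance(node.value, ast.Constant) else "expr"
            aug_kinds[node.target.id].add(kind)

    roles: Dict[str, str] = {}
    for name in set(assign_counts) | set(aug_kinds):
        kinds = aug_kinds.get(name, set())
        if kinds == {"constant"}:
            roles[name] = "stepper"              # advanced by a fixed step
        elif "expr" in kinds:
            roles[name] = "gatherer"             # accumulates other values
        elif assign_counts[name] == 1:
            roles[name] = "fixed value"          # never changes after init
        else:
            roles[name] = "most-recent holder"   # crude fallback
    return roles


if __name__ == "__main__":
    program = "limit = 10\ntotal = 0\ni = 0\nwhile i < limit:\n    total += i\n    i += 1\n"
    print(guess_variable_roles(program))
```

The resulting name-to-role map could then be attached to each variable occurrence as an extra input token, which is one simple way to realize dataset enrichment.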

* Accepted in the 1st International Workshop on Interpretability and Robustness in Neural Software Engineering (InteNSE'23), Co-located with ICSE

Study of Distractors in Neural Models of Code

Mar 03, 2023

Md Rafiqul Islam Rabin, Aftab Hussain, Sahil Suneja, Mohammad Amin Alipour

Abstract: Finding important features that contribute to the prediction of neural models is an active area of research in explainable AI. Neural models are opaque, and finding such features sheds light on their predictions. In contrast, in this work, we present an inverse perspective: distractor features, i.e., features that cast doubt on the prediction by affecting the model's confidence in it. Understanding distractors provides a complementary view of the features' relevance in the predictions of neural models. In this paper, we apply a reduction-based technique to find distractors and provide our preliminary results on their impacts and types. Our experiments across various tasks, models, and datasets of code reveal that the removal of tokens can have a significant impact on the confidence of models in their predictions and that the categories of tokens can also play a vital role in the model's confidence. Our study aims to enhance the transparency of models by emphasizing those tokens that significantly influence the confidence of the models.
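
The core measurement behind distractors can be sketched as follows: remove a token, re-score the program, and check whether the model's confidence in its original prediction goes up. This is a simplified leave-one-token-out scan rather than the paper's reduction-based technique, and `toy_confidence` is a hypothetical stand-in for a real model's confidence score.

```python
from typing import Callable, List, Tuple

Confidence = Callable[[List[str]], float]  # model confidence in its original prediction


def token_removal_effects(tokens: List[str], confidence: Confidence) -> List[Tuple[str, float]]:
    """Change in confidence when each token (one occurrence at a time) is removed."""
    base = confidence(tokens)
    return [(tok, confidence(tokens[:i] + tokens[i + 1:]) - base)
            for i, tok in enumerate(tokens)]


def find_distractors(tokens: List[str], confidence: Confidence,
                     threshold: float = 0.0) -> List[str]:
    """Tokens whose removal *raises* confidence by more than `threshold`."""
    return [tok for tok, delta in token_removal_effects(tokens, confidence)
            if delta > threshold]


def toy_confidence(tokens: List[str]) -> float:
    """Hypothetical model: 'return' supports the prediction, 'tmp' undermines it."""
    score = 0.5 + 0.1 * tokens.count("return")
    if "tmp" in tokens:
        score -= 0.2
    return score


if __name__ == "__main__":
    program = ["int", "tmp", "return", "x"]
    print(find_distractors(program, toy_confidence))  # prints ['tmp']
```

Tokens with a negative effect (here, `return`) are the conventional "important features", so the same scan yields both views the abstract contrasts.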

* The 1st International Workshop on Interpretability and Robustness in Neural Software Engineering, Co-located with ICSE (InteNSE'23)

Syntax-Guided Program Reduction for Understanding Neural Code Intelligence Models

May 28, 2022

Md Rafiqul Islam Rabin, Aftab Hussain, Mohammad Amin Alipour

Abstract: Neural code intelligence (CI) models are opaque black boxes and offer little insight into the features they use in making predictions. This opacity may lead to distrust in their predictions and hamper their wider adoption in safety-critical applications. Recently, input program reduction techniques have been proposed to identify key features in the input programs to improve the transparency of CI models. However, this approach is syntax-unaware and does not consider the grammar of the programming language. In this paper, we apply a syntax-guided program reduction technique that considers the grammar of the input programs during reduction. Our experiments on multiple models across different types of input programs show that the syntax-guided program reduction technique is faster and provides smaller sets of key tokens in reduced programs. We also show that the key tokens could be used in generating adversarial examples for up to 65% of the input programs.
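
A minimal sketch of syntax-guided reduction for Python sources, assuming the model is exposed as a hypothetical predicate `keeps_prediction(source) -> bool`. Candidates are whole statements removed from AST statement lists, a block is never emptied, and each candidate is regenerated with `ast.unparse` (Python 3.9+), so every program the model sees stays syntactically valid. This greedy statement-level loop is a simplification for illustration; it is not the reduction tool used in the paper.

```python
import ast
from typing import Callable, List, Tuple


def _removal_candidates(tree: ast.AST) -> List[Tuple[list, int]]:
    """(statement list, index) pairs whose removal keeps the block non-empty."""
    candidates = []
    for node in ast.walk(tree):
        for field in ("body", "orelse", "finalbody"):
            stmts = getattr(node, field, None)
            # Expression-valued fields (e.g., Lambda.body) are not lists and are skipped.
            if isinstance(stmts, list) and len(stmts) > 1:
                candidates.extend((stmts, i) for i in range(len(stmts)))
    return candidates


def syntax_guided_reduce(source: str, keeps_prediction: Callable[[str], bool]) -> str:
    """Greedily delete statements while the model's prediction is preserved."""
    current = source
    changed = True
    while changed:
        changed = False
        count = len(_removal_candidates(ast.parse(current)))
        for k in range(count):
            tree = ast.parse(current)              # fresh tree for each attempt
            stmts, i = _removal_candidates(tree)[k]
            del stmts[i]
            candidate = ast.unparse(tree)          # always well-formed Python
            if keeps_prediction(candidate):
                current = candidate
                changed = True
                break
    return current


if __name__ == "__main__":
    program = "def f(x):\n    y = 1\n    z = 2\n    return x + y\n"
    # Hypothetical oracle: the "model" keeps its label while a return survives.
    print(syntax_guided_reduce(program, lambda src: "return" in src))
```

Because every candidate parses, no model queries are wasted on malformed inputs, which is the intuition behind the speedup reported above.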

* The 6th ACM SIGPLAN International Symposium on Machine Programming (MAPS'22). Extension of arXiv:2202.06474

Extracting Label-specific Key Input Features for Neural Code Intelligence Models

Feb 14, 2022

Md Rafiqul Islam Rabin

Abstract: Code intelligence (CI) models are often black boxes and offer no insight into the input features they learn for making correct predictions. This opacity may lead to distrust in their predictions and hamper their wider adoption in safety-critical applications. Recently, program reduction has been widely used to identify key input features in order to explain the predictions of CI models. The approach removes irrelevant parts from an input program and keeps the minimal snippets that a CI model needs to maintain its prediction. However, state-of-the-art approaches mainly use a syntax-unaware program reduction technique that does not follow the syntax of programs, which adds significant overhead to the reduction of input programs and the explainability of models. In this paper, we apply a syntax-guided program reduction technique that follows the syntax of input programs during reduction. Our experiments on multiple models across different types of input programs show that the syntax-guided program reduction technique significantly outperforms the syntax-unaware one in reducing the size of input programs. Extracting key input features from reduced programs reveals that the syntax-guided reduced programs contain more label-specific key input features and are more vulnerable to adversarial transformation when the key tokens in programs are renamed. These label-specific key input features may help explain the reasoning behind models' predictions from different perspectives and increase trust in the correct classifications made by CI models.
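
For contrast, a syntax-unaware reducer can be pictured as Zeller and Hildebrandt's delta debugging (ddmin) applied to a flat token list. The sketch below assumes a hypothetical predicate `keeps_prediction(tokens) -> bool` that re-runs the model on a candidate; nothing in it respects program syntax, so most intermediate candidates are unparsable token sequences. Whether the baseline in the paper is exactly this variant is an assumption.

```python
import math
from typing import Callable, List, TypeVar

T = TypeVar("T")


def ddmin(items: List[T], keeps_prediction: Callable[[List[T]], bool]) -> List[T]:
    """Delta debugging: shrink `items` to a 1-minimal list that still satisfies
    `keeps_prediction`. Assumes keeps_prediction(items) is True initially."""
    n = 2
    while len(items) >= 2:
        size = math.ceil(len(items) / n)
        chunks = [items[i:i + size] for i in range(0, len(items), size)]
        reduced = False
        # 1) Try keeping a single chunk.
        for chunk in chunks:
            if keeps_prediction(chunk):
                items, n, reduced = chunk, 2, True
                break
        # 2) Otherwise try removing a single chunk (keep its complement).
        if not reduced:
            for i in range(len(chunks)):
                complement = [x for j, c in enumerate(chunks) if j != i for x in c]
                if keeps_prediction(complement):
                    items, n, reduced = complement, max(n - 1, 2), True
                    break
        # 3) Otherwise refine the granularity, or stop once chunks are single tokens.
        if not reduced:
            if n >= len(items):
                break
            n = min(2 * n, len(items))
    return items


if __name__ == "__main__":
    tokens = ["int", "x", "=", "0", ";"]
    # Hypothetical oracle: the prediction survives as long as the token "x" does.
    print(ddmin(tokens, lambda t: "x" in t))
```

The result is 1-minimal: removing any single remaining token would flip the (hypothetical) prediction, but the kept tokens need not form a valid program.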

* Research Quest 2021, Research Methods in Computer Science, University of Houston (RQ'21)

Encoding Program as Image: Evaluating Visual Representation of Source Code

Nov 01, 2021

Md Rafiqul Islam Rabin, Mohammad Amin Alipour

Abstract: There are several approaches to encoding source code in the input vectors of neural models. These approaches attempt to include various syntactic and semantic features of input programs in their encoding. In this paper, we investigate Code2Snapshot, a novel representation of source code that is based on snapshots of input programs. We evaluate several variations of this representation and compare its performance with state-of-the-art representations that utilize the rich syntactic and semantic features of input programs. Our preliminary study on the utility of Code2Snapshot in the code summarization task suggests that simple snapshots of input programs have comparable performance to the state-of-the-art representations. Interestingly, obscuring the input programs has an insignificant impact on Code2Snapshot's performance, suggesting that, for some tasks, neural models may provide high performance by relying merely on the structure of input programs.
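
Code2Snapshot renders programs as images. As a dependency-free stand-in, the sketch below turns source text into a binary occupancy grid (1 for a visible character, 0 for whitespace). Such a grid keeps only the program's visual layout, roughly the information that survives when a snapshot is obscured. The grid encoding, the `code_to_grid` name, and the tab handling are our assumptions; the paper's rendering pipeline and image sizes are not reproduced.

```python
from typing import List, Optional


def code_to_grid(source: str, width: Optional[int] = None, tab_size: int = 4) -> List[List[int]]:
    """Binary 'snapshot' of source layout: 1 = non-whitespace character, 0 = blank.

    Lines are tab-expanded, then truncated or right-padded to `width`
    (default: the longest line) so the result is a rectangular matrix
    that could be fed to an image model as a single-channel input.
    """
    lines = [line.expandtabs(tab_size) for line in source.splitlines()]
    if width is None:
        width = max((len(line) for line in lines), default=0)
    return [[0 if ch.isspace() else 1 for ch in line[:width].ljust(width)]
            for line in lines]


if __name__ == "__main__":
    snippet = "def f(x):\n    return x\n"
    for row in code_to_grid(snippet):
        print("".join("#" if cell else "." for cell in row))
```

Two programs with identical indentation and line lengths but different identifiers map to the same grid, which mirrors the observation that structure alone can carry much of the signal for some tasks.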

* 8 pages, 2 figures, 1 table
