Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xiaofei Xie

Nanyang Technological University, Singapore

Large Language Model Supply Chain: Open Problems From the Security Perspective

Nov 03, 2024

Qiang Hu, Xiaofei Xie, Sen Chen, Lei Ma

Abstract:Large Language Model (LLM) is changing the software development paradigm and has gained huge attention from both academia and industry. Researchers and developers collaboratively explore how to leverage the powerful problem-solving ability of LLMs for specific domain tasks. Due to the wide usage of LLM-based applications, e.g., ChatGPT, multiple works have been proposed to ensure the security of LLM systems. However, a comprehensive understanding of the entire processes of LLM system construction (the LLM supply chain) is crucial but relevant works are limited. More importantly, the security issues hidden in the LLM SC which could highly impact the reliable usage of LLMs are lack of exploration. Existing works mainly focus on assuring the quality of LLM from the model level, security assurance for the entire LLM SC is ignored. In this work, we take the first step to discuss the potential security risks in each component as well as the integration between components of LLM SC. We summarize 12 security-related risks and provide promising guidance to help build safer LLM systems. We hope our work can facilitate the evolution of artificial general intelligence with secure LLM ecosystems.

Via

Access Paper or Ask Questions

LLM-based Abstraction and Concretization for GUI Test Migration

Sep 08, 2024

Yakun Zhang, Chen Liu, Xiaofei Xie, Yun Lin, Jin Song Dong, Dan Hao, Lu Zhang

Figure 1 for LLM-based Abstraction and Concretization for GUI Test Migration

Figure 2 for LLM-based Abstraction and Concretization for GUI Test Migration

Figure 3 for LLM-based Abstraction and Concretization for GUI Test Migration

Figure 4 for LLM-based Abstraction and Concretization for GUI Test Migration

Abstract:GUI test migration aims to produce test cases with events and assertions to test specific functionalities of a target app. Existing migration approaches typically focus on the widget-mapping paradigm that maps widgets from source apps to target apps. However, since different apps may implement the same functionality in different ways, direct mapping may result in incomplete or buggy test cases, thus significantly impacting the effectiveness of testing target functionality and the practical applicability. In this paper, we propose a new migration paradigm (i.e., abstraction-concretization paradigm) that first abstracts the test logic for the target functionality and then utilizes this logic to generate the concrete GUI test case. Furthermore, we introduce MACdroid, the first approach that migrates GUI test cases based on this paradigm. Specifically, we propose an abstraction technique that utilizes source test cases from source apps targeting the same functionality to extract a general test logic for that functionality. Then, we propose a concretization technique that utilizes the general test logic to guide an LLM in generating the corresponding GUI test case (including events and assertions) for the target app. We evaluate MACdroid on two widely-used datasets (including 31 apps, 34 functionalities, and 123 test cases). On the FrUITeR dataset, the test cases generated by MACdroid successfully test 64% of the target functionalities, improving the baselines by 191%. On the Lin dataset, MACdroid successfully tests 75% of the target functionalities, outperforming the baselines by 42%. These results underscore the effectiveness of MACdroid in GUI test migration.

Via

Access Paper or Ask Questions

MORTAR: A Model-based Runtime Action Repair Framework for AI-enabled Cyber-Physical Systems

Aug 07, 2024

Renzhi Wang, Zhehua Zhou, Jiayang Song, Xuan Xie, Xiaofei Xie, Lei Ma

Figure 1 for MORTAR: A Model-based Runtime Action Repair Framework for AI-enabled Cyber-Physical Systems

Figure 2 for MORTAR: A Model-based Runtime Action Repair Framework for AI-enabled Cyber-Physical Systems

Figure 3 for MORTAR: A Model-based Runtime Action Repair Framework for AI-enabled Cyber-Physical Systems

Figure 4 for MORTAR: A Model-based Runtime Action Repair Framework for AI-enabled Cyber-Physical Systems

Abstract:Cyber-Physical Systems (CPSs) are increasingly prevalent across various industrial and daily-life domains, with applications ranging from robotic operations to autonomous driving. With recent advancements in artificial intelligence (AI), learning-based components, especially AI controllers, have become essential in enhancing the functionality and efficiency of CPSs. However, the lack of interpretability in these AI controllers presents challenges to the safety and quality assurance of AI-enabled CPSs (AI-CPSs). Existing methods for improving the safety of AI controllers often involve neural network repair, which requires retraining with additional adversarial examples or access to detailed internal information of the neural network. Hence, these approaches have limited applicability for black-box policies, where only the inputs and outputs are accessible during operation. To overcome this, we propose MORTAR, a runtime action repair framework designed for AI-CPSs in this work. MORTAR begins by constructing a prediction model that forecasts the quality of actions proposed by the AI controller. If an unsafe action is detected, MORTAR then initiates a repair process to correct it. The generation of repaired actions is achieved through an optimization process guided by the safety estimates from the prediction model. We evaluate the effectiveness of MORTAR across various CPS tasks and AI controllers. The results demonstrate that MORTAR can efficiently improve task completion rates of AI controllers under specified safety specifications. Meanwhile, it also maintains minimal computational overhead, ensuring real-time operation of the AI-CPSs.

Via

Access Paper or Ask Questions

MAVIN: Multi-Action Video Generation with Diffusion Models via Transition Video Infilling

May 28, 2024

Bowen Zhang, Xiaofei Xie, Haotian Lu, Na Ma, Tianlin Li, Qing Guo

Figure 1 for MAVIN: Multi-Action Video Generation with Diffusion Models via Transition Video Infilling

Figure 2 for MAVIN: Multi-Action Video Generation with Diffusion Models via Transition Video Infilling

Figure 3 for MAVIN: Multi-Action Video Generation with Diffusion Models via Transition Video Infilling

Figure 4 for MAVIN: Multi-Action Video Generation with Diffusion Models via Transition Video Infilling

Abstract:Diffusion-based video generation has achieved significant progress, yet generating multiple actions that occur sequentially remains a formidable task. Directly generating a video with sequential actions can be extremely challenging due to the scarcity of fine-grained action annotations and the difficulty in establishing temporal semantic correspondences and maintaining long-term consistency. To tackle this, we propose an intuitive and straightforward solution: splicing multiple single-action video segments sequentially. The core challenge lies in generating smooth and natural transitions between these segments given the inherent complexity and variability of action transitions. We introduce MAVIN (Multi-Action Video INfilling model), designed to generate transition videos that seamlessly connect two given videos, forming a cohesive integrated sequence. MAVIN incorporates several innovative techniques to address challenges in the transition video infilling task. Firstly, a consecutive noising strategy coupled with variable-length sampling is employed to handle large infilling gaps and varied generation lengths. Secondly, boundary frame guidance (BFG) is proposed to address the lack of semantic guidance during transition generation. Lastly, a Gaussian filter mixer (GFM) dynamically manages noise initialization during inference, mitigating train-test discrepancy while preserving generation flexibility. Additionally, we introduce a new metric, CLIP-RS (CLIP Relative Smoothness), to evaluate temporal coherence and smoothness, complementing traditional quality-based metrics. Experimental results on horse and tiger scenarios demonstrate MAVIN's superior performance in generating smooth and coherent video transitions compared to existing methods.

Via

Access Paper or Ask Questions

KoReA-SFL: Knowledge Replay-based Split Federated Learning Against Catastrophic Forgetting

Apr 19, 2024

Zeke Xia, Ming Hu, Dengke Yan, Ruixuan Liu, Anran Li, Xiaofei Xie, Mingsong Chen

Figure 1 for KoReA-SFL: Knowledge Replay-based Split Federated Learning Against Catastrophic Forgetting

Figure 2 for KoReA-SFL: Knowledge Replay-based Split Federated Learning Against Catastrophic Forgetting

Figure 3 for KoReA-SFL: Knowledge Replay-based Split Federated Learning Against Catastrophic Forgetting

Figure 4 for KoReA-SFL: Knowledge Replay-based Split Federated Learning Against Catastrophic Forgetting

Abstract:Although Split Federated Learning (SFL) is good at enabling knowledge sharing among resource-constrained clients, it suffers from the problem of low training accuracy due to the neglect of data heterogeneity and catastrophic forgetting. To address this issue, we propose a novel SFL approach named KoReA-SFL, which adopts a multi-model aggregation mechanism to alleviate gradient divergence caused by heterogeneous data and a knowledge replay strategy to deal with catastrophic forgetting. Specifically, in KoReA-SFL cloud servers (i.e., fed server and main server) maintain multiple branch model portions rather than a global portion for local training and an aggregated master-model portion for knowledge sharing among branch portions. To avoid catastrophic forgetting, the main server of KoReA-SFL selects multiple assistant devices for knowledge replay according to the training data distribution of each server-side branch-model portion. Experimental results obtained from non-IID and IID scenarios demonstrate that KoReA-SFL significantly outperforms conventional SFL methods (by up to 23.25\% test accuracy improvement).

Via

Access Paper or Ask Questions

CaBaFL: Asynchronous Federated Learning via Hierarchical Cache and Feature Balance

Apr 19, 2024

Zeke Xia, Ming Hu, Dengke Yan, Xiaofei Xie, Tianlin Li, Anran Li, Junlong Zhou, Mingsong Chen

Figure 1 for CaBaFL: Asynchronous Federated Learning via Hierarchical Cache and Feature Balance

Figure 2 for CaBaFL: Asynchronous Federated Learning via Hierarchical Cache and Feature Balance

Figure 3 for CaBaFL: Asynchronous Federated Learning via Hierarchical Cache and Feature Balance

Figure 4 for CaBaFL: Asynchronous Federated Learning via Hierarchical Cache and Feature Balance

Abstract:Federated Learning (FL) as a promising distributed machine learning paradigm has been widely adopted in Artificial Intelligence of Things (AIoT) applications. However, the efficiency and inference capability of FL is seriously limited due to the presence of stragglers and data imbalance across massive AIoT devices, respectively. To address the above challenges, we present a novel asynchronous FL approach named CaBaFL, which includes a hierarchical Cache-based aggregation mechanism and a feature Balance-guided device selection strategy. CaBaFL maintains multiple intermediate models simultaneously for local training. The hierarchical cache-based aggregation mechanism enables each intermediate model to be trained on multiple devices to align the training time and mitigate the straggler issue. In specific, each intermediate model is stored in a low-level cache for local training and when it is trained by sufficient local devices, it will be stored in a high-level cache for aggregation. To address the problem of imbalanced data, the feature balance-guided device selection strategy in CaBaFL adopts the activation distribution as a metric, which enables each intermediate model to be trained across devices with totally balanced data distributions before aggregation. Experimental results show that compared with the state-of-the-art FL methods, CaBaFL achieves up to 9.26X training acceleration and 19.71\% accuracy improvements.

Via

Access Paper or Ask Questions

Enhancing Fault Detection for Large Language Models via Mutation-Based Confidence Smoothing

Apr 14, 2024

Qiang Hu, Jin Wen, Maxime Cordy, Yuheng Huang, Xiaofei Xie, Lei Ma

Figure 1 for Enhancing Fault Detection for Large Language Models via Mutation-Based Confidence Smoothing

Figure 2 for Enhancing Fault Detection for Large Language Models via Mutation-Based Confidence Smoothing

Figure 3 for Enhancing Fault Detection for Large Language Models via Mutation-Based Confidence Smoothing

Figure 4 for Enhancing Fault Detection for Large Language Models via Mutation-Based Confidence Smoothing

Abstract:Large language models (LLMs) achieved great success in multiple application domains and attracted huge attention from different research communities recently. Unfortunately, even for the best LLM, there still exist many faults that LLM cannot correctly predict. Such faults will harm the usability of LLMs. How to quickly reveal them in LLMs is important, but challenging. The reasons are twofold, 1) the heavy labeling effort for preparing the test data, and 2) accessing closed-source LLMs such as GPT4 is money-required. To handle this problem, in the traditional deep learning testing field, test selection methods have been proposed for efficiently testing deep learning models by prioritizing faults. However, the usefulness of these methods on LLMs is unclear and under exploration. In this paper, we first study the effectiveness of existing fault detection methods for LLMs. Experimental results on four different tasks~(including both code tasks and natural language processing tasks) and four LLMs (e.g., LLaMA and GPT4) demonstrated that existing fault detection methods cannot perform well on LLMs (e.g., seven out of eight methods perform worse than random selection on LLaMA). To enhance existing fault detection methods, we propose MuCS, a prompt Mutation-based prediction Confidence Smoothing method for LLMs. Concretely, we mutate the prompts and compute the average prediction confidence of all mutants as the input of fault detection methods. The results show that our proposed solution significantly enhances existing methods with the improvement of test relative coverage by up to 97.64%.

Via

Access Paper or Ask Questions

Evaluating Decision Optimality of Autonomous Driving via Metamorphic Testing

Feb 28, 2024

Mingfei Cheng, Yuan Zhou, Xiaofei Xie, Junjie Wang, Guozhu Meng, Kairui Yang

Figure 1 for Evaluating Decision Optimality of Autonomous Driving via Metamorphic Testing

Figure 2 for Evaluating Decision Optimality of Autonomous Driving via Metamorphic Testing

Figure 3 for Evaluating Decision Optimality of Autonomous Driving via Metamorphic Testing

Figure 4 for Evaluating Decision Optimality of Autonomous Driving via Metamorphic Testing

Abstract:Autonomous Driving System (ADS) testing is crucial in ADS development, with the current primary focus being on safety. However, the evaluation of non-safety-critical performance, particularly the ADS's ability to make optimal decisions and produce optimal paths for autonomous vehicles (AVs), is equally vital to ensure the intelligence and reduce risks of AVs. Currently, there is little work dedicated to assessing ADSs' optimal decision-making performance due to the lack of corresponding oracles and the difficulty in generating scenarios with non-optimal decisions. In this paper, we focus on evaluating the decision-making quality of an ADS and propose the first method for detecting non-optimal decision scenarios (NoDSs), where the ADS does not compute optimal paths for AVs. Firstly, to deal with the oracle problem, we propose a novel metamorphic relation (MR) aimed at exposing violations of optimal decisions. The MR identifies the property that the ADS should retain optimal decisions when the optimal path remains unaffected by non-invasive changes. Subsequently, we develop a new framework, Decictor, designed to generate NoDSs efficiently. Decictor comprises three main components: Non-invasive Mutation, MR Check, and Feedback. The Non-invasive Mutation ensures that the original optimal path in the mutated scenarios is not affected, while the MR Check is responsible for determining whether non-optimal decisions are made. To enhance the effectiveness of identifying NoDSs, we design a feedback metric that combines both spatial and temporal aspects of the AV's movement. We evaluate Decictor on Baidu Apollo, an open-source and production-grade ADS. The experimental results validate the effectiveness of Decictor in detecting non-optimal decisions of ADSs. Our work provides valuable and original insights into evaluating the non-safety-critical performance of ADSs.

Via

Access Paper or Ask Questions

Importance Guided Data Augmentation for Neural-Based Code Understanding

Feb 24, 2024

Zeming Dong, Qiang Hu, Xiaofei Xie, Maxime Cordy, Mike Papadakis, Jianjun Zhao

Abstract:Pre-trained code models lead the era of code intelligence. Many models have been designed with impressive performance recently. However, one important problem, data augmentation for code data that automatically helps developers prepare training data lacks study in the field of code learning. In this paper, we introduce a general data augmentation framework, GenCode, to enhance the training of code understanding models. GenCode follows a generation-and-selection paradigm to prepare useful training codes. Specifically, it uses code transformation techniques to generate new code candidates first and then selects important ones as the training data by importance metrics. To evaluate the effectiveness of GenCode with a general importance metric -- loss value, we conduct experiments on four code understanding tasks (e.g., code clone detection) and three pre-trained code models (e.g., CodeT5). Compared to the state-of-the-art (SOTA) code augmentation method, MixCode, GenCode produces code models with 2.92% higher accuracy and 4.90% robustness on average.

Via

Access Paper or Ask Questions

Have Your Cake and Eat It Too: Toward Efficient and Accurate Split Federated Learning

Nov 22, 2023

Dengke Yan, Ming Hu, Zeke Xia, Yanxin Yang, Jun Xia, Xiaofei Xie, Mingsong Chen

Figure 1 for Have Your Cake and Eat It Too: Toward Efficient and Accurate Split Federated Learning

Figure 2 for Have Your Cake and Eat It Too: Toward Efficient and Accurate Split Federated Learning

Figure 3 for Have Your Cake and Eat It Too: Toward Efficient and Accurate Split Federated Learning

Figure 4 for Have Your Cake and Eat It Too: Toward Efficient and Accurate Split Federated Learning

Abstract:Due to its advantages in resource constraint scenarios, Split Federated Learning (SFL) is promising in AIoT systems. However, due to data heterogeneity and stragglers, SFL suffers from the challenges of low inference accuracy and low efficiency. To address these issues, this paper presents a novel SFL approach, named Sliding Split Federated Learning (S$^2$FL), which adopts an adaptive sliding model split strategy and a data balance-based training mechanism. By dynamically dispatching different model portions to AIoT devices according to their computing capability, S$^2$FL can alleviate the low training efficiency caused by stragglers. By combining features uploaded by devices with different data distributions to generate multiple larger batches with a uniform distribution for back-propagation, S$^2$FL can alleviate the performance degradation caused by data heterogeneity. Experimental results demonstrate that, compared to conventional SFL, S$^2$FL can achieve up to 16.5\% inference accuracy improvement and 3.54X training acceleration.

Via

Access Paper or Ask Questions