Self-supervised learning has emerged as a powerful tool for pretraining deep networks on unlabeled data, prior to transfer learning of target tasks with limited annotation. The relevance between the pretraining pretext and target tasks is crucial to the success of transfer learning. Various pretext tasks have been proposed to utilize properties of medical image data (e.g., three dimensionality), which are more relevant to medical image analysis than generic ones for natural images. However, previous work rarely paid attention to data with anatomy-oriented imaging planes, e.g., standard cardiac magnetic resonance imaging views. As these imaging planes are defined according to the anatomy of the imaged organ, pretext tasks effectively exploiting this information can pretrain the networks to gain knowledge on the organ of interest. In this work, we propose two complementary pretext tasks for this group of medical image data based on the spatial relationship of the imaging planes. The first is to learn the relative orientation between the imaging planes and implemented as regressing their intersecting lines. The second exploits parallel imaging planes to regress their relative slice locations within a stack. Both pretext tasks are conceptually straightforward and easy to implement, and can be combined in multitask learning for better representation learning. Thorough experiments on two anatomical structures (heart and knee) and representative target tasks (semantic segmentation and classification) demonstrate that the proposed pretext tasks are effective in pretraining deep networks for remarkably boosted performance on the target tasks, and superior to other recent approaches.
Mainstream backdoor attack methods typically demand substantial tuning data for poisoning, limiting their practicality and potentially degrading the overall performance when applied to Large Language Models (LLMs). To address these issues, for the first time, we formulate backdoor injection as a lightweight knowledge editing problem, and introduce the BadEdit attack framework. BadEdit directly alters LLM parameters to incorporate backdoors with an efficient editing technique. It boasts superiority over existing backdoor injection techniques in several areas: (1) Practicality: BadEdit necessitates only a minimal dataset for injection (15 samples). (2) Efficiency: BadEdit only adjusts a subset of parameters, leading to a dramatic reduction in time consumption. (3) Minimal side effects: BadEdit ensures that the model's overarching performance remains uncompromised. (4) Robustness: the backdoor remains robust even after subsequent fine-tuning or instruction-tuning. Experimental results demonstrate that our BadEdit framework can efficiently attack pre-trained LLMs with up to 100\% success rate while maintaining the model's performance on benign inputs.
Large Language Models (LLMs) have presented impressive performance across several transformative tasks. However, it is non-trivial to efficiently utilize large-scale cluster resources to develop LLMs, often riddled with numerous challenges such as frequent hardware failures, intricate parallelization strategies, and imbalanced resource utilization. In this paper, we present an in-depth characterization study of a six-month LLM development workload trace collected from our GPU datacenter Acme. Specifically, we investigate discrepancies between LLMs and prior task-specific Deep Learning (DL) workloads, explore resource utilization patterns, and identify the impact of various job failures. Our analysis summarizes hurdles we encountered and uncovers potential opportunities to optimize systems tailored for LLMs. Furthermore, we introduce our system efforts: (1) fault-tolerant pretraining, which enhances fault tolerance through LLM-involved failure diagnosis and automatic recovery. (2) decoupled scheduling for evaluation, which achieves timely performance feedback via trial decomposition and scheduling optimization.
Federated learning (FL) facilitates collaborative training of machine learning models among a large number of clients while safeguarding the privacy of their local datasets. However, FL remains susceptible to vulnerabilities such as privacy inference and inversion attacks. Single-server secure aggregation schemes were proposed to address these threats. Nonetheless, they encounter practical constraints due to their round and communication complexities. This work introduces Fluent, a round and communication-efficient secure aggregation scheme for private FL. Fluent has several improvements compared to state-of-the-art solutions like Bell et al. (CCS 2020) and Ma et al. (SP 2023): (1) it eliminates frequent handshakes and secret sharing operations by efficiently reusing the shares across multiple training iterations without leaking any private information; (2) it accomplishes both the consistency check and gradient unmasking in one logical step, thereby reducing another round of communication. With these innovations, Fluent achieves the fewest communication rounds (i.e., two in the collection phase) in the malicious server setting, in contrast to at least three rounds in existing schemes. This significantly minimizes the latency for geographically distributed clients; (3) Fluent also introduces Fluent-Dynamic with a participant selection algorithm and an alternative secret sharing scheme. This can facilitate dynamic client joining and enhance the system flexibility and scalability. We implemented Fluent and compared it with existing solutions. Experimental results show that Fluent improves the computational cost by at least 75% and communication overhead by at least 25% for normal clients. Fluent also reduces the communication overhead for the server at the expense of a marginal increase in computational cost.
Deep neural networks (DNNs) have revolutionized various industries, leading to the rise of Machine Learning as a Service (MLaaS). In this paradigm, well-trained models are typically deployed through APIs. However, DNNs are susceptible to backdoor attacks, which pose significant risks to their applications. This vulnerability necessitates a method for users to ascertain whether an API is compromised before usage. Although many backdoor detection methods have been developed, they often operate under the assumption that the defender has access to specific information such as details of the attack, soft predictions from the model API, and even the knowledge of the model parameters, limiting their practicality in MLaaS scenarios. To address it, in this paper, we begin by presenting an intriguing observation: the decision boundary of the backdoored model exhibits a greater degree of closeness than that of the clean model. Simultaneously, if only one single label is infected, a larger portion of the regions will be dominated by the attacked label. Building upon this observation, we propose Model X-ray, a novel backdoor detection approach for MLaaS through the analysis of decision boundaries. Model X-ray can not only identify whether the target API is infected by backdoor attacks but also determine the target attacked label under the all-to-one attack strategy. Importantly, it accomplishes this solely by the hard prediction of clean inputs, regardless of any assumptions about attacks and prior knowledge of the training details of the model. Extensive experiments demonstrated that Model X-ray can be effective for MLaaS across diverse backdoor attacks, datasets, and architectures.
With the prevalence of text-to-image generative models, their safety becomes a critical concern. adversarial testing techniques have been developed to probe whether such models can be prompted to produce Not-Safe-For-Work (NSFW) content. However, existing solutions face several challenges, including low success rate and inefficiency. We introduce Groot, the first automated framework leveraging tree-based semantic transformation for adversarial testing of text-to-image models. Groot employs semantic decomposition and sensitive element drowning strategies in conjunction with LLMs to systematically refine adversarial prompts. Our comprehensive evaluation confirms the efficacy of Groot, which not only exceeds the performance of current state-of-the-art approaches but also achieves a remarkable success rate (93.66%) on leading text-to-image models such as DALL-E 3 and Midjourney.
With the development of generative models, the quality of generated content keeps increasing. Recently, open-source models have made it surprisingly easy to manipulate and edit photos and videos, with just a few simple prompts. While these cutting-edge technologies have gained popularity, they have also given rise to concerns regarding the privacy and portrait rights of individuals. Malicious users can exploit these tools for deceptive or illegal purposes. Although some previous works focus on protecting photos against generative models, we find there are still gaps between protecting videos and images in the aspects of efficiency and effectiveness. Therefore, we introduce our protection method, PRIME, to significantly reduce the time cost and improve the protection performance. Moreover, to evaluate our proposed protection method, we consider both objective metrics and human subjective metrics. Our evaluation results indicate that PRIME only costs 8.3% GPU hours of the cost of the previous state-of-the-art method and achieves better protection results on both human evaluation and objective metrics. Code can be found in https://github.com/GuanlinLee/prime.
Pre-trained models (PTMs) are extensively utilized in various downstream tasks. Adopting untrusted PTMs may suffer from backdoor attacks, where the adversary can compromise the downstream models by injecting backdoors into the PTM. However, existing backdoor attacks to PTMs can only achieve partially task-agnostic and the embedded backdoors are easily erased during the fine-tuning process. In this paper, we propose a novel transferable backdoor attack, TransTroj, to simultaneously meet functionality-preserving, durable, and task-agnostic. In particular, we first formalize transferable backdoor attacks as the indistinguishability problem between poisoned and clean samples in the embedding space. We decompose the embedding indistinguishability into pre- and post-indistinguishability, representing the similarity of the poisoned and reference embeddings before and after the attack. Then, we propose a two-stage optimization that separately optimizes triggers and victim PTMs to achieve embedding indistinguishability. We evaluate TransTroj on four PTMs and six downstream tasks. Experimental results show that TransTroj significantly outperforms SOTA task-agnostic backdoor attacks (18%$\sim$99%, 68% on average) and exhibits superior performance under various system settings. The code is available at https://github.com/haowang-cqu/TransTroj .
Pre-training, which utilizes extensive and varied datasets, is a critical factor in the success of Large Language Models (LLMs) across numerous applications. However, the detailed makeup of these datasets is often not disclosed, leading to concerns about data security and potential misuse. This is particularly relevant when copyrighted material, still under legal protection, is used inappropriately, either intentionally or unintentionally, infringing on the rights of the authors. In this paper, we introduce a detailed framework designed to detect and assess the presence of content from potentially copyrighted books within the training datasets of LLMs. This framework also provides a confidence estimation for the likelihood of each content sample's inclusion. To validate our approach, we conduct a series of simulated experiments, the results of which affirm the framework's effectiveness in identifying and addressing instances of content misuse in LLM training processes. Furthermore, we investigate the presence of recognizable quotes from famous literary works within these datasets. The outcomes of our study have significant implications for ensuring the ethical use of copyrighted materials in the development of LLMs, highlighting the need for more transparent and responsible data management practices in this field.