Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Nicholas Donald Lane

Worldwide Federated Training of Language Models

May 23, 2024

Alex Iacob, Lorenzo Sani, Bill Marino, Preslav Aleksandrov, Nicholas Donald Lane

Figure 1 for Worldwide Federated Training of Language Models

Figure 2 for Worldwide Federated Training of Language Models

Figure 3 for Worldwide Federated Training of Language Models

Figure 4 for Worldwide Federated Training of Language Models

Abstract:The reliance of language model training on massive amounts of computation and vast datasets scraped from potentially low-quality, copyrighted, or sensitive data has come into question practically, legally, and ethically. Federated learning provides a plausible alternative by enabling previously untapped data to be voluntarily gathered from collaborating organizations. However, when scaled globally, federated learning requires collaboration across heterogeneous legal, security, and privacy regimes while accounting for the inherent locality of language data; this further exacerbates the established challenge of federated statistical heterogeneity. We propose a Worldwide Federated Language Model Training~(WorldLM) system based on federations of federations, where each federation has the autonomy to account for factors such as its industry, operating jurisdiction, or competitive environment. WorldLM enables such autonomy in the presence of statistical heterogeneity via partial model localization by allowing sub-federations to attentively aggregate key layers from their constituents. Furthermore, it can adaptively share information across federations via residual layer embeddings. Evaluations of language modeling on naturally heterogeneous datasets show that WorldLM outperforms standard federations by up to $1.91\times$, approaches the personalized performance of fully local models, and maintains these advantages under privacy-enhancing techniques.

* 19 pages, 8 figures, Under Review

Via

Access Paper or Ask Questions

Attacks on Third-Party APIs of Large Language Models

Apr 24, 2024

Wanru Zhao, Vidit Khazanchi, Haodi Xing, Xuanli He, Qiongkai Xu, Nicholas Donald Lane

Abstract:Large language model (LLM) services have recently begun offering a plugin ecosystem to interact with third-party API services. This innovation enhances the capabilities of LLMs, but it also introduces risks, as these plugins developed by various third parties cannot be easily trusted. This paper proposes a new attacking framework to examine security and safety vulnerabilities within LLM platforms that incorporate third-party services. Applying our framework specifically to widely used LLMs, we identify real-world malicious attacks across various domains on third-party APIs that can imperceptibly modify LLM outputs. The paper discusses the unique challenges posed by third-party API integration and offers strategic possibilities to improve the security and safety of LLM ecosystems moving forward. Our code is released at https://github.com/vk0812/Third-Party-Attacks-on-LLMs.

* ICLR 2024 Workshop on Secure and Trustworthy Large Language Models

Via

Access Paper or Ask Questions

Enhancing Data Quality in Federated Fine-Tuning of Foundation Models

Mar 07, 2024

Wanru Zhao, Yaxin Du, Nicholas Donald Lane, Siheng Chen, Yanfeng Wang

Abstract:In the current landscape of foundation model training, there is a significant reliance on public domain data, which is nearing exhaustion according to recent research. To further scale up, it is crucial to incorporate collaboration among multiple specialized and high-quality private domain data sources. However, the challenge of training models locally without sharing private data presents numerous obstacles in data quality control. To tackle this issue, we propose a data quality control pipeline for federated fine-tuning of foundation models. This pipeline computes scores reflecting the quality of training data and determines a global threshold for a unified standard, aiming for improved global performance. Our experiments show that the proposed quality control pipeline facilitates the effectiveness and reliability of the model training, leading to better performance.

* Accepted at ICLR 2024 Workshop on Navigating and Addressing Data Problems for Foundation Models (DPFM)

Via

Access Paper or Ask Questions

BLOX: Macro Neural Architecture Search Benchmark and Algorithms

Oct 13, 2022

Thomas Chun Pong Chau, Łukasz Dudziak, Hongkai Wen, Nicholas Donald Lane, Mohamed S Abdelfattah

Figure 1 for BLOX: Macro Neural Architecture Search Benchmark and Algorithms

Figure 2 for BLOX: Macro Neural Architecture Search Benchmark and Algorithms

Figure 3 for BLOX: Macro Neural Architecture Search Benchmark and Algorithms

Figure 4 for BLOX: Macro Neural Architecture Search Benchmark and Algorithms

Abstract:Neural architecture search (NAS) has been successfully used to design numerous high-performance neural networks. However, NAS is typically compute-intensive, so most existing approaches restrict the search to decide the operations and topological structure of a single block only, then the same block is stacked repeatedly to form an end-to-end model. Although such an approach reduces the size of search space, recent studies show that a macro search space, which allows blocks in a model to be different, can lead to better performance. To provide a systematic study of the performance of NAS algorithms on a macro search space, we release Blox - a benchmark that consists of 91k unique models trained on the CIFAR-100 dataset. The dataset also includes runtime measurements of all the models on a diverse set of hardware platforms. We perform extensive experiments to compare existing algorithms that are well studied on cell-based search spaces, with the emerging blockwise approaches that aim to make NAS scalable to much larger macro search spaces. The benchmark and code are available at https://github.com/SamsungLabs/blox.

* Published in the Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks

Via

Access Paper or Ask Questions

ZeroFL: Efficient On-Device Training for Federated Learning with Local Sparsity

Aug 04, 2022

Xinchi Qiu, Javier Fernandez-Marques, Pedro PB Gusmao, Yan Gao, Titouan Parcollet, Nicholas Donald Lane

Figure 1 for ZeroFL: Efficient On-Device Training for Federated Learning with Local Sparsity

Figure 2 for ZeroFL: Efficient On-Device Training for Federated Learning with Local Sparsity

Figure 3 for ZeroFL: Efficient On-Device Training for Federated Learning with Local Sparsity

Figure 4 for ZeroFL: Efficient On-Device Training for Federated Learning with Local Sparsity

Abstract:When the available hardware cannot meet the memory and compute requirements to efficiently train high performing machine learning models, a compromise in either the training quality or the model complexity is needed. In Federated Learning (FL), nodes are orders of magnitude more constrained than traditional server-grade hardware and are often battery powered, severely limiting the sophistication of models that can be trained under this paradigm. While most research has focused on designing better aggregation strategies to improve convergence rates and in alleviating the communication costs of FL, fewer efforts have been devoted to accelerating on-device training. Such stage, which repeats hundreds of times (i.e. every round) and can involve thousands of devices, accounts for the majority of the time required to train federated models and, the totality of the energy consumption at the client side. In this work, we present the first study on the unique aspects that arise when introducing sparsity at training time in FL workloads. We then propose ZeroFL, a framework that relies on highly sparse operations to accelerate on-device training. Models trained with ZeroFL and 95% sparsity achieve up to 2.3% higher accuracy compared to competitive baselines obtained from adapting a state-of-the-art sparse training framework to the FL setting.

* International Conference on Learning Representations, 2022
* Published as a conference paper at ICLR 2022

Via

Access Paper or Ask Questions

Prospect Pruning: Finding Trainable Weights at Initialization using Meta-Gradients

Feb 16, 2022

Milad Alizadeh, Shyam A. Tailor, Luisa M Zintgraf, Joost van Amersfoort, Sebastian Farquhar, Nicholas Donald Lane, Yarin Gal

Figure 1 for Prospect Pruning: Finding Trainable Weights at Initialization using Meta-Gradients

Figure 2 for Prospect Pruning: Finding Trainable Weights at Initialization using Meta-Gradients

Figure 3 for Prospect Pruning: Finding Trainable Weights at Initialization using Meta-Gradients

Figure 4 for Prospect Pruning: Finding Trainable Weights at Initialization using Meta-Gradients

Abstract:Pruning neural networks at initialization would enable us to find sparse models that retain the accuracy of the original network while consuming fewer computational resources for training and inference. However, current methods are insufficient to enable this optimization and lead to a large degradation in model performance. In this paper, we identify a fundamental limitation in the formulation of current methods, namely that their saliency criteria look at a single step at the start of training without taking into account the trainability of the network. While pruning iteratively and gradually has been shown to improve pruning performance, explicit consideration of the training stage that will immediately follow pruning has so far been absent from the computation of the saliency criterion. To overcome the short-sightedness of existing methods, we propose Prospect Pruning (ProsPr), which uses meta-gradients through the first few steps of optimization to determine which weights to prune. ProsPr combines an estimate of the higher-order effects of pruning on the loss and the optimization trajectory to identify the trainable sub-network. Our method achieves state-of-the-art pruning performance on a variety of vision classification tasks, with less data and in a single shot compared to existing pruning-at-initialization methods.

Via

Access Paper or Ask Questions