Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Pin-Yu Chen

Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models

May 28, 2024

ShengYun Peng, Pin-Yu Chen, Matthew Hull, Duen Horng Chau

Abstract:Safety alignment is the key to guiding the behaviors of large language models (LLMs) that are in line with human preferences and restrict harmful behaviors at inference time, but recent studies show that it can be easily compromised by finetuning with only a few adversarially designed training examples. We aim to measure the risks in finetuning LLMs through navigating the LLM safety landscape. We discover a new phenomenon observed universally in the model parameter space of popular open-source LLMs, termed as "safety basin": randomly perturbing model weights maintains the safety level of the original aligned model in its local neighborhood. Our discovery inspires us to propose the new VISAGE safety metric that measures the safety in LLM finetuning by probing its safety landscape. Visualizing the safety landscape of the aligned model enables us to understand how finetuning compromises safety by dragging the model away from the safety basin. LLM safety landscape also highlights the system prompt's critical role in protecting a model, and that such protection transfers to its perturbed variants within the safety basin. These observations from our safety landscape research provide new insights for future work on LLM safety community.

Via

Access Paper or Ask Questions

A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts

May 28, 2024

Mohammed Nowaz Rabbani Chowdhury, Meng Wang, Kaoutar El Maghraoui, Naigang Wang, Pin-Yu Chen, Christopher Carothers

Figure 1 for A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts

Figure 2 for A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts

Figure 3 for A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts

Figure 4 for A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts

Abstract:The sparsely gated mixture of experts (MoE) architecture sends different inputs to different subnetworks, i.e., experts, through trainable routers. MoE reduces the training computation significantly for large models, but its deployment can be still memory or computation expensive for some downstream tasks. Model pruning is a popular approach to reduce inference computation, but its application in MoE architecture is largely unexplored. To the best of our knowledge, this paper provides the first provably efficient technique for pruning experts in finetuned MoE models. We theoretically prove that prioritizing the pruning of the experts with a smaller change of the routers l2 norm from the pretrained model guarantees the preservation of test accuracy, while significantly reducing the model size and the computational requirements. Although our theoretical analysis is centered on binary classification tasks on simplified MoE architecture, our expert pruning method is verified on large vision MoE models such as VMoE and E3MoE finetuned on benchmark datasets such as CIFAR10, CIFAR100, and ImageNet.

* The 41st International Conference on Machine Learning, ICML 2024

Via

Access Paper or Ask Questions

Safe LoRA: the Silver Lining of Reducing Safety Risks when Fine-tuning Large Language Models

May 27, 2024

Chia-Yi Hsu, Yu-Lin Tsai, Chih-Hsun Lin, Pin-Yu Chen, Chia-Mu Yu, Chun-Ying Huang

Figure 1 for Safe LoRA: the Silver Lining of Reducing Safety Risks when Fine-tuning Large Language Models

Figure 2 for Safe LoRA: the Silver Lining of Reducing Safety Risks when Fine-tuning Large Language Models

Figure 3 for Safe LoRA: the Silver Lining of Reducing Safety Risks when Fine-tuning Large Language Models

Figure 4 for Safe LoRA: the Silver Lining of Reducing Safety Risks when Fine-tuning Large Language Models

Abstract:While large language models (LLMs) such as Llama-2 or GPT-4 have shown impressive zero-shot performance, fine-tuning is still necessary to enhance their performance for customized datasets, domain-specific tasks, or other private needs. However, fine-tuning all parameters of LLMs requires significant hardware resources, which can be impractical for typical users. Therefore, parameter-efficient fine-tuning such as LoRA have emerged, allowing users to fine-tune LLMs without the need for considerable computing resources, with little performance degradation compared to fine-tuning all parameters. Unfortunately, recent studies indicate that fine-tuning can increase the risk to the safety of LLMs, even when data does not contain malicious content. To address this challenge, we propose Safe LoRA, a simple one-liner patch to the original LoRA implementation by introducing the projection of LoRA weights from selected layers to the safety-aligned subspace, effectively reducing the safety risks in LLM fine-tuning while maintaining utility. It is worth noting that Safe LoRA is a training-free and data-free approach, as it only requires the knowledge of the weights from the base and aligned LLMs. Our extensive experiments demonstrate that when fine-tuning on purely malicious data, Safe LoRA retains similar safety performance as the original aligned model. Moreover, when the fine-tuning dataset contains a mixture of both benign and malicious data, Safe LoRA mitigates the negative effect made by malicious data while preserving performance on downstream tasks.

Via

Access Paper or Ask Questions

SF-DQN: Provable Knowledge Transfer using Successor Feature for Deep Reinforcement Learning

May 24, 2024

Shuai Zhang, Heshan Devaka Fernando, Miao Liu, Keerthiram Murugesan, Songtao Lu, Pin-Yu Chen, Tianyi Chen, Meng Wang

Figure 1 for SF-DQN: Provable Knowledge Transfer using Successor Feature for Deep Reinforcement Learning

Figure 2 for SF-DQN: Provable Knowledge Transfer using Successor Feature for Deep Reinforcement Learning

Figure 3 for SF-DQN: Provable Knowledge Transfer using Successor Feature for Deep Reinforcement Learning

Figure 4 for SF-DQN: Provable Knowledge Transfer using Successor Feature for Deep Reinforcement Learning

Abstract:This paper studies the transfer reinforcement learning (RL) problem where multiple RL problems have different reward functions but share the same underlying transition dynamics. In this setting, the Q-function of each RL problem (task) can be decomposed into a successor feature (SF) and a reward mapping: the former characterizes the transition dynamics, and the latter characterizes the task-specific reward function. This Q-function decomposition, coupled with a policy improvement operator known as generalized policy improvement (GPI), reduces the sample complexity of finding the optimal Q-function, and thus the SF \& GPI framework exhibits promising empirical performance compared to traditional RL methods like Q-learning. However, its theoretical foundations remain largely unestablished, especially when learning the successor features using deep neural networks (SF-DQN). This paper studies the provable knowledge transfer using SFs-DQN in transfer RL problems. We establish the first convergence analysis with provable generalization guarantees for SF-DQN with GPI. The theory reveals that SF-DQN with GPI outperforms conventional RL approaches, such as deep Q-network, in terms of both faster convergence rate and better generalization. Numerical experiments on real and synthetic RL tasks support the superior performance of SF-DQN \& GPI, aligning with our theoretical findings.

* arXiv admin note: text overlap with arXiv:2310.16173

Via

Access Paper or Ask Questions

Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models

May 23, 2024

Yuchen Hu, Chen Chen, Chao-Han Huck Yang, Chengwei Qin, Pin-Yu Chen, Eng Siong Chng, Chao Zhang

Figure 1 for Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models

Figure 2 for Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models

Figure 3 for Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models

Figure 4 for Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models

Abstract:We propose an unsupervised adaptation framework, Self-TAught Recognizer (STAR), which leverages unlabeled data to enhance the robustness of automatic speech recognition (ASR) systems in diverse target domains, such as noise and accents. STAR is developed for prevalent speech foundation models based on Transformer-related architecture with auto-regressive decoding (e.g., Whisper, Canary). Specifically, we propose a novel indicator that empirically integrates step-wise information during decoding to assess the token-level quality of pseudo labels without ground truth, thereby guiding model updates for effective unsupervised adaptation. Experimental results show that STAR achieves an average of 13.5% relative reduction in word error rate across 14 target domains, and it sometimes even approaches the upper-bound performance of supervised adaptation. Surprisingly, we also observe that STAR prevents the adapted model from the common catastrophic forgetting problem without recalling source-domain data. Furthermore, STAR exhibits high data efficiency that only requires less than one-hour unlabeled data, and seamless generality to alternative large speech models and speech translation tasks. Our code aims to open source to the research communities.

* 23 pages, Preprint

Via

Access Paper or Ask Questions

Improving Transformers using Faithful Positional Encoding

May 16, 2024

Tsuyoshi Idé, Jokin Labaien, Pin-Yu Chen

Abstract:We propose a new positional encoding method for a neural network architecture called the Transformer. Unlike the standard sinusoidal positional encoding, our approach is based on solid mathematical grounds and has a guarantee of not losing information about the positional order of the input sequence. We show that the new encoding approach systematically improves the prediction performance in the time-series classification task.

* arXiv admin note: text overlap with arXiv:2305.17149

Via

Access Paper or Ask Questions

Graph is all you need? Lightweight data-agnostic neural architecture search without training

May 02, 2024

Zhenhan Huang, Tejaswini Pedapati, Pin-Yu Chen, Chunhen Jiang, Jianxi Gao

Figure 1 for Graph is all you need? Lightweight data-agnostic neural architecture search without training

Figure 2 for Graph is all you need? Lightweight data-agnostic neural architecture search without training

Figure 3 for Graph is all you need? Lightweight data-agnostic neural architecture search without training

Figure 4 for Graph is all you need? Lightweight data-agnostic neural architecture search without training

Abstract:Neural architecture search (NAS) enables the automatic design of neural network models. However, training the candidates generated by the search algorithm for performance evaluation incurs considerable computational overhead. Our method, dubbed nasgraph, remarkably reduces the computational costs by converting neural architectures to graphs and using the average degree, a graph measure, as the proxy in lieu of the evaluation metric. Our training-free NAS method is data-agnostic and light-weight. It can find the best architecture among 200 randomly sampled architectures from NAS-Bench201 in 217 CPU seconds. Besides, our method is able to achieve competitive performance on various datasets including NASBench-101, NASBench-201, and NDS search spaces. We also demonstrate that nasgraph generalizes to more challenging tasks on Micro TransNAS-Bench-101.

Via

Access Paper or Ask Questions

Steal Now and Attack Later: Evaluating Robustness of Object Detection against Black-box Adversarial Attacks

Apr 24, 2024

Erh-Chung Chen, Pin-Yu Chen, I-Hsin Chung, Che-Rung Lee

Abstract:Latency attacks against object detection represent a variant of adversarial attacks that aim to inflate the inference time by generating additional ghost objects in a target image. However, generating ghost objects in the black-box scenario remains a challenge since information about these unqualified objects remains opaque. In this study, we demonstrate the feasibility of generating ghost objects in adversarial examples by extending the concept of "steal now, decrypt later" attacks. These adversarial examples, once produced, can be employed to exploit potential vulnerabilities in the AI service, giving rise to significant security concerns. The experimental results demonstrate that the proposed attack achieves successful attacks across various commonly used models and Google Vision API without any prior knowledge about the target model. Additionally, the average cost of each attack is less than \$ 1 dollars, posing a significant threat to AI security.

Via

Access Paper or Ask Questions

Model Reprogramming Outperforms Fine-tuning on Out-of-distribution Data in Text-Image Encoders

Mar 29, 2024

Andrew Geng, Pin-Yu Chen

Abstract:When evaluating the performance of a pre-trained model transferred to a downstream task, it is imperative to assess not only the in-distribution (ID) accuracy of the downstream model but also its capacity to generalize and identify out-of-distribution (OOD) samples. In this paper, we unveil the hidden costs associated with intrusive fine-tuning techniques. Specifically, we demonstrate that commonly used fine-tuning methods not only distort the representations necessary for generalizing to covariate-shifted OOD samples (OOD generalization) but also distort the representations necessary for detecting semantically-shifted OOD samples (OOD detection). To address these challenges, we introduce a new model reprogramming approach for fine-tuning, which we name Reprogrammer. Reprogrammer aims to improve the holistic performance of the downstream model across ID, OOD generalization, and OOD detection tasks. Our empirical evidence reveals that Reprogrammer is less intrusive and yields superior downstream models. Furthermore, we demonstrate that by appending an additional representation residual connection to Reprogrammer, we can further preserve pre-training representations, resulting in an even more safe and robust downstream model capable of excelling in many ID classification, OOD generalization, and OOD detection settings.

* Accepted in SatML 2024

Via

Access Paper or Ask Questions

NaNa and MiGu: Semantic Data Augmentation Techniques to Enhance Protein Classification in Graph Neural Networks

Mar 26, 2024

Yi-Shan Lan, Pin-Yu Chen, Tsung-Yi Ho

Abstract:Protein classification tasks are essential in drug discovery. Real-world protein structures are dynamic, which will determine the properties of proteins. However, the existing machine learning methods, like ProNet (Wang et al., 2022a), only access limited conformational characteristics and protein side-chain features, leading to impractical protein structure and inaccuracy of protein classes in their predictions. In this paper, we propose novel semantic data augmentation methods, Novel Augmentation of New Node Attributes (NaNa), and Molecular Interactions and Geometric Upgrading (MiGu) to incorporate backbone chemical and side-chain biophysical information into protein classification tasks and a co-embedding residual learning framework. Specifically, we leverage molecular biophysical, secondary structure, chemical bonds, and ionic features of proteins to facilitate protein classification tasks. Furthermore, our semantic augmentation methods and the co-embedding residual learning framework can improve the performance of GIN (Xu et al., 2019) on EC and Fold datasets (Bairoch, 2000; Andreeva et al., 2007) by 16.41% and 11.33% respectively. Our code is available at https://github.com/r08b46009/Code_for_MIGU_NANA/tree/main.

Via

Access Paper or Ask Questions