Pin-Yu Chen

Defining and Evaluating Physical Safety for Large Language Models

Nov 04, 2024

Yung-Chen Tang, Pin-Yu Chen, Tsung-Yi Ho

Abstract:Large Language Models (LLMs) are increasingly used to control robotic systems such as drones, but their risks of causing physical threats and harm in real-world applications remain unexplored. Our study addresses the critical gap in evaluating LLM physical safety by developing a comprehensive benchmark for drone control. We classify the physical safety risks of drones into four categories: (1) human-targeted threats, (2) object-targeted threats, (3) infrastructure attacks, and (4) regulatory violations. Our evaluation of mainstream LLMs reveals an undesirable trade-off between utility and safety, with models that excel in code generation often performing poorly in crucial safety aspects. Furthermore, while incorporating advanced prompt engineering techniques such as In-Context Learning and Chain-of-Thought can improve safety, these methods still struggle to identify unintentional attacks. In addition, larger models demonstrate better safety capabilities, particularly in refusing dangerous commands. Our findings and benchmark can facilitate the design and evaluation of physical safety for LLMs. The project page is available at huggingface.co/spaces/TrustSafeAI/LLM-physical-safety.

Attention Tracker: Detecting Prompt Injection Attacks in LLMs

Nov 01, 2024

Kuo-Han Hung, Ching-Yun Ko, Ambrish Rawat, I-Hsin Chung, Winston H. Hsu, Pin-Yu Chen

Abstract:Large Language Models (LLMs) have revolutionized various domains but remain vulnerable to prompt injection attacks, where malicious inputs manipulate the model into ignoring the original instructions and executing a designated action. In this paper, we investigate the underlying mechanisms of these attacks by analyzing the attention patterns within LLMs. We introduce the concept of the distraction effect, where specific attention heads, termed important heads, shift focus from the original instruction to the injected instruction. Building on this discovery, we propose Attention Tracker, a training-free detection method that tracks attention patterns on the instruction to detect prompt injection attacks without the need for additional LLM inference. Our method generalizes effectively across diverse models, datasets, and attack types, showing an AUROC improvement of up to 10.0% over existing methods, and performs well even on small LLMs. We demonstrate the robustness of our approach through extensive evaluations and provide insights into safeguarding LLM-integrated systems from prompt injection vulnerabilities.

* Project page: https://huggingface.co/spaces/TrustSafeAI/Attention-Tracker
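
The idea of tracking attention on the instruction might be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's implementation: the names `focus_score` and `is_injection`, the nested-list layout of the attention weights, the choice of heads, and the threshold of 0.5 are all hypothetical, since the abstract does not specify them.

```python
from typing import List, Sequence, Tuple

# Assumed layout: attn[layer][head] is the last token's attention
# distribution over all input positions (a list of floats summing to 1).
Attention = List[List[List[float]]]


def focus_score(attn: Attention,
                instruction_span: Tuple[int, int],
                heads: Sequence[Tuple[int, int]]) -> float:
    """Average attention mass that the chosen heads place on the instruction.

    instruction_span is a half-open (start, end) range of token positions.
    heads is a list of (layer, head) pairs, assumed to be chosen beforehand.
    """
    start, end = instruction_span
    total = 0.0
    for layer, head in heads:
        total += sum(attn[layer][head][start:end])
    return total / len(heads)


def is_injection(attn: Attention,
                 instruction_span: Tuple[int, int],
                 heads: Sequence[Tuple[int, int]],
                 threshold: float = 0.5) -> bool:
    """Flag the input when attention has drifted away from the instruction."""
    return focus_score(attn, instruction_span, heads) < threshold


# Toy example: one layer, two heads, four positions; positions 0-1 hold
# the original instruction and positions 2-3 hold the appended data.
heads = [(0, 0), (0, 1)]
clean = [[[0.4, 0.4, 0.1, 0.1], [0.3, 0.5, 0.1, 0.1]]]
attacked = [[[0.1, 0.1, 0.4, 0.4], [0.05, 0.15, 0.4, 0.4]]]

print(is_injection(clean, (0, 2), heads))
print(is_injection(attacked, (0, 2), heads))
```

In this toy setup the score only reuses attention weights from a forward pass that has already happened, which is in the spirit of a detector that needs no additional LLM inference.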

LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs

Oct 18, 2024

Yujun Zhou, Jingdong Yang, Kehan Guo, Pin-Yu Chen, Tian Gao, Werner Geyer, Nuno Moniz, Nitesh V Chawla, Xiangliang Zhang

Abstract:Laboratory accidents pose significant risks to human life and property, underscoring the importance of robust safety protocols. Despite advancements in safety training, laboratory personnel may still unknowingly engage in unsafe practices. With the increasing reliance on large language models (LLMs) for guidance in various fields, including laboratory settings, there is a growing concern about their reliability in critical safety-related decision-making. Unlike trained human researchers, LLMs lack formal lab safety education, raising questions about their ability to provide safe and accurate guidance. Existing research on LLM trustworthiness primarily focuses on issues such as ethical compliance, truthfulness, and fairness but fails to fully cover safety-critical real-world applications, like lab safety. To address this gap, we propose the Laboratory Safety Benchmark (LabSafety Bench), a comprehensive evaluation framework based on a new taxonomy aligned with Occupational Safety and Health Administration (OSHA) protocols. This benchmark includes 765 multiple-choice questions verified by human experts, assessing the performance of LLMs and vision language models (VLMs) in lab safety contexts. Our evaluations demonstrate that while GPT-4o outperforms human participants, it is still prone to critical errors, highlighting the risks of relying on LLMs in safety-critical environments. Our findings emphasize the need for specialized benchmarks to accurately assess the trustworthiness of LLMs in real-world safety applications.

* 50 pages, 19 figures

Position Specific Scoring Is All You Need? Revisiting Protein Sequence Classification Tasks

Oct 16, 2024

Sarwan Ali, Taslim Murad, Prakash Chourasia, Haris Mansoor, Imdad Ullah Khan, Pin-Yu Chen, Murray Patterson

Abstract:Understanding the structural and functional characteristics of proteins is crucial for developing preventative and curative strategies that impact fields from drug discovery to policy development. An important and popular technique for examining how amino acids make up these characteristics of protein sequences is position-specific scoring (PSS). While the string kernel is crucial in natural language processing (NLP), it is unclear if string kernels can extract biologically meaningful information from protein sequences, despite the fact that they have been shown to be effective in general sequence analysis tasks. In this work, we propose a weighted PSS kernel matrix (W-PSSKM) that combines a PSS representation of protein sequences, which encodes the frequency information of each amino acid in a sequence, with the notion of the string kernel. This results in a novel kernel function that outperforms many other approaches for protein sequence classification. We perform extensive experimentation to evaluate the proposed method. Our findings demonstrate that the W-PSSKM significantly outperforms existing baselines and state-of-the-art methods and achieves up to 45.1% improvement in classification accuracy.

SEAL: Safety-enhanced Aligned LLM Fine-tuning via Bilevel Data Selection

Oct 09, 2024

Han Shen, Pin-Yu Chen, Payel Das, Tianyi Chen

Abstract:Fine-tuning on task-specific data to boost downstream performance is a crucial step for leveraging Large Language Models (LLMs). However, previous studies have demonstrated that fine-tuning the models on several adversarial samples or even benign data can greatly compromise the model's pre-equipped alignment and safety capabilities. In this work, we propose SEAL, a novel framework to enhance safety in LLM fine-tuning. SEAL learns a data ranker based on bilevel optimization to up-rank the safe and high-quality fine-tuning data and down-rank the unsafe or low-quality data. Models trained with SEAL demonstrate superior quality over multiple baselines, with win rate increases of 8.5% and 9.7% over random selection on the Llama-3-8b-Instruct and Merlinite-7b models, respectively. Our code is available on GitHub at https://github.com/hanshen95/SEAL.

SONAR: A Synthetic AI-Audio Detection Framework and Benchmark

Oct 06, 2024

Xiang Li, Pin-Yu Chen, Wenqi Wei

Abstract:Recent advances in Text-to-Speech (TTS) and Voice-Conversion (VC) using generative Artificial Intelligence (AI) technology have made it possible to generate high-quality and realistic human-like audio. This introduces significant challenges to distinguishing AI-synthesized speech from the authentic human voice and could raise potential issues of misuse for malicious purposes such as impersonation and fraud, spreading misinformation, deepfakes, and scams. However, existing detection techniques for AI-synthesized audio have not kept pace and often exhibit poor generalization across diverse datasets. In this paper, we introduce SONAR, a synthetic AI-Audio Detection Framework and Benchmark, aiming to provide a comprehensive evaluation for distinguishing cutting-edge AI-synthesized auditory content. SONAR includes a novel evaluation dataset sourced from 9 diverse audio synthesis platforms, including leading TTS providers and state-of-the-art TTS models. It is the first framework to uniformly benchmark AI-audio detection across both traditional and foundation model-based deepfake detection systems. Through extensive experiments, we reveal the generalization limitations of existing detection methods and demonstrate that foundation models exhibit stronger generalization capabilities, which can be attributed to their model size and the scale and quality of pretraining data. Additionally, we explore the effectiveness and efficiency of few-shot fine-tuning in improving generalization, highlighting its potential for tailored applications, such as personalized detection systems for specific entities or individuals. Code and dataset are available at https://github.com/Jessegator/SONAR.

Large Language Models can be Strong Self-Detoxifiers

Oct 04, 2024

Ching-Yun Ko, Pin-Yu Chen, Payel Das, Youssef Mroueh, Soham Dan, Georgios Kollias, Subhajit Chaudhury, Tejaswini Pedapati, Luca Daniel

Abstract:Reducing the likelihood of generating harmful and toxic output is an essential task when aligning large language models (LLMs). Existing methods mainly rely on training an external reward model (i.e., another language model) or fine-tuning the LLM using self-generated data to influence the outcome. In this paper, we show that LLMs have the capability of self-detoxification without the use of an additional reward model or re-training. We propose Self-disciplined Autoregressive Sampling (SASA), a lightweight controlled decoding algorithm for toxicity reduction of LLMs. SASA leverages the contextual representations from an LLM to learn linear subspaces characterizing toxic vs. non-toxic output in analytical forms. When auto-completing a response token-by-token, SASA dynamically tracks the margin of the current output to steer the generation away from the toxic subspace by adjusting the autoregressive sampling strategy. Evaluated on LLMs of different scale and nature, namely Llama-3.1-Instruct (8B), Llama-2 (7B), and GPT2-L models with the RealToxicityPrompts, BOLD, and AttaQ benchmarks, SASA markedly enhances the quality of the generated sentences relative to the original models and attains comparable performance to state-of-the-art detoxification techniques, significantly reducing the toxicity level by only using the LLM's internal representations.

* 20 pages
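
A margin-guided sampling step of the kind the abstract describes could look roughly like the sketch below. It is an illustration under stated assumptions rather than the SASA algorithm itself: the function names, the additive `beta * margin` adjustment to the logits, the sign convention (positive margin means the non-toxic side), and the toy vectors are all hypothetical.

```python
import math
from typing import List


def margin(w: List[float], b: float, h: List[float]) -> float:
    """Signed score of representation h under a linear classifier (w, b).

    Assumed convention: positive values lie on the non-toxic side.
    """
    return sum(wi * hi for wi, hi in zip(w, h)) + b


def adjusted_probs(logits: List[float],
                   candidate_reprs: List[List[float]],
                   w: List[float],
                   b: float,
                   beta: float = 1.0) -> List[float]:
    """Re-weight next-token probabilities by each candidate's margin.

    candidate_reprs[i] is the (assumed) contextual representation of the
    output if token i were appended. beta controls steering strength, and
    beta = 0 recovers the ordinary softmax over the logits.
    """
    scores = [l + beta * margin(w, b, h)
              for l, h in zip(logits, candidate_reprs)]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


# Toy example: two candidate tokens with equal logits; the first moves the
# output toward the non-toxic side, the second toward the toxic side.
probs = adjusted_probs(
    logits=[0.0, 0.0],
    candidate_reprs=[[1.0, 0.0], [-1.0, 0.0]],
    w=[1.0, 0.0],
    b=0.0,
    beta=1.0,
)
print(probs)
```

Because the adjustment only changes the sampling distribution at each step, a scheme like this needs neither an external reward model nor re-training, which matches the motivation stated in the abstract.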

Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge

Oct 03, 2024

Jiayi Ye, Yanbo Wang, Yue Huang, Dongping Chen, Qihui Zhang, Nuno Moniz, Tian Gao, Werner Geyer, Chao Huang, Pin-Yu Chen (+2 more)

Abstract:LLM-as-a-Judge has been widely utilized as an evaluation method in various benchmarks and served as supervised rewards in model training. However, despite their excellence in many domains, potential issues are under-explored, undermining their reliability and the scope of their utility. Therefore, we identify 12 key potential biases and propose a new automated bias quantification framework, CALM, which systematically quantifies and analyzes each type of bias in LLM-as-a-Judge by using automated and principle-guided modification. Our experiments cover multiple popular language models, and the results indicate that while advanced models have achieved commendable overall performance, significant biases persist in certain specific tasks. Empirical results suggest that there remains room for improvement in the reliability of LLM-as-a-Judge. Moreover, we also discuss the explicit and implicit influence of these biases and give some suggestions for the reliable application of LLM-as-a-Judge. Our work highlights the need for stakeholders to address these issues and reminds users to exercise caution in LLM-as-a-Judge applications.

Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis

Oct 03, 2024

Hongkang Li, Meng Wang, Songtao Lu, Xiaodong Cui, Pin-Yu Chen

Abstract:Chain-of-Thought (CoT) is an efficient prompting method that enables the reasoning ability of large language models by augmenting the query using multiple examples with multiple intermediate steps. Despite the empirical success, the theoretical understanding of how to train a Transformer to achieve the CoT ability remains less explored. This is primarily due to the technical challenges involved in analyzing the nonconvex optimization on nonlinear attention models. To the best of our knowledge, this work provides the first theoretical study of training Transformers with nonlinear attention to obtain the CoT generalization capability so that the resulting model can perform inference on unseen tasks when the input is augmented by examples of the new task. We first quantify the required training samples and iterations to train a Transformer model towards CoT ability. We then prove the success of its CoT generalization on unseen tasks with distribution-shifted testing data. Moreover, we theoretically characterize the conditions for an accurate reasoning output by CoT even when the provided reasoning examples contain noise and are not always accurate. In contrast, in-context learning (ICL), which can be viewed as one-step CoT without intermediate steps, may fail to provide an accurate output when CoT does. These theoretical findings are justified through experiments.

When Does Visual Prompting Outperform Linear Probing for Vision-Language Models? A Likelihood Perspective

Sep 04, 2024

Hsi-Ai Tsao, Lei Hsiung, Pin-Yu Chen, Tsung-Yi Ho

Abstract:Adapting pre-trained models to new tasks can exhibit varying effectiveness across datasets. Visual prompting, a state-of-the-art parameter-efficient transfer learning method, can significantly improve the performance of out-of-distribution tasks. On the other hand, linear probing, a standard transfer learning method, can sometimes become the best approach. We propose a log-likelihood ratio (LLR) approach to analyze the comparative benefits of visual prompting and linear probing. By employing the LLR score alongside resource-efficient visual prompts approximations, our cost-effective measure attains up to a 100-fold reduction in run time compared to full training, while achieving prediction accuracies up to 91%. The source code is available at https://github.com/IBM/VP-LLR.
