Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Dawn Song

University of California, Berkeley

MLAN: Language-Based Instruction Tuning Improves Zero-Shot Generalization of Multimodal Large Language Models

Nov 19, 2024

Jianhong Tu, Zhuohao Ni, Nicholas Crispino, Zihao Yu, Michael Bendersky, Beliz Gunel, Ruoxi Jia, Xin Liu, Lingjuan Lyu, Dawn Song(+1 more)

Figure 1 for MLAN: Language-Based Instruction Tuning Improves Zero-Shot Generalization of Multimodal Large Language Models

Figure 2 for MLAN: Language-Based Instruction Tuning Improves Zero-Shot Generalization of Multimodal Large Language Models

Figure 3 for MLAN: Language-Based Instruction Tuning Improves Zero-Shot Generalization of Multimodal Large Language Models

Figure 4 for MLAN: Language-Based Instruction Tuning Improves Zero-Shot Generalization of Multimodal Large Language Models

Abstract:We present a novel instruction tuning recipe to improve the zero-shot task generalization of multimodal large language models. In contrast to existing instruction tuning mechanisms that heavily rely on visual instructions, our approach focuses on language-based instruction tuning, offering a distinct and more training efficient path for multimodal instruction tuning. We evaluate the performance of the proposed approach on 9 unseen datasets across both language and vision modalities. Our results show that our language-only instruction tuning is able to significantly improve the performance of two pretrained multimodal models based on Llama 2 and Vicuna on those unseen datasets. Interestingly, the language instruction following ability also helps unlock the models to follow vision instructions without explicit training. Compared to the state of the art multimodal instruction tuning approaches that are mainly based on visual instructions, our language-based method not only achieves superior performance but also significantly enhances training efficiency. For instance, the language-only instruction tuning produces competitive average performance across the evaluated datasets (with even better performance on language datasets) with significant training efficiency improvements (on average 4x), thanks to the striking reduction in the need for vision data. With a small number of visual instructions, this emerging language instruction following ability transfers well to the unseen vision datasets, outperforming the state of the art with greater training efficiency.

Via

Access Paper or Ask Questions

RedCode: Risky Code Execution and Generation Benchmark for Code Agents

Nov 12, 2024

Chengquan Guo, Xun Liu, Chulin Xie, Andy Zhou, Yi Zeng, Zinan Lin, Dawn Song, Bo Li

Figure 1 for RedCode: Risky Code Execution and Generation Benchmark for Code Agents

Figure 2 for RedCode: Risky Code Execution and Generation Benchmark for Code Agents

Figure 3 for RedCode: Risky Code Execution and Generation Benchmark for Code Agents

Figure 4 for RedCode: Risky Code Execution and Generation Benchmark for Code Agents

Abstract:With the rapidly increasing capabilities and adoption of code agents for AI-assisted coding, safety concerns, such as generating or executing risky code, have become significant barriers to the real-world deployment of these agents. To provide comprehensive and practical evaluations on the safety of code agents, we propose RedCode, a benchmark for risky code execution and generation: (1) RedCode-Exec provides challenging prompts that could lead to risky code execution, aiming to evaluate code agents' ability to recognize and handle unsafe code. We provide a total of 4,050 risky test cases in Python and Bash tasks with diverse input formats including code snippets and natural text. They covers 25 types of critical vulnerabilities spanning 8 domains (e.g., websites, file systems). We provide Docker environments and design corresponding evaluation metrics to assess their execution results. (2) RedCode-Gen provides 160 prompts with function signatures and docstrings as input to assess whether code agents will follow instructions to generate harmful code or software. Our empirical findings, derived from evaluating three agent frameworks based on 19 LLMs, provide insights into code agents' vulnerabilities. For instance, evaluations on RedCode-Exec show that agents are more likely to reject executing risky operations on the operating system, but are less likely to reject executing technically buggy code, indicating high risks. Risky operations described in natural text lead to a lower rejection rate than those in code format. Additionally, evaluations on RedCode-Gen show that more capable base models and agents with stronger overall coding abilities, such as GPT4, tend to produce more sophisticated and effective harmful software. Our findings highlight the need for stringent safety evaluations for diverse code agents. Our dataset and code are available at https://github.com/AI-secure/RedCode.

* Accepted by NeurIPS 2024 Datasets and Benchmarks Track

Via

Access Paper or Ask Questions

Hidden Persuaders: LLMs' Political Leaning and Their Influence on Voters

Oct 31, 2024

Yujin Potter, Shiyang Lai, Junsol Kim, James Evans, Dawn Song

Figure 1 for Hidden Persuaders: LLMs' Political Leaning and Their Influence on Voters

Figure 2 for Hidden Persuaders: LLMs' Political Leaning and Their Influence on Voters

Figure 3 for Hidden Persuaders: LLMs' Political Leaning and Their Influence on Voters

Figure 4 for Hidden Persuaders: LLMs' Political Leaning and Their Influence on Voters

Abstract:How could LLMs influence our democracy? We investigate LLMs' political leanings and the potential influence of LLMs on voters by conducting multiple experiments in a U.S. presidential election context. Through a voting simulation, we first demonstrate 18 open- and closed-weight LLMs' political preference for a Democratic nominee over a Republican nominee. We show how this leaning towards the Democratic nominee becomes more pronounced in instruction-tuned models compared to their base versions by analyzing their responses to candidate-policy related questions. We further explore the potential impact of LLMs on voter choice by conducting an experiment with 935 U.S. registered voters. During the experiments, participants interacted with LLMs (Claude-3, Llama-3, and GPT-4) over five exchanges. The experiment results show a shift in voter choices towards the Democratic nominee following LLM interaction, widening the voting margin from 0.7% to 4.6%, even though LLMs were not asked to persuade users to support the Democratic nominee during the discourse. This effect is larger than many previous studies on the persuasiveness of political campaigns, which have shown minimal effects in presidential elections. Many users also expressed a desire for further political interaction with LLMs. Which aspects of LLM interactions drove these shifts in voter choice requires further study. Lastly, we explore how a safety method can make LLMs more politically neutral, while leaving some open questions.

* EMNLP 2024 Main

Via

Access Paper or Ask Questions

CTINEXUS: Leveraging Optimized LLM In-Context Learning for Constructing Cybersecurity Knowledge Graphs Under Data Scarcity

Oct 28, 2024

Yutong Cheng, Osama Bajaber, Saimon Amanuel Tsegai, Dawn Song, Peng Gao

Abstract:Textual descriptions in cyber threat intelligence (CTI) reports, such as security articles and news, are rich sources of knowledge about cyber threats, crucial for organizations to stay informed about the rapidly evolving threat landscape. However, current CTI extraction methods lack flexibility and generalizability, often resulting in inaccurate and incomplete knowledge extraction. Syntax parsing relies on fixed rules and dictionaries, while model fine-tuning requires large annotated datasets, making both paradigms challenging to adapt to new threats and ontologies. To bridge the gap, we propose CTINexus, a novel framework leveraging optimized in-context learning (ICL) of large language models (LLMs) for data-efficient CTI knowledge extraction and high-quality cybersecurity knowledge graph (CSKG) construction. Unlike existing methods, CTINexus requires neither extensive data nor parameter tuning and can adapt to various ontologies with minimal annotated examples. This is achieved through (1) a carefully designed automatic prompt construction strategy with optimal demonstration retrieval for extracting a wide range of cybersecurity entities and relations; (2) a hierarchical entity alignment technique that canonicalizes the extracted knowledge and removes redundancy; (3) an ICL-enhanced long-distance relation prediction technique to further complete the CKSG with missing links. Our extensive evaluations using 150 real-world CTI reports collected from 10 platforms demonstrate that CTINexus significantly outperforms existing methods in constructing accurate and complete CSKGs, highlighting its potential to transform CTI analysis with an efficient and adaptable solution for the dynamic threat landscape.

* under peer-review

Via

Access Paper or Ask Questions

SecCodePLT: A Unified Platform for Evaluating the Security of Code GenAI

Oct 14, 2024

Yu Yang, Yuzhou Nie, Zhun Wang, Yuheng Tang, Wenbo Guo, Bo Li, Dawn Song

Figure 1 for SecCodePLT: A Unified Platform for Evaluating the Security of Code GenAI

Figure 2 for SecCodePLT: A Unified Platform for Evaluating the Security of Code GenAI

Figure 3 for SecCodePLT: A Unified Platform for Evaluating the Security of Code GenAI

Figure 4 for SecCodePLT: A Unified Platform for Evaluating the Security of Code GenAI

Abstract:Existing works have established multiple benchmarks to highlight the security risks associated with Code GenAI. These risks are primarily reflected in two areas: a model potential to generate insecure code (insecure coding) and its utility in cyberattacks (cyberattack helpfulness). While these benchmarks have made significant strides, there remain opportunities for further improvement. For instance, many current benchmarks tend to focus more on a model ability to provide attack suggestions rather than its capacity to generate executable attacks. Additionally, most benchmarks rely heavily on static evaluation metrics, which may not be as precise as dynamic metrics such as passing test cases. Conversely, expert-verified benchmarks, while offering high-quality data, often operate at a smaller scale. To address these gaps, we develop SecCodePLT, a unified and comprehensive evaluation platform for code GenAIs' risks. For insecure code, we introduce a new methodology for data creation that combines experts with automatic generation. Our methodology ensures the data quality while enabling large-scale generation. We also associate samples with test cases to conduct code-related dynamic evaluation. For cyberattack helpfulness, we set up a real environment and construct samples to prompt a model to generate actual attacks, along with dynamic metrics in our environment. We conduct extensive experiments and show that SecCodePLT outperforms the state-of-the-art (SOTA) benchmark CyberSecEval in security relevance. Furthermore, it better identifies the security risks of SOTA models in insecure coding and cyberattack helpfulness. Finally, we apply SecCodePLT to the SOTA code agent, Cursor, and, for the first time, identify non-trivial security risks in this advanced coding agent.

Via

Access Paper or Ask Questions

An undetectable watermark for generative image models

Oct 09, 2024

Sam Gunn, Xuandong Zhao, Dawn Song

Figure 1 for An undetectable watermark for generative image models

Figure 2 for An undetectable watermark for generative image models

Figure 3 for An undetectable watermark for generative image models

Figure 4 for An undetectable watermark for generative image models

Abstract:We present the first undetectable watermarking scheme for generative image models. Undetectability ensures that no efficient adversary can distinguish between watermarked and un-watermarked images, even after making many adaptive queries. In particular, an undetectable watermark does not degrade image quality under any efficiently computable metric. Our scheme works by selecting the initial latents of a diffusion model using a pseudorandom error-correcting code (Christ and Gunn, 2024), a strategy which guarantees undetectability and robustness. We experimentally demonstrate that our watermarks are quality-preserving and robust using Stable Diffusion 2.1. Our experiments verify that, in contrast to every prior scheme we tested, our watermark does not degrade image quality. Our experiments also demonstrate robustness: existing watermark removal attacks fail to remove our watermark from images without significantly degrading the quality of the images. Finally, we find that we can robustly encode 512 bits in our watermark, and up to 2500 bits when the images are not subjected to watermark removal attacks. Our code is available at https://github.com/XuandongZhao/PRC-Watermark.

Via

Access Paper or Ask Questions

Multimodal Situational Safety

Oct 08, 2024

Kaiwen Zhou, Chengzhi Liu, Xuandong Zhao, Anderson Compalas, Dawn Song, Xin Eric Wang

Figure 1 for Multimodal Situational Safety

Figure 2 for Multimodal Situational Safety

Figure 3 for Multimodal Situational Safety

Figure 4 for Multimodal Situational Safety

Abstract:Multimodal Large Language Models (MLLMs) are rapidly evolving, demonstrating impressive capabilities as multimodal assistants that interact with both humans and their environments. However, this increased sophistication introduces significant safety concerns. In this paper, we present the first evaluation and analysis of a novel safety challenge termed Multimodal Situational Safety, which explores how safety considerations vary based on the specific situation in which the user or agent is engaged. We argue that for an MLLM to respond safely, whether through language or action, it often needs to assess the safety implications of a language query within its corresponding visual context. To evaluate this capability, we develop the Multimodal Situational Safety benchmark (MSSBench) to assess the situational safety performance of current MLLMs. The dataset comprises 1,820 language query-image pairs, half of which the image context is safe, and the other half is unsafe. We also develop an evaluation framework that analyzes key safety aspects, including explicit safety reasoning, visual understanding, and, crucially, situational safety reasoning. Our findings reveal that current MLLMs struggle with this nuanced safety problem in the instruction-following setting and struggle to tackle these situational safety challenges all at once, highlighting a key area for future research. Furthermore, we develop multi-agent pipelines to coordinately solve safety challenges, which shows consistent improvement in safety over the original MLLM response. Code and data: mssbench.github.io.

Via

Access Paper or Ask Questions

LLM-PBE: Assessing Data Privacy in Large Language Models

Aug 23, 2024

Qinbin Li, Junyuan Hong, Chulin Xie, Jeffrey Tan, Rachel Xin, Junyi Hou, Xavier Yin, Zhun Wang, Dan Hendrycks, Zhangyang Wang(+3 more)

Figure 1 for LLM-PBE: Assessing Data Privacy in Large Language Models

Figure 2 for LLM-PBE: Assessing Data Privacy in Large Language Models

Figure 3 for LLM-PBE: Assessing Data Privacy in Large Language Models

Figure 4 for LLM-PBE: Assessing Data Privacy in Large Language Models

Abstract:Large Language Models (LLMs) have become integral to numerous domains, significantly advancing applications in data management, mining, and analysis. Their profound capabilities in processing and interpreting complex language data, however, bring to light pressing concerns regarding data privacy, especially the risk of unintentional training data leakage. Despite the critical nature of this issue, there has been no existing literature to offer a comprehensive assessment of data privacy risks in LLMs. Addressing this gap, our paper introduces LLM-PBE, a toolkit crafted specifically for the systematic evaluation of data privacy risks in LLMs. LLM-PBE is designed to analyze privacy across the entire lifecycle of LLMs, incorporating diverse attack and defense strategies, and handling various data types and metrics. Through detailed experimentation with multiple LLMs, LLM-PBE facilitates an in-depth exploration of data privacy concerns, shedding light on influential factors such as model size, data characteristics, and evolving temporal dimensions. This study not only enriches the understanding of privacy issues in LLMs but also serves as a vital resource for future research in the field. Aimed at enhancing the breadth of knowledge in this area, the findings, resources, and our full technical report are made available at https://llm-pbe.github.io/, providing an open platform for academic and practical advancements in LLM privacy assessment.

Via

Access Paper or Ask Questions

Tamper-Resistant Safeguards for Open-Weight LLMs

Aug 01, 2024

Rishub Tamirisa, Bhrugu Bharathi, Long Phan, Andy Zhou, Alice Gatti, Tarun Suresh, Maxwell Lin, Justin Wang, Rowan Wang, Ron Arel(+5 more)

Figure 1 for Tamper-Resistant Safeguards for Open-Weight LLMs

Figure 2 for Tamper-Resistant Safeguards for Open-Weight LLMs

Figure 3 for Tamper-Resistant Safeguards for Open-Weight LLMs

Figure 4 for Tamper-Resistant Safeguards for Open-Weight LLMs

Abstract:Rapid advances in the capabilities of large language models (LLMs) have raised widespread concerns regarding their potential for malicious use. Open-weight LLMs present unique challenges, as existing safeguards lack robustness to tampering attacks that modify model weights. For example, recent works have demonstrated that refusal and unlearning safeguards can be trivially removed with a few steps of fine-tuning. These vulnerabilities necessitate new approaches for enabling the safe release of open-weight LLMs. We develop a method, called TAR, for building tamper-resistant safeguards into open-weight LLMs such that adversaries cannot remove the safeguards even after thousands of steps of fine-tuning. In extensive evaluations and red teaming analyses, we find that our method greatly improves tamper-resistance while preserving benign capabilities. Our results demonstrate that tamper-resistance is a tractable problem, opening up a promising new avenue to improve the safety and security of open-weight LLMs.

* Website: https://www.tamper-resistant-safeguards.com

Via

Access Paper or Ask Questions

Can Editing LLMs Inject Harm?

Jul 29, 2024

Canyu Chen, Baixiang Huang, Zekun Li, Zhaorun Chen, Shiyang Lai, Xiongxiao Xu, Jia-Chen Gu, Jindong Gu, Huaxiu Yao, Chaowei Xiao(+5 more)

Figure 1 for Can Editing LLMs Inject Harm?

Figure 2 for Can Editing LLMs Inject Harm?

Figure 3 for Can Editing LLMs Inject Harm?

Figure 4 for Can Editing LLMs Inject Harm?

Abstract:Knowledge editing techniques have been increasingly adopted to efficiently correct the false or outdated knowledge in Large Language Models (LLMs), due to the high cost of retraining from scratch. Meanwhile, one critical but under-explored question is: can knowledge editing be used to inject harm into LLMs? In this paper, we propose to reformulate knowledge editing as a new type of safety threat for LLMs, namely Editing Attack, and conduct a systematic investigation with a newly constructed dataset EditAttack. Specifically, we focus on two typical safety risks of Editing Attack including Misinformation Injection and Bias Injection. For the risk of misinformation injection, we first categorize it into commonsense misinformation injection and long-tail misinformation injection. Then, we find that editing attacks can inject both types of misinformation into LLMs, and the effectiveness is particularly high for commonsense misinformation injection. For the risk of bias injection, we discover that not only can biased sentences be injected into LLMs with high effectiveness, but also one single biased sentence injection can cause a high bias increase in general outputs of LLMs, which are even highly irrelevant to the injected sentence, indicating a catastrophic impact on the overall fairness of LLMs. Then, we further illustrate the high stealthiness of editing attacks, measured by their impact on the general knowledge and reasoning capacities of LLMs, and show the hardness of defending editing attacks with empirical evidence. Our discoveries demonstrate the emerging misuse risks of knowledge editing techniques on compromising the safety alignment of LLMs.

* The first two authors contributed equally. 9 pages for main paper, 36 pages including appendix. The code, results, dataset for this paper and more resources are on the project website: https://llm-editing.github.io

Via

Access Paper or Ask Questions