Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Sarath Chandar

Combining Domain and Alignment Vectors to Achieve Better Knowledge-Safety Trade-offs in LLMs

Nov 11, 2024

Megh Thakkar, Yash More, Quentin Fournier, Matthew Riemer, Pin-Yu Chen, Amal Zouaq, Payel Das, Sarath Chandar

Figure 1 for Combining Domain and Alignment Vectors to Achieve Better Knowledge-Safety Trade-offs in LLMs

Figure 2 for Combining Domain and Alignment Vectors to Achieve Better Knowledge-Safety Trade-offs in LLMs

Figure 3 for Combining Domain and Alignment Vectors to Achieve Better Knowledge-Safety Trade-offs in LLMs

Figure 4 for Combining Domain and Alignment Vectors to Achieve Better Knowledge-Safety Trade-offs in LLMs

Abstract:There is a growing interest in training domain-expert LLMs that excel in specific technical fields compared to their general-purpose instruction-tuned counterparts. However, these expert models often experience a loss in their safety abilities in the process, making them capable of generating harmful content. As a solution, we introduce an efficient and effective merging-based alignment method called \textsc{MergeAlign} that interpolates the domain and alignment vectors, creating safer domain-specific models while preserving their utility. We apply \textsc{MergeAlign} on Llama3 variants that are experts in medicine and finance, obtaining substantial alignment improvements with minimal to no degradation on domain-specific benchmarks. We study the impact of model merging through model similarity metrics and contributions of individual models being merged. We hope our findings open new research avenues and inspire more efficient development of safe expert LLMs.

Via

Access Paper or Ask Questions

Do Robot Snakes Dream like Electric Sheep? Investigating the Effects of Architectural Inductive Biases on Hallucination

Oct 22, 2024

Jerry Huang, Prasanna Parthasarathi, Mehdi Rezagholizadeh, Boxing Chen, Sarath Chandar

Figure 1 for Do Robot Snakes Dream like Electric Sheep? Investigating the Effects of Architectural Inductive Biases on Hallucination

Figure 2 for Do Robot Snakes Dream like Electric Sheep? Investigating the Effects of Architectural Inductive Biases on Hallucination

Figure 3 for Do Robot Snakes Dream like Electric Sheep? Investigating the Effects of Architectural Inductive Biases on Hallucination

Figure 4 for Do Robot Snakes Dream like Electric Sheep? Investigating the Effects of Architectural Inductive Biases on Hallucination

Abstract:The growth in prominence of large language models (LLMs) in everyday life can be largely attributed to their generative abilities, yet some of this is also owed to the risks and costs associated with their use. On one front is their tendency to \textit{hallucinate} false or misleading information, limiting their reliability. On another is the increasing focus on the computational limitations associated with traditional self-attention based LLMs, which has brought about new alternatives, in particular recurrent models, meant to overcome them. Yet it remains uncommon to consider these two concerns simultaneously. Do changes in architecture exacerbate/alleviate existing concerns about hallucinations? Do they affect how and where they occur? Through an extensive evaluation, we study how these architecture-based inductive biases affect the propensity to hallucinate. While hallucination remains a general phenomenon not limited to specific architectures, the situations in which they occur and the ease with which specific types of hallucinations can be induced can significantly differ based on the model architecture. These findings highlight the need for better understanding both these problems in conjunction with each other, as well as consider how to design more universal techniques for handling hallucinations.

Via

Access Paper or Ask Questions

Toward Debugging Deep Reinforcement Learning Programs with RLExplorer

Oct 06, 2024

Rached Bouchoucha, Ahmed Haj Yahmed, Darshan Patil, Janarthanan Rajendran, Amin Nikanjam, Sarath Chandar, Foutse Khomh

Figure 1 for Toward Debugging Deep Reinforcement Learning Programs with RLExplorer

Figure 2 for Toward Debugging Deep Reinforcement Learning Programs with RLExplorer

Figure 3 for Toward Debugging Deep Reinforcement Learning Programs with RLExplorer

Figure 4 for Toward Debugging Deep Reinforcement Learning Programs with RLExplorer

Abstract:Deep reinforcement learning (DRL) has shown success in diverse domains such as robotics, computer games, and recommendation systems. However, like any other software system, DRL-based software systems are susceptible to faults that pose unique challenges for debugging and diagnosing. These faults often result in unexpected behavior without explicit failures and error messages, making debugging difficult and time-consuming. Therefore, automating the monitoring and diagnosis of DRL systems is crucial to alleviate the burden on developers. In this paper, we propose RLExplorer, the first fault diagnosis approach for DRL-based software systems. RLExplorer automatically monitors training traces and runs diagnosis routines based on properties of the DRL learning dynamics to detect the occurrence of DRL-specific faults. It then logs the results of these diagnoses as warnings that cover theoretical concepts, recommended practices, and potential solutions to the identified faults. We conducted two sets of evaluations to assess RLExplorer. Our first evaluation of faulty DRL samples from Stack Overflow revealed that our approach can effectively diagnose real faults in 83% of the cases. Our second evaluation of RLExplorer with 15 DRL experts/developers showed that (1) RLExplorer could identify 3.6 times more defects than manual debugging and (2) RLExplorer is easily integrated into DRL applications.

* Accepted for publication in The International Conference on Software Maintenance and Evolution (ICSME 2024)

Via

Access Paper or Ask Questions

Context-Aware Assistant Selection for Improved Inference Acceleration with Large Language Models

Aug 16, 2024

Jerry Huang, Prasanna Parthasarathi, Mehdi Rezagholizadeh, Sarath Chandar

Abstract:Despite their widespread adoption, large language models (LLMs) remain prohibitive to use under resource constraints, with their ever growing sizes only increasing the barrier for use. One noted issue is the high latency associated with auto-regressive generation, rendering large LLMs use dependent on advanced computing infrastructure. Assisted decoding, where a smaller draft model guides a larger target model's generation, has helped alleviate this, but remains dependent on alignment between the two models. Thus if the draft model is insufficiently capable on some domain relative to the target model, performance can degrade. Alternatively, one can leverage multiple draft models to better cover the expertise of the target, but when multiple black-box draft models are available, selecting an assistant without details about its construction can be difficult. To better understand this decision making problem, we observe it as a contextual bandit, where a policy must choose a draft model based on a context. We show that even without prior knowledge of the draft models, creating an offline dataset from only outputs of independent draft/target models and training a policy over the alignment of these outputs can accelerate performance on multiple domains provided the candidates are effective. Further results show this to hold on various settings with multiple assisted decoding candidates, highlighting its flexibility and the advantageous role that such decision making can play.

* 14 pages (9 pages main content + references + appendix)

Via

Access Paper or Ask Questions

Exploring Quantization for Efficient Pre-Training of Transformer Language Models

Jul 16, 2024

Kamran Chitsaz, Quentin Fournier, Gonçalo Mordido, Sarath Chandar

Figure 1 for Exploring Quantization for Efficient Pre-Training of Transformer Language Models

Figure 2 for Exploring Quantization for Efficient Pre-Training of Transformer Language Models

Figure 3 for Exploring Quantization for Efficient Pre-Training of Transformer Language Models

Figure 4 for Exploring Quantization for Efficient Pre-Training of Transformer Language Models

Abstract:The increasing scale of Transformer models has led to an increase in their pre-training computational requirements. While quantization has proven to be effective after pre-training and during fine-tuning, applying quantization in Transformers during pre-training has remained largely unexplored at scale for language modeling. This study aims to explore the impact of quantization for efficient pre-training of Transformers, with a focus on linear layer components. By systematically applying straightforward linear quantization to weights, activations, gradients, and optimizer states, we assess its effects on model efficiency, stability, and performance during training. By offering a comprehensive recipe of effective quantization strategies to be applied during the pre-training of Transformers, we promote high training efficiency from scratch while retaining language modeling ability. Code is available at https://github.com/chandar-lab/EfficientLLMs.

Via

Access Paper or Ask Questions

Why Don't Prompt-Based Fairness Metrics Correlate?

Jun 09, 2024

Abdelrahman Zayed, Goncalo Mordido, Ioana Baldini, Sarath Chandar

Figure 1 for Why Don't Prompt-Based Fairness Metrics Correlate?

Figure 2 for Why Don't Prompt-Based Fairness Metrics Correlate?

Figure 3 for Why Don't Prompt-Based Fairness Metrics Correlate?

Figure 4 for Why Don't Prompt-Based Fairness Metrics Correlate?

Abstract:The widespread use of large language models has brought up essential questions about the potential biases these models might learn. This led to the development of several metrics aimed at evaluating and mitigating these biases. In this paper, we first demonstrate that prompt-based fairness metrics exhibit poor agreement, as measured by correlation, raising important questions about the reliability of fairness assessment using prompts. Then, we outline six relevant reasons why such a low correlation is observed across existing metrics. Based on these insights, we propose a method called Correlated Fairness Output (CAIRO) to enhance the correlation between fairness metrics. CAIRO augments the original prompts of a given fairness metric by using several pre-trained language models and then selects the combination of the augmented prompts that achieves the highest correlation across metrics. We show a significant improvement in Pearson correlation from 0.3 and 0.18 to 0.90 and 0.98 across metrics for gender and religion biases, respectively. Our code is available at https://github.com/chandar-lab/CAIRO.

* In Proceedings of ACL main 2024

Via

Access Paper or Ask Questions

A Deep Dive into the Trade-Offs of Parameter-Efficient Preference Alignment Techniques

Jun 07, 2024

Megh Thakkar, Quentin Fournier, Matthew D Riemer, Pin-Yu Chen, Amal Zouaq, Payel Das, Sarath Chandar

Figure 1 for A Deep Dive into the Trade-Offs of Parameter-Efficient Preference Alignment Techniques

Figure 2 for A Deep Dive into the Trade-Offs of Parameter-Efficient Preference Alignment Techniques

Figure 3 for A Deep Dive into the Trade-Offs of Parameter-Efficient Preference Alignment Techniques

Figure 4 for A Deep Dive into the Trade-Offs of Parameter-Efficient Preference Alignment Techniques

Abstract:Large language models are first pre-trained on trillions of tokens and then instruction-tuned or aligned to specific preferences. While pre-training remains out of reach for most researchers due to the compute required, fine-tuning has become affordable thanks to parameter-efficient methods such as LoRA and QLoRA. Alignment is known to be sensitive to the many factors involved, including the quantity and quality of data, the alignment method, and the adapter rank. However, there has not yet been an extensive study of their effect on downstream performance. To address this gap, we conduct an in-depth investigation of the impact of popular choices for three crucial axes: (i) the alignment dataset (HH-RLHF and BeaverTails), (ii) the alignment technique (SFT and DPO), and (iii) the model (LLaMA-1, Vicuna-v1.3, Mistral-7b, and Mistral-7b-Instruct). Our extensive setup spanning over 300 experiments reveals consistent trends and unexpected findings. We observe how more informative data helps with preference alignment, cases where supervised fine-tuning outperforms preference optimization, and how aligning to a distinct preference boosts performance on downstream tasks. Through our in-depth analyses, we put forward key guidelines to help researchers perform more effective parameter-efficient LLM alignment.

* Accepted to ACL (Main) 2024

Via

Access Paper or Ask Questions

BindGPT: A Scalable Framework for 3D Molecular Design via Language Modeling and Reinforcement Learning

Jun 06, 2024

Artem Zholus, Maksim Kuznetsov, Roman Schutski, Rim Shayakhmetov, Daniil Polykovskiy, Sarath Chandar, Alex Zhavoronkov

Figure 1 for BindGPT: A Scalable Framework for 3D Molecular Design via Language Modeling and Reinforcement Learning

Figure 2 for BindGPT: A Scalable Framework for 3D Molecular Design via Language Modeling and Reinforcement Learning

Figure 3 for BindGPT: A Scalable Framework for 3D Molecular Design via Language Modeling and Reinforcement Learning

Figure 4 for BindGPT: A Scalable Framework for 3D Molecular Design via Language Modeling and Reinforcement Learning

Abstract:Generating novel active molecules for a given protein is an extremely challenging task for generative models that requires an understanding of the complex physical interactions between the molecule and its environment. In this paper, we present a novel generative model, BindGPT which uses a conceptually simple but powerful approach to create 3D molecules within the protein's binding site. Our model produces molecular graphs and conformations jointly, eliminating the need for an extra graph reconstruction step. We pretrain BindGPT on a large-scale dataset and fine-tune it with reinforcement learning using scores from external simulation software. We demonstrate how a single pretrained language model can serve at the same time as a 3D molecular generative model, conformer generator conditioned on the molecular graph, and a pocket-conditioned 3D molecule generator. Notably, the model does not make any representational equivariance assumptions about the domain of generation. We show how such simple conceptual approach combined with pretraining and scaling can perform on par or better than the current best specialized diffusion models, language models, and graph neural networks while being two orders of magnitude cheaper to sample.

Via

Access Paper or Ask Questions

Predicting the Impact of Model Expansion through the Minima Manifold: A Loss Landscape Perspective

May 24, 2024

Pranshu Malviya, Jerry Huang, Quentin Fournier, Sarath Chandar

Figure 1 for Predicting the Impact of Model Expansion through the Minima Manifold: A Loss Landscape Perspective

Figure 2 for Predicting the Impact of Model Expansion through the Minima Manifold: A Loss Landscape Perspective

Figure 3 for Predicting the Impact of Model Expansion through the Minima Manifold: A Loss Landscape Perspective

Figure 4 for Predicting the Impact of Model Expansion through the Minima Manifold: A Loss Landscape Perspective

Abstract:The optimal model for a given task is often challenging to determine, requiring training multiple models from scratch which becomes prohibitive as dataset and model sizes grow. A more efficient alternative is to reuse smaller pre-trained models by expanding them, however, this is not widely adopted as how this impacts training dynamics remains poorly understood. While prior works have introduced statistics to measure these effects, they remain flawed. To rectify this, we offer a new approach for understanding and quantifying the impact of expansion through the lens of the loss landscape, which has been shown to contain a manifold of linearly connected minima. Building on this new perspective, we propose a metric to study the impact of expansion by estimating the size of the manifold. Experimental results show a clear relationship between gains in performance and manifold size, enabling the comparison of candidate models and presenting a first step towards expanding models more reliably based on geometric properties of the loss landscape.

Via

Access Paper or Ask Questions

Interpretability Needs a New Paradigm

May 08, 2024

Andreas Madsen, Himabindu Lakkaraju, Siva Reddy, Sarath Chandar

Abstract:Interpretability is the study of explaining models in understandable terms to humans. At present, interpretability is divided into two paradigms: the intrinsic paradigm, which believes that only models designed to be explained can be explained, and the post-hoc paradigm, which believes that black-box models can be explained. At the core of this debate is how each paradigm ensures its explanations are faithful, i.e., true to the model's behavior. This is important, as false but convincing explanations lead to unsupported confidence in artificial intelligence (AI), which can be dangerous. This paper's position is that we should think about new paradigms while staying vigilant regarding faithfulness. First, by examining the history of paradigms in science, we see that paradigms are constantly evolving. Then, by examining the current paradigms, we can understand their underlying beliefs, the value they bring, and their limitations. Finally, this paper presents 3 emerging paradigms for interpretability. The first paradigm designs models such that faithfulness can be easily measured. Another optimizes models such that explanations become faithful. The last paradigm proposes to develop models that produce both a prediction and an explanation.

Via

Access Paper or Ask Questions