Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Robert West

Controlling Latent Diffusion Using Latent CLIP

Mar 11, 2025

Jason Becker, Chris Wendler, Peter Baylies, Robert West, Christian Wressnegger

Figure 1 for Controlling Latent Diffusion Using Latent CLIP

Figure 2 for Controlling Latent Diffusion Using Latent CLIP

Figure 3 for Controlling Latent Diffusion Using Latent CLIP

Figure 4 for Controlling Latent Diffusion Using Latent CLIP

Abstract:Instead of performing text-conditioned denoising in the image domain, latent diffusion models (LDMs) operate in latent space of a variational autoencoder (VAE), enabling more efficient processing at reduced computational costs. However, while the diffusion process has moved to the latent space, the contrastive language-image pre-training (CLIP) models, as used in many image processing tasks, still operate in pixel space. Doing so requires costly VAE-decoding of latent images before they can be processed. In this paper, we introduce Latent-CLIP, a CLIP model that operates directly in the latent space. We train Latent-CLIP on 2.7B pairs of latent images and descriptive texts, and show that it matches zero-shot classification performance of similarly sized CLIP models on both the ImageNet benchmark and a LDM-generated version of it, demonstrating its effectiveness in assessing both real and generated content. Furthermore, we construct Latent-CLIP rewards for reward-based noise optimization (ReNO) and show that they match the performance of their CLIP counterparts on GenEval and T2I-CompBench while cutting the cost of the total pipeline by 21%. Finally, we use Latent-CLIP to guide generation away from harmful content, achieving strong performance on the inappropriate image prompts (I2P) benchmark and a custom evaluation, without ever requiring the costly step of decoding intermediate images.

Via

Access Paper or Ask Questions

Generating Structured Outputs from Language Models: Benchmark and Studies

Jan 18, 2025

Saibo Geng, Hudson Cooper, Michał Moskal, Samuel Jenkins, Julian Berman, Nathan Ranchin, Robert West, Eric Horvitz, Harsha Nori

Figure 1 for Generating Structured Outputs from Language Models: Benchmark and Studies

Figure 2 for Generating Structured Outputs from Language Models: Benchmark and Studies

Figure 3 for Generating Structured Outputs from Language Models: Benchmark and Studies

Figure 4 for Generating Structured Outputs from Language Models: Benchmark and Studies

Abstract:Reliably generating structured outputs has become a critical capability for modern language model (LM) applications. Constrained decoding has emerged as the dominant technology across sectors for enforcing structured outputs during generation. Despite its growing adoption, little has been done with the systematic evaluation of the behaviors and performance of constrained decoding. Constrained decoding frameworks have standardized around JSON Schema as a structured data format, with most uses guaranteeing constraint compliance given a schema. However, there is poor understanding of the effectiveness of the methods in practice. We present an evaluation framework to assess constrained decoding approaches across three critical dimensions: efficiency in generating constraint-compliant outputs, coverage of diverse constraint types, and quality of the generated outputs. To facilitate this evaluation, we introduce JSONSchemaBench, a benchmark for constrained decoding comprising 10K real-world JSON schemas that encompass a wide range of constraints with varying complexity. We pair the benchmark with the existing official JSON Schema Test Suite and evaluate six state-of-the-art constrained decoding frameworks, including Guidance, Outlines, Llamacpp, XGrammar, OpenAI, and Gemini. Through extensive experiments, we gain insights into the capabilities and limitations of constrained decoding on structured generation with real-world JSON schemas. Our work provides actionable insights for improving constrained decoding frameworks and structured generation tasks, setting a new standard for evaluating constrained decoding and structured generation. We release JSONSchemaBench at https://github.com/guidance-ai/jsonschemabench

Via

Access Paper or Ask Questions

Assessing Social Alignment: Do Personality-Prompted Large Language Models Behave Like Humans?

Dec 21, 2024

Ivan Zakazov, Mikolaj Boronski, Lorenzo Drudi, Robert West

Figure 1 for Assessing Social Alignment: Do Personality-Prompted Large Language Models Behave Like Humans?

Figure 2 for Assessing Social Alignment: Do Personality-Prompted Large Language Models Behave Like Humans?

Figure 3 for Assessing Social Alignment: Do Personality-Prompted Large Language Models Behave Like Humans?

Figure 4 for Assessing Social Alignment: Do Personality-Prompted Large Language Models Behave Like Humans?

Abstract:The ongoing revolution in language modelling has led to various novel applications, some of which rely on the emerging "social abilities" of large language models (LLMs). Already, many turn to the new "cyber friends" for advice during pivotal moments of their lives and trust them with their deepest secrets, implying that accurate shaping of LLMs' "personalities" is paramount. Leveraging the vast diversity of data on which LLMs are pretrained, state-of-the-art approaches prompt them to adopt a particular personality. We ask (i) if personality-prompted models behave (i.e. "make" decisions when presented with a social situation) in line with the ascribed personality, and (ii) if their behavior can be finely controlled. We use classic psychological experiments - the Milgram Experiment and the Ultimatum Game - as social interaction testbeds and apply personality prompting to GPT-3.5/4/4o-mini/4o. Our experiments reveal failure modes of the prompt-based modulation of the models' "behavior", thus challenging the feasibility of personality prompting with today's LLMs.

* Accepted to NeurIPS 2024 Workshop on Behavioral Machine Learning

Via

Access Paper or Ask Questions

Byte BPE Tokenization as an Inverse string Homomorphism

Dec 04, 2024

Saibo Geng, Sankalp Gambhir, Chris Wendler, Robert West

Figure 1 for Byte BPE Tokenization as an Inverse string Homomorphism

Figure 2 for Byte BPE Tokenization as an Inverse string Homomorphism

Figure 3 for Byte BPE Tokenization as an Inverse string Homomorphism

Figure 4 for Byte BPE Tokenization as an Inverse string Homomorphism

Abstract:Tokenization is an important preprocessing step in the training and inference of large language models (LLMs). While there has been extensive research on the expressive power of the neural achitectures used in LLMs, the impact of tokenization has not been well understood. In this work, we demonstrate that tokenization, irrespective of the algorithm used, acts as an inverse homomorphism between strings and tokens. This suggests that the character space of the source language and the token space of the tokenized language are homomorphic, preserving the structural properties of the source language. Additionally, we explore the concept of proper tokenization, which refers to an unambiguous tokenization returned from the tokenizer. Our analysis reveals that the expressiveness of neural architectures in recognizing context-free languages is not affected by tokenization.

Via

Access Paper or Ask Questions

Separating Tongue from Thought: Activation Patching Reveals Language-Agnostic Concept Representations in Transformers

Nov 13, 2024

Clément Dumas, Chris Wendler, Veniamin Veselovsky, Giovanni Monea, Robert West

Figure 1 for Separating Tongue from Thought: Activation Patching Reveals Language-Agnostic Concept Representations in Transformers

Figure 2 for Separating Tongue from Thought: Activation Patching Reveals Language-Agnostic Concept Representations in Transformers

Figure 3 for Separating Tongue from Thought: Activation Patching Reveals Language-Agnostic Concept Representations in Transformers

Figure 4 for Separating Tongue from Thought: Activation Patching Reveals Language-Agnostic Concept Representations in Transformers

Abstract:A central question in multilingual language modeling is whether large language models (LLMs) develop a universal concept representation, disentangled from specific languages. In this paper, we address this question by analyzing latent representations (latents) during a word translation task in transformer-based LLMs. We strategically extract latents from a source translation prompt and insert them into the forward pass on a target translation prompt. By doing so, we find that the output language is encoded in the latent at an earlier layer than the concept to be translated. Building on this insight, we conduct two key experiments. First, we demonstrate that we can change the concept without changing the language and vice versa through activation patching alone. Second, we show that patching with the mean over latents across different languages does not impair and instead improves the models' performance in translating the concept. Our results provide evidence for the existence of language-agnostic concept representations within the investigated models.

* 12 pages, 10 figures, previously published under the title "How Do Llamas Process Multilingual Text? A Latent Exploration through Activation Patching" at the ICML 2024 mechanistic interpretability workshop https://openreview.net/forum?id=0ku2hIm4BS

Via

Access Paper or Ask Questions

Controllable Context Sensitivity and the Knob Behind It

Nov 11, 2024

Julian Minder, Kevin Du, Niklas Stoehr, Giovanni Monea, Chris Wendler, Robert West, Ryan Cotterell

Figure 1 for Controllable Context Sensitivity and the Knob Behind It

Figure 2 for Controllable Context Sensitivity and the Knob Behind It

Figure 3 for Controllable Context Sensitivity and the Knob Behind It

Figure 4 for Controllable Context Sensitivity and the Knob Behind It

Abstract:When making predictions, a language model must trade off how much it relies on its context vs. its prior knowledge. Choosing how sensitive the model is to its context is a fundamental functionality, as it enables the model to excel at tasks like retrieval-augmented generation and question-answering. In this paper, we search for a knob which controls this sensitivity, determining whether language models answer from the context or their prior knowledge. To guide this search, we design a task for controllable context sensitivity. In this task, we first feed the model a context (Paris is in England) and a question (Where is Paris?); we then instruct the model to either use its prior or contextual knowledge and evaluate whether it generates the correct answer for both intents (either France or England). When fine-tuned on this task, instruction-tuned versions of Llama-3.1, Mistral-v0.3, and Gemma-2 can solve it with high accuracy (85-95%). Analyzing these high-performing models, we narrow down which layers may be important to context sensitivity using a novel linear time algorithm. Then, in each model, we identify a 1-D subspace in a single layer that encodes whether the model follows context or prior knowledge. Interestingly, while we identify this subspace in a fine-tuned model, we find that the exact same subspace serves as an effective knob in not only that model but also non-fine-tuned instruct and base models of that model family. Finally, we show a strong correlation between a model's performance and how distinctly it separates context-agreeing from context-ignoring answers in this subspace. These results suggest a single subspace facilitates how the model chooses between context and prior knowledge, hinting at a simple fundamental mechanism that controls this behavior.

Via

Access Paper or Ask Questions

Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks

Nov 07, 2024

Adam Fourney, Gagan Bansal, Hussein Mozannar, Cheng Tan, Eduardo Salinas, Erkang, Zhu, Friederike Niedtner, Grace Proebsting, Griffin Bassman(+10 more)

Figure 1 for Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks

Figure 2 for Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks

Figure 3 for Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks

Figure 4 for Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks

Abstract:Modern AI agents, driven by advances in large foundation models, promise to enhance our productivity and transform our lives by augmenting our knowledge and capabilities. To achieve this vision, AI agents must effectively plan, perform multi-step reasoning and actions, respond to novel observations, and recover from errors, to successfully complete complex tasks across a wide range of scenarios. In this work, we introduce Magentic-One, a high-performing open-source agentic system for solving such tasks. Magentic-One uses a multi-agent architecture where a lead agent, the Orchestrator, plans, tracks progress, and re-plans to recover from errors. Throughout task execution, the Orchestrator directs other specialized agents to perform tasks as needed, such as operating a web browser, navigating local files, or writing and executing Python code. We show that Magentic-One achieves statistically competitive performance to the state-of-the-art on three diverse and challenging agentic benchmarks: GAIA, AssistantBench, and WebArena. Magentic-One achieves these results without modification to core agent capabilities or to how they collaborate, demonstrating progress towards generalist agentic systems. Moreover, Magentic-One's modular design allows agents to be added or removed from the team without additional prompt tuning or training, easing development and making it extensible to future scenarios. We provide an open-source implementation of Magentic-One, and we include AutoGenBench, a standalone tool for agentic evaluation. AutoGenBench provides built-in controls for repetition and isolation to run agentic benchmarks in a rigorous and contained manner -- which is important when agents' actions have side-effects. Magentic-One, AutoGenBench and detailed empirical performance evaluations of Magentic-One, including ablations and error analysis are available at https://aka.ms/magentic-one

Via

Access Paper or Ask Questions

Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders

Oct 28, 2024

Viacheslav Surkov, Chris Wendler, Mikhail Terekhov, Justin Deschenaux, Robert West, Caglar Gulcehre

Figure 1 for Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders

Figure 2 for Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders

Figure 3 for Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders

Figure 4 for Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders

Abstract:Sparse autoencoders (SAEs) have become a core ingredient in the reverse engineering of large-language models (LLMs). For LLMs, they have been shown to decompose intermediate representations that often are not interpretable directly into sparse sums of interpretable features, facilitating better control and subsequent analysis. However, similar analyses and approaches have been lacking for text-to-image models. We investigated the possibility of using SAEs to learn interpretable features for a few-step text-to-image diffusion models, such as SDXL Turbo. To this end, we train SAEs on the updates performed by transformer blocks within SDXL Turbo's denoising U-net. We find that their learned features are interpretable, causally influence the generation process, and reveal specialization among the blocks. In particular, we find one block that deals mainly with image composition, one that is mainly responsible for adding local details, and one for color, illumination, and style. Therefore, our work is an important first step towards better understanding the internals of generative text-to-image models like SDXL Turbo and showcases the potential of features learned by SAEs for the visual domain. Code is available at https://github.com/surkovv/sdxl-unbox

Via

Access Paper or Ask Questions

Activation Scaling for Steering and Interpreting Language Models

Oct 07, 2024

Niklas Stoehr, Kevin Du, Vésteinn Snæbjarnarson, Robert West, Ryan Cotterell, Aaron Schein

Abstract:Given the prompt "Rome is in", can we steer a language model to flip its prediction of an incorrect token "France" to a correct token "Italy" by only multiplying a few relevant activation vectors with scalars? We argue that successfully intervening on a model is a prerequisite for interpreting its internal workings. Concretely, we establish a three-term objective: a successful intervention should flip the correct with the wrong token and vice versa (effectiveness), and leave other tokens unaffected (faithfulness), all while being sparse (minimality). Using gradient-based optimization, this objective lets us learn (and later evaluate) a specific kind of efficient and interpretable intervention: activation scaling only modifies the signed magnitude of activation vectors to strengthen, weaken, or reverse the steering directions already encoded in the model. On synthetic tasks, this intervention performs comparably with steering vectors in terms of effectiveness and faithfulness, but is much more minimal allowing us to pinpoint interpretable model components. We evaluate activation scaling from different angles, compare performance on different datasets, and make activation scalars a learnable function of the activation vectors themselves to generalize to varying-length prompts.

* Findings of the Association for Computational Linguistics: EMNLP 2024

Via

Access Paper or Ask Questions

Entity Insertion in Multilingual Linked Corpora: The Case of Wikipedia

Oct 05, 2024

Tomás Feith, Akhil Arora, Martin Gerlach, Debjit Paul, Robert West

Figure 1 for Entity Insertion in Multilingual Linked Corpora: The Case of Wikipedia

Figure 2 for Entity Insertion in Multilingual Linked Corpora: The Case of Wikipedia

Figure 3 for Entity Insertion in Multilingual Linked Corpora: The Case of Wikipedia

Figure 4 for Entity Insertion in Multilingual Linked Corpora: The Case of Wikipedia

Abstract:Links are a fundamental part of information networks, turning isolated pieces of knowledge into a network of information that is much richer than the sum of its parts. However, adding a new link to the network is not trivial: it requires not only the identification of a suitable pair of source and target entities but also the understanding of the content of the source to locate a suitable position for the link in the text. The latter problem has not been addressed effectively, particularly in the absence of text spans in the source that could serve as anchors to insert a link to the target entity. To bridge this gap, we introduce and operationalize the task of entity insertion in information networks. Focusing on the case of Wikipedia, we empirically show that this problem is, both, relevant and challenging for editors. We compile a benchmark dataset in 105 languages and develop a framework for entity insertion called LocEI (Localized Entity Insertion) and its multilingual variant XLocEI. We show that XLocEI outperforms all baseline models (including state-of-the-art prompt-based ranking with LLMs such as GPT-4) and that it can be applied in a zero-shot manner on languages not seen during training with minimal performance drop. These findings are important for applying entity insertion models in practice, e.g., to support editors in adding links across the more than 300 language versions of Wikipedia.

* EMNLP 2024; 24 pages; 62 figures

Via

Access Paper or Ask Questions