Fernanda Viégas

Beyond Surface Statistics: Scene Representations in a Latent Diffusion Model

Jun 09, 2023
Yida Chen, Fernanda Viégas, Martin Wattenberg

Latent diffusion models (LDMs) exhibit an impressive ability to produce realistic images, yet the inner workings of these models remain mysterious. Even when trained purely on images without explicit depth information, they typically output coherent pictures of 3D scenes. In this work, we investigate a basic interpretability question: does an LDM create and use an internal representation of simple scene geometry? Using linear probes, we find evidence that the internal activations of the LDM encode linear representations of both 3D depth data and a salient-object / background distinction. These representations appear surprisingly early in the denoising process, well before a human can easily make sense of the noisy images. Intervention experiments further indicate these representations play a causal role in image synthesis, and may be used for simple high-level editing of an LDM's output.
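
To make the probing setup concrete, here is a minimal sketch of a linear-probe experiment in Python. The activation extraction from the diffusion U-Net is not shown; `acts`, `depth`, and `fg` are synthetic placeholders standing in for the per-pixel activations, monocular-depth estimates, and salient-object labels the paper uses, and the scikit-learn estimators are generic stand-ins rather than the authors' exact training code.

```python
import numpy as np
from sklearn.linear_model import Ridge, LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder stand-ins for quantities extracted from the LDM:
# acts  : per-pixel intermediate U-Net activations at one denoising step,
#         flattened to (num_pixels, num_channels)
# depth : continuous depth estimates for the same pixels
# fg    : binary salient-object / background labels
rng = np.random.default_rng(0)
acts  = rng.normal(size=(10_000, 512))
depth = acts @ rng.normal(size=512) + 0.1 * rng.normal(size=10_000)
fg    = (depth > np.median(depth)).astype(int)

A_tr, A_te, d_tr, d_te, f_tr, f_te = train_test_split(
    acts, depth, fg, test_size=0.2, random_state=0)

# Linear probe for continuous depth: a high held-out R^2 suggests depth is
# linearly decodable from the activations.
depth_probe = Ridge(alpha=1.0).fit(A_tr, d_tr)
print("depth probe R^2:", depth_probe.score(A_te, d_te))

# Linear probe for the salient-object / background distinction.
fg_probe = LogisticRegression(max_iter=1000).fit(A_tr, f_tr)
print("foreground probe accuracy:", fg_probe.score(A_te, f_te))
```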

* 17 pages, 13 figures 

Inference-Time Intervention: Eliciting Truthful Answers from a Language Model

Jun 07, 2023
Kenneth Li, Oam Patel, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg

We introduce Inference-Time Intervention (ITI), a technique designed to enhance the truthfulness of large language models (LLMs). ITI operates by shifting model activations during inference, following a set of directions across a limited number of attention heads. This intervention significantly improves the performance of LLaMA models on the TruthfulQA benchmark. On an instruction-finetuned LLaMA called Alpaca, ITI improves its truthfulness from 32.5% to 65.1%. We identify a tradeoff between truthfulness and helpfulness and demonstrate how to balance it by tuning the intervention strength. ITI is minimally invasive and computationally inexpensive. Moreover, the technique is data efficient: while approaches like RLHF require extensive annotations, ITI locates truthful directions using only a few hundred examples. Our findings suggest that LLMs may have an internal representation of the likelihood of something being true, even as they produce falsehoods on the surface.
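
As a rough illustration of what an inference-time shift along an activation direction looks like, the sketch below registers a PyTorch forward hook on a stand-in linear layer. The layer, the random `direction`, and the value of `alpha` are placeholders; the actual method applies learned truthful directions to selected attention heads of a LLaMA model.

```python
import torch

# Toy stand-in for one attention head's output projection; the real method
# hooks chosen heads of a LLaMA model instead.
head = torch.nn.Linear(64, 64)

# A "truthful direction" would normally come from a probe trained on head
# activations over labeled true/false statements; here it is random.
direction = torch.nn.functional.normalize(torch.randn(64), dim=0)
alpha = 15.0  # intervention strength; the paper tunes this trade-off

def shift_activation(module, inputs, output):
    # Add the scaled direction to the head's activation at inference time;
    # the model weights themselves are untouched.
    return output + alpha * direction

handle = head.register_forward_hook(shift_activation)

x = torch.randn(2, 64)
shifted = head(x)   # activations are now nudged along `direction`
handle.remove()     # restore ordinary behavior
```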

* code: https://github.com/likenneth/honest_llama 

AttentionViz: A Global View of Transformer Attention

May 04, 2023
Catherine Yeh, Yida Chen, Aoyu Wu, Cynthia Chen, Fernanda Viégas, Martin Wattenberg

Transformer models are revolutionizing machine learning, but their inner workings remain mysterious. In this work, we present a new visualization technique designed to help researchers understand the self-attention mechanism in transformers that allows these models to learn rich, contextual relationships between elements of a sequence. The main idea behind our method is to visualize a joint embedding of the query and key vectors used by transformer models to compute attention. Unlike previous attention visualization techniques, our approach enables the analysis of global patterns across multiple input sequences. We create an interactive visualization tool, AttentionViz, based on these joint query-key embeddings, and use it to study attention mechanisms in both language and vision transformers. We demonstrate the utility of our approach in improving model understanding and offering new insights about query-key interactions through several application scenarios and expert feedback.
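
The core idea, projecting query and key vectors into one shared low-dimensional map, can be sketched as follows. The query and key matrices here are random stand-ins for activations read out of a trained transformer head, and t-SNE serves as a generic projection; the tool's exact normalization and layout choices are not reproduced.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Stand-ins for one attention head's projections; in practice the query and
# key vectors are read out of a trained language or vision transformer.
d_model, d_head, n_tokens = 256, 64, 300
W_q = rng.normal(size=(d_model, d_head))
W_k = rng.normal(size=(d_model, d_head))
tokens = rng.normal(size=(n_tokens, d_model))

queries = tokens @ W_q
keys    = tokens @ W_k

# Joint embedding: project queries and keys into the same 2D map so that
# query-key pairs with high attention (large dot product) tend to land
# near each other, across many input sequences at once.
joint = np.vstack([queries, keys])
coords = TSNE(n_components=2, random_state=0).fit_transform(joint)

query_xy, key_xy = coords[:n_tokens], coords[n_tokens:]
print(query_xy.shape, key_xy.shape)  # points to scatter-plot, colored by type
```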

* 11 pages, 13 figures 

The System Model and the User Model: Exploring AI Dashboard Design

May 04, 2023
Fernanda Viégas, Martin Wattenberg

This is a speculative essay on interface design and artificial intelligence. Recently there has been a surge of attention to chatbots based on large language models, including widely reported unsavory interactions. We contend that part of the problem is that text is not all you need: sophisticated AI systems should have dashboards, just like all other complicated devices. Assuming the hypothesis that AI systems based on neural networks will contain interpretable models of aspects of the world around them, we discuss what data such dashboards might display. We conjecture that, for many systems, the two most important models will be of the user and of the system itself. We call these the System Model and User Model. We argue that, for usability and safety, interfaces to dialogue-based AI systems should have a parallel display based on the state of the System Model and the User Model. Finding ways to identify, interpret, and display these two models should be a core part of interface research for AI.

* 10 pages, 2 figures 

Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task

Oct 25, 2022
Kenneth Li, Aspen K. Hopkins, David Bau, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg

Language models show a surprising range of capabilities, but the source of their apparent competence is unclear. Do these networks just memorize a collection of surface statistics, or do they rely on internal representations of the process that generates the sequences they see? We investigate this question by applying a variant of the GPT model to the task of predicting legal moves in a simple board game, Othello. Although the network has no a priori knowledge of the game or its rules, we uncover evidence of an emergent nonlinear internal representation of the board state. Interventional experiments indicate this representation can be used to control the output of the network and create "latent saliency maps" that can help explain predictions in human terms.
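
A hedged sketch of the probing step: a small nonlinear probe is fit from sequence-model activations to the contents of one board square. The `hidden` and `square_state` arrays are random placeholders for Othello-GPT activations and ground-truth board states; the probes in the paper are trained per square on actual game records.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-ins: `hidden` would be Othello-GPT activations after each move,
# `square_state` the true contents of one board square recovered from the
# game record (0 = empty, 1 = black, 2 = white).
hidden = rng.normal(size=(5_000, 512))
square_state = rng.integers(0, 3, size=5_000)

H_tr, H_te, y_tr, y_te = train_test_split(
    hidden, square_state, test_size=0.2, random_state=0)

# A one-hidden-layer (nonlinear) probe; the paper finds such probes recover
# the board state where purely linear probes fall short.
probe = MLPClassifier(hidden_layer_sizes=(256,), max_iter=300, random_state=0)
probe.fit(H_tr, y_tr)
print("held-out probe accuracy:", probe.score(H_te, y_te))
```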

* code: https://github.com/likenneth/othello_world 

An Interpretability Illusion for BERT

Apr 14, 2021
Tolga Bolukbasi, Adam Pearce, Ann Yuan, Andy Coenen, Emily Reif, Fernanda Viégas, Martin Wattenberg

We describe an "interpretability illusion" that arises when analyzing the BERT model. Activations of individual neurons in the network may spuriously appear to encode a single, simple concept, when in fact they are encoding something far more complex. The same effect holds for linear combinations of activations. We trace the source of this illusion to geometric properties of BERT's embedding space as well as the fact that common text corpora represent only narrow slices of possible English sentences. We provide a taxonomy of model-learned concepts and discuss methodological implications for interpretability research, especially the importance of testing hypotheses on multiple data sets.
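
The following sketch illustrates the kind of check the paper motivates: rank sentences from two different corpora by a single neuron's activation and compare what the top-activating sentences suggest. The layer and neuron indices and the tiny "corpora" are arbitrary illustrations, not choices from the paper.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

LAYER, NEURON = 8, 300  # arbitrary choices for illustration

def neuron_activation(sentence):
    # Mean activation of one hidden-state dimension over the sentence's tokens.
    inputs = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).hidden_states[LAYER][0]  # (tokens, 768)
    return hidden[:, NEURON].mean().item()

# Two tiny stand-in "corpora"; the paper uses large, genuinely different
# datasets rather than handfuls of sentences.
corpus_a = ["The cat sat on the mat.", "A storm hit the coast overnight.",
            "She plays violin in a quartet."]
corpus_b = ["What is the capital of France?", "How do plants make food?",
            "Why is the sky blue?"]

for name, corpus in [("corpus A", corpus_a), ("corpus B", corpus_b)]:
    ranked = sorted(corpus, key=neuron_activation, reverse=True)
    print(name, "top sentence:", ranked[0])
# If the top-activating sentences suggest different "concepts" on the two
# corpora, the single-concept reading of this neuron is an illusion.
```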

Segment Integrated Gradients: Better attributions through regions

Jun 06, 2019
Andrei Kapishnikov, Tolga Bolukbasi, Fernanda Viégas, Michael Terry

Saliency methods can aid understanding of deep neural networks. Recent years have witnessed many improvements to saliency methods, as well as new ways for evaluating them. In this paper, we 1) present a novel region-based attribution method, Segment-Integrated Gradients (SIG), that builds upon integrated gradients (Sundararajan et al. 2017), 2) introduce evaluation methods for empirically assessing the quality of image-based saliency maps (Performance Information Curves (PICs)), and 3) contribute an axiom-based sanity check for attribution methods. Through empirical experiments and example results, we show that SIG produces better results than other saliency methods for common models and the ImageNet dataset.
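
A minimal sketch of the region-aggregation idea: compute standard integrated gradients, then sum the per-pixel attributions within each segment and rank the segments. The tiny classifier and the uniform 8x8 "segmentation" are placeholders; SIG uses real ImageNet models and multi-scale image over-segmentation.

```python
import torch

# Tiny stand-in classifier; in practice this would be an ImageNet model.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 3, padding=1), torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(), torch.nn.Linear(8, 10))
model.eval()

def integrated_gradients(x, target, steps=32):
    # Standard IG: average gradients along the straight-line path from a
    # black baseline to the input, scaled by (input - baseline).
    baseline = torch.zeros_like(x)
    total = torch.zeros_like(x)
    for a in torch.linspace(0.0, 1.0, steps):
        point = (baseline + a * (x - baseline)).requires_grad_(True)
        model(point)[0, target].backward()
        total += point.grad
    return (x - baseline) * total / steps

x = torch.rand(1, 3, 64, 64)
attr = integrated_gradients(x, target=3).abs().sum(dim=1)[0]  # (64, 64)

# Stand-in segmentation: 8x8 blocks; SIG instead uses image over-segmentation
# at several scales.
seg = torch.arange(64).repeat(64, 1) // 8 + 8 * (torch.arange(64)[:, None] // 8)

# Region-level attribution: sum per-pixel scores within each segment and rank
# segments, rather than reporting noisy per-pixel values.
region_scores = {int(s): attr[seg == s].sum().item() for s in seg.unique()}
top = sorted(region_scores, key=region_scores.get, reverse=True)[:5]
print("highest-attribution segments:", top)
```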

Visualizing and Measuring the Geometry of BERT

Jun 06, 2019
Andy Coenen, Emily Reif, Ann Yuan, Been Kim, Adam Pearce, Fernanda Viégas, Martin Wattenberg

Transformer architectures show significant promise for natural language processing. Given that a single pretrained model can be fine-tuned to perform well on many different tasks, these networks appear to extract generally useful linguistic features. A natural question is how such networks represent this information internally. This paper describes qualitative and quantitative investigations of one particularly effective model, BERT. At a high level, linguistic features seem to be represented in separate semantic and syntactic subspaces. We find evidence of a fine-grained geometric representation of word senses. We also present empirical descriptions of syntactic representations in both attention matrices and individual word embeddings, as well as a mathematical argument to explain the geometry of these representations.
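
One of the measurements, comparing contextual embeddings of the same word used in different senses, can be sketched directly with the Hugging Face transformers library. The sentences, the choice of layer, and the use of the first matching wordpiece are illustrative simplifications rather than the paper's exact protocol.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def word_embedding(sentence, word, layer=-1):
    # Contextual embedding of `word`'s wordpiece in `sentence`.
    inputs = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs, output_hidden_states=True).hidden_states[layer][0]
    piece = tok.convert_tokens_to_ids(word)
    idx = (inputs["input_ids"][0] == piece).nonzero()[0, 0]
    return hidden[idx]

river1 = word_embedding("He fished from the bank of the river.", "bank")
river2 = word_embedding("They walked along the muddy bank.", "bank")
money  = word_embedding("She deposited the check at the bank.", "bank")

cos = torch.nn.functional.cosine_similarity
print("river vs river:", cos(river1, river2, dim=0).item())
print("river vs money:", cos(river1, money,  dim=0).item())
# Same-sense uses of "bank" should sit closer together than cross-sense uses,
# consistent with a fine-grained geometric representation of word sense.
```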

* 8 pages, 5 figures 

GAN Lab: Understanding Complex Deep Generative Models using Interactive Visual Experimentation

Sep 05, 2018
Minsuk Kahng, Nikhil Thorat, Duen Horng Chau, Fernanda Viégas, Martin Wattenberg

Recent success in deep learning has generated immense interest among practitioners and students, inspiring many to learn about this new technology. While visual and interactive approaches have been successfully developed to help people more easily learn deep learning, most existing tools focus on simpler models. In this work, we present GAN Lab, the first interactive visualization tool designed for non-experts to learn and experiment with Generative Adversarial Networks (GANs), a popular class of complex deep learning models. With GAN Lab, users can interactively train generative models and visualize the intermediate results of the dynamic training process. GAN Lab tightly integrates a model overview graph that summarizes the GAN's structure with a layered distributions view that helps users interpret the interplay between submodels. GAN Lab introduces new interactive experimentation features for learning complex deep learning models, such as step-by-step training at multiple levels of abstraction for understanding intricate training dynamics. Implemented using TensorFlow.js, GAN Lab is accessible to anyone via modern web browsers, without the need for installation or specialized hardware, overcoming a major practical challenge in deploying interactive tools for deep learning.
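
GAN Lab itself runs in the browser on TensorFlow.js; the Python sketch below only mirrors the alternating generator/discriminator updates on a 2D toy distribution of the kind users step through interactively in the tool, and is not the tool's implementation.

```python
import torch

# Minimal GAN on 2D points: a generator maps noise to points, a discriminator
# scores real vs. generated points, and the two are updated in alternation.
G = torch.nn.Sequential(torch.nn.Linear(2, 16), torch.nn.ReLU(), torch.nn.Linear(16, 2))
D = torch.nn.Sequential(torch.nn.Linear(2, 16), torch.nn.ReLU(), torch.nn.Linear(16, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = torch.nn.BCEWithLogitsLoss()

def real_batch(n=64):
    # Toy 2D target distribution (a noisy ring), like GAN Lab's preset datasets.
    theta = torch.rand(n) * 2 * torch.pi
    return torch.stack([theta.cos(), theta.sin()], dim=1) + 0.05 * torch.randn(n, 2)

for step in range(2000):
    # Discriminator update: real points vs. current generator samples.
    real, fake = real_batch(), G(torch.randn(64, 2)).detach()
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator update: try to fool the discriminator.
    fake = G(torch.randn(64, 2))
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

print("final losses:", d_loss.item(), g_loss.item())
```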

* This paper will be published in the IEEE Transactions on Visualization and Computer Graphics, 25(1), January 2019, and presented at IEEE VAST 2018 

Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation

Aug 21, 2017
Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat, Fernanda Viégas, Martin Wattenberg, Greg Corrado, Macduff Hughes, Jeffrey Dean

We propose a simple solution to use a single Neural Machine Translation (NMT) model to translate between multiple languages. Our solution requires no change in the model architecture from our base system but instead introduces an artificial token at the beginning of the input sentence to specify the required target language. The rest of the model, which includes encoder, decoder and attention, remains unchanged and is shared across all languages. Using a shared wordpiece vocabulary, our approach enables Multilingual NMT using a single model without any increase in parameters, which is significantly simpler than previous proposals for Multilingual NMT. Our method often improves the translation quality of all involved language pairs, even while keeping the total number of model parameters constant. On the WMT'14 benchmarks, a single multilingual model achieves comparable performance for English$\rightarrow$French and surpasses state-of-the-art results for English$\rightarrow$German. Similarly, a single multilingual model surpasses state-of-the-art results for French$\rightarrow$English and German$\rightarrow$English on the WMT'14 and WMT'15 benchmarks respectively. On production corpora, multilingual models of up to twelve language pairs allow for better translation of many individual pairs. In addition to improving the translation quality of language pairs that the model was trained with, our models can also learn to perform implicit bridging between language pairs never seen explicitly during training, showing that transfer learning and zero-shot translation are possible for neural translation. Finally, we show analyses that hint at a universal interlingua representation in our models and present some interesting examples when mixing languages.
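
The mechanism is simple enough to show as a preprocessing sketch: an artificial target-language token is prepended to each source sentence while the model itself stays unchanged. The `<2xx>` token format follows the paper's examples; the helper function here is illustrative, not the production pipeline.

```python
# Prepend an artificial token naming the target language; one shared
# wordpiece vocabulary then covers all source and target languages, and the
# NMT architecture itself is untouched.
def add_target_token(source_sentence: str, target_lang: str) -> str:
    return f"<2{target_lang}> {source_sentence}"

# The same English sentence routed to different target languages:
print(add_target_token("How are you?", "es"))  # -> "<2es> How are you?"
print(add_target_token("How are you?", "de"))  # -> "<2de> How are you?"

# Zero-shot translation uses the same mechanism: a source-language /
# target-token combination never seen together in training can still be fed
# to the single shared model.
```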
