John V. Guttag

Beyond Faithfulness: A Framework to Characterize and Compare Saliency Methods

Jun 07, 2022
Angie Boggust, Harini Suresh, Hendrik Strobelt, John V. Guttag, Arvind Satyanarayan

Saliency methods calculate how important each input feature is to a machine learning model's prediction, and are commonly used to understand model reasoning. "Faithfulness", or how fully and accurately the saliency output reflects the underlying model, is an oft-cited desideratum for these methods. However, explanation methods necessarily sacrifice certain information in service of user-oriented goals such as simplicity. To that end, and akin to performance metrics, we frame saliency methods as abstractions: individual tools that provide insight into specific aspects of model behavior and entail tradeoffs. Using this framing, we describe a framework of nine dimensions to characterize and compare the properties of saliency methods. We group these dimensions into three categories that map to different phases of the interpretation process: methodology, or how the saliency is calculated; sensitivity, or relationships between the saliency result and the underlying model or input; and perceptibility, or how a user interprets the result. As we show, these dimensions give us a granular vocabulary for describing and comparing saliency methods -- for instance, allowing us to develop "saliency cards" as a form of documentation, or helping downstream users understand tradeoffs and choose a method for a particular use case. Moreover, by situating existing saliency methods within this framework, we identify opportunities for future work, including filling gaps in the landscape and developing new evaluation metrics.

* 13 pages, 5 figures, 2 tables 
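
For readers unfamiliar with what a saliency method computes, here is a minimal, illustrative sketch of vanilla gradient saliency in PyTorch. It is not the paper's contribution (the paper characterizes such methods rather than proposing one), and the model and input below are placeholders.

    import torch

    def gradient_saliency(model, x):
        # Vanilla gradient saliency: magnitude of d(top-class score)/d(input).
        model.eval()
        x = x.clone().detach().requires_grad_(True)   # track gradients w.r.t. the input
        scores = model(x)                             # shape: (batch, num_classes)
        top_class = scores.argmax(dim=1, keepdim=True)
        scores.gather(1, top_class).sum().backward()  # populate x.grad
        return x.grad.abs()                           # per-feature importance, same shape as x

    # Placeholder model and random "image"; real use would pass a trained network.
    model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
    saliency_map = gradient_saliency(model, torch.rand(1, 3, 32, 32))
    print(saliency_map.shape)  # torch.Size([1, 3, 32, 32])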

Intuitively Assessing ML Model Reliability through Example-Based Explanations and Editing Model Inputs

Feb 17, 2021
Harini Suresh, Kathleen M. Lewis, John V. Guttag, Arvind Satyanarayan

Interpretability methods aim to help users build trust in and understand the capabilities of machine learning models. However, existing approaches often rely on abstract, complex visualizations that poorly map to the task at hand or require non-trivial ML expertise to interpret. Here, we present two interface modules to facilitate a more intuitive assessment of model reliability. To help users better characterize and reason about a model's uncertainty, we visualize raw and aggregate information about a given input's nearest neighbors in the training dataset. Using an interactive editor, users can manipulate this input in semantically meaningful ways, determine the effect on the output, and compare against their prior expectations. We evaluate our interface using an electrocardiogram beat classification case study. Compared to a baseline feature importance interface, we find that 9 physicians are better able to align the model's uncertainty with clinically relevant factors and build intuition about its capabilities and limitations.
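
At a sketch level, the neighbor-based module described above amounts to retrieving an input's nearest training examples and summarizing their labels. The snippet below is a minimal stand-in using Euclidean distance over hypothetical feature vectors, not the authors' ECG interface.

    import numpy as np

    def nearest_neighbor_summary(train_X, train_y, query, k=5):
        # The k closest training examples can be shown "raw"; their label counts
        # serve as an aggregate view of the training evidence near this input.
        dists = np.linalg.norm(train_X - query, axis=1)   # distance to every training point
        idx = np.argsort(dists)[:k]                       # indices of the k closest examples
        labels, counts = np.unique(train_y[idx], return_counts=True)
        return idx, dict(zip(labels.tolist(), counts.tolist()))

    # Toy data standing in for learned ECG-beat feature vectors (hypothetical).
    rng = np.random.default_rng(0)
    train_X, train_y = rng.normal(size=(1000, 16)), rng.integers(0, 2, size=1000)
    neighbors, label_counts = nearest_neighbor_summary(train_X, train_y, rng.normal(size=16))
    print(neighbors, label_counts)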

Painting Many Pasts: Synthesizing Time Lapse Videos of Paintings

Jan 04, 2020
Amy Zhao, Guha Balakrishnan, Kathleen M. Lewis, Frédo Durand, John V. Guttag, Adrian V. Dalca

We introduce a new video synthesis task: synthesizing time lapse videos depicting how a given painting might have been created. Artists paint using unique combinations of brushes, strokes, colors, and layers. There are often many possible ways to create a given painting. Our goal is to learn to capture this rich range of possibilities. Creating distributions of long-term videos is a challenge for learning-based video synthesis methods. We present a probabilistic model that, given a single image of a completed painting, recurrently synthesizes steps of the painting process. We implement this model as a convolutional neural network, and introduce a training scheme to facilitate learning from a limited dataset of painting time lapses. We demonstrate that this model can be used to sample many time steps, enabling long-term stochastic video synthesis. We evaluate our method on digital and watercolor paintings collected from video websites, and show that human raters find our synthesized videos to be similar to time lapses produced by real artists.
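
A rough sketch of the recurrent sampling idea: condition on the finished painting, draw a fresh latent code at every step, and predict the next intermediate frame. The tiny model below is an untrained placeholder, not the paper's architecture.

    import torch

    class StepSampler(torch.nn.Module):
        # Placeholder conditional model: predicts one intermediate frame from the
        # finished painting, the current frame, and a random latent code.
        def __init__(self, channels=3, latent_dim=8):
            super().__init__()
            self.latent_dim = latent_dim
            self.net = torch.nn.Conv2d(2 * channels + latent_dim, channels, kernel_size=3, padding=1)

        def forward(self, painting, current, z):
            z_map = z[:, :, None, None].expand(-1, -1, *painting.shape[2:])  # broadcast z spatially
            return torch.sigmoid(self.net(torch.cat([painting, current, z_map], dim=1)))

    def sample_time_lapse(model, painting, num_steps=10):
        # Recurrently sample intermediate frames; drawing a fresh latent at each
        # step is what makes repeated calls yield different plausible videos.
        frame = torch.zeros_like(painting)          # start from a blank canvas
        frames = [frame]
        for _ in range(num_steps):
            z = torch.randn(painting.shape[0], model.latent_dim)
            frame = model(painting, frame, z)
            frames.append(frame)
        return torch.stack(frames, dim=1)           # (batch, time, channels, H, W)

    video = sample_time_lapse(StepSampler(), torch.rand(1, 3, 64, 64))
    print(video.shape)  # torch.Size([1, 11, 3, 64, 64])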

Visual Deprojection: Probabilistic Recovery of Collapsed Dimensions

Sep 01, 2019
Guha Balakrishnan, Adrian V. Dalca, Amy Zhao, John V. Guttag, Fredo Durand, William T. Freeman

We introduce visual deprojection: the task of recovering an image or video that has been collapsed along a dimension. Projections arise in various contexts, such as long-exposure photography, where a dynamic scene is collapsed in time to produce a motion-blurred image, and corner cameras, where reflected light from a scene is collapsed along a spatial dimension because of an edge occluder to yield a 1D video. Deprojection is ill-posed -- often there are many plausible solutions for a given input. We first propose a probabilistic model capturing the ambiguity of the task. We then present a variational inference strategy using convolutional neural networks as function approximators. Sampling from the inference network at test time yields plausible candidates from the distribution of original signals that are consistent with a given input projection. We evaluate the method on several datasets for both spatial and temporal deprojection tasks. We first demonstrate the method can recover human gait videos and face images from spatial projections, and then show that it can recover videos of moving digits from dramatically motion-blurred images obtained via temporal projection.

* ICCV 2019 
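
To make the setup concrete, the sketch below defines a projection operator (averaging out one dimension) and a placeholder decoder that maps a projection plus a latent sample to a candidate image; drawing several latents yields several candidate reconstructions. The network here is an untrained stand-in for the paper's convolutional inference and decoder networks.

    import torch

    def project(image, dim=-1):
        # Collapse one dimension by averaging, e.g. a 2D image to a 1D signal.
        return image.mean(dim=dim)

    class DeprojectionDecoder(torch.nn.Module):
        # Placeholder generator: maps a 1D projection plus a latent code to a 2D image.
        def __init__(self, width=32, latent_dim=16):
            super().__init__()
            self.width = width
            self.net = torch.nn.Linear(width + latent_dim, width * width)

        def forward(self, projection, z):
            out = self.net(torch.cat([projection, z], dim=-1))
            return out.view(-1, self.width, self.width)

    # Sampling several latents yields several candidate reconstructions for one projection.
    decoder = DeprojectionDecoder()
    projection = project(torch.rand(1, 32, 32))            # shape (1, 32)
    candidates = [decoder(projection, torch.randn(1, 16)) for _ in range(5)]
    print(len(candidates), candidates[0].shape)             # 5 torch.Size([1, 32, 32])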

Data augmentation using learned transformations for one-shot medical image segmentation

Apr 06, 2019
Amy Zhao, Guha Balakrishnan, Frédo Durand, John V. Guttag, Adrian V. Dalca

Image segmentation is an important task in many medical applications. Methods based on convolutional neural networks attain state-of-the-art accuracy; however, they typically rely on supervised training with large labeled datasets. Labeling medical images requires significant expertise and time, and typical hand-tuned approaches for data augmentation fail to capture the complex variations in such images. We present an automated data augmentation method for synthesizing labeled medical images. We demonstrate our method on the task of segmenting magnetic resonance imaging (MRI) brain scans. Our method requires only a single segmented scan, and leverages other unlabeled scans in a semi-supervised approach. We learn a model of transformations from the images, and use the model along with the labeled example to synthesize additional labeled examples. Each transformation consists of a spatial deformation field and an intensity change, enabling the synthesis of complex effects such as variations in anatomy and image acquisition procedures. We show that training a supervised segmenter with these new examples provides significant improvements over state-of-the-art methods for one-shot biomedical image segmentation. Our code is available at https://github.com/xamyzhao/brainstorm.

* 9 pages, CVPR 2019 
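
A minimal sketch of the synthesis step, assuming the spatial deformation field and intensity change are already given (in the paper they are learned from the unlabeled scans); small 2D tensors stand in for MRI volumes, and the released code at the repository linked above is the authoritative implementation.

    import torch
    import torch.nn.functional as F

    def warp(volume, flow, mode="bilinear"):
        # Apply a dense 2D displacement field (in pixels) with grid_sample.
        # Label maps should be warped with mode="nearest" so classes are not blended.
        _, _, h, w = volume.shape
        ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
        grid = torch.stack([xs, ys], dim=-1).float()[None] + flow.permute(0, 2, 3, 1)
        grid[..., 0] = 2 * grid[..., 0] / (w - 1) - 1    # grid_sample expects [-1, 1] coords
        grid[..., 1] = 2 * grid[..., 1] / (h - 1) - 1
        return F.grid_sample(volume, grid, mode=mode, align_corners=True)

    def synthesize_example(image, labels, flow, intensity_delta):
        # Spatial deformation + intensity change applied to the one labeled example;
        # the same flow moves the label map, so the new example stays labeled.
        return warp(image, flow) + intensity_delta, warp(labels, flow, mode="nearest")

    # Toy 2D stand-ins; in the paper the transforms themselves are learned, not random.
    image, labels = torch.rand(1, 1, 64, 64), torch.randint(0, 4, (1, 1, 64, 64)).float()
    flow = torch.randn(1, 2, 64, 64) * 2.0            # hypothetical displacement field (pixels)
    delta = torch.randn(1, 1, 64, 64) * 0.05          # hypothetical intensity change
    aug_image, aug_labels = synthesize_example(image, labels, flow, delta)
    print(aug_image.shape, aug_labels.shape)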

A Framework for Understanding Unintended Consequences of Machine Learning

Jan 28, 2019
Harini Suresh, John V. Guttag

As machine learning increasingly affects people and society, it is important that we strive for a comprehensive and unified understanding of how and why unwanted consequences arise. For instance, downstream harms to particular groups are often blamed on "biased data," but this concept encompasses too many issues to be useful in developing solutions. In this paper, we provide a framework that partitions sources of downstream harm in machine learning into five distinct categories spanning the data generation and machine learning pipeline. We describe how these issues arise, how they are relevant to particular applications, and how they motivate different solutions. In doing so, we aim to facilitate the development of solutions that stem from an understanding of application-specific populations and data generation processes, rather than relying on general claims about what may or may not be "fair."

* 6 pages, 2 figures 

EXTRACT: Strong Examples from Weakly-Labeled Sensor Data

Sep 29, 2016
Davis W. Blalock, John V. Guttag

Thanks to the rise of wearable and connected devices, sensor-generated time series comprise a large and growing fraction of the world's data. Unfortunately, extracting value from this data can be challenging, since sensors report low-level signals (e.g., acceleration), not the high-level events that are typically of interest (e.g., gestures). We introduce a technique to bridge this gap by automatically extracting examples of real-world events in low-level data, given only a rough estimate of when these events have taken place. By identifying sets of features that repeat in the same temporal arrangement, we isolate examples of such diverse events as human actions, power consumption patterns, and spoken words with up to 96% precision and recall. Our method is fast enough to run in real time and assumes only minimal knowledge of which variables are relevant or the lengths of events. Our evaluation uses numerous publicly available datasets and over 1 million samples of manually labeled sensor data.

* To appear in IEEE International Conference on Data Mining 2016 
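
As a toy illustration of the weakly-labeled setting (not the paper's EXTRACT algorithm), the sketch below scores overlapping windows of a 1D signal by how strongly their z-normalized shapes agree with the other windows, surfacing recurring patterns buried in noise.

    import numpy as np

    def find_recurring_windows(signal, window_len, top_k=3):
        # Score each 50%-overlapping window by its agreement with the other windows;
        # repeated event shapes score high, while noise-only windows score near zero.
        starts = list(range(0, len(signal) - window_len + 1, window_len // 2))
        windows = np.array([signal[s:s + window_len] for s in starts])
        windows = (windows - windows.mean(axis=1, keepdims=True)) / (windows.std(axis=1, keepdims=True) + 1e-8)
        sims = windows @ windows.T / window_len      # pairwise normalized correlation
        scores = sims.sum(axis=1) - 1.0              # agreement with all *other* windows
        return [starts[i] for i in np.argsort(scores)[::-1][:top_k]]

    # Synthetic accelerometer-like trace with three hidden, identical "events".
    rng = np.random.default_rng(1)
    signal = rng.normal(scale=0.2, size=1000)
    for start in (100, 400, 700):
        signal[start:start + 50] += np.hanning(50)
    print(find_recurring_windows(signal, window_len=50))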

Uncovering Voice Misuse Using Symbolic Mismatch

Aug 08, 2016
Marzyeh Ghassemi, Zeeshan Syed, Daryush D. Mehta, Jarrad H. Van Stan, Robert E. Hillman, John V. Guttag

Voice disorders affect an estimated 14 million working-aged Americans, and many more worldwide. We present the first large scale study of vocal misuse based on long-term ambulatory data collected by an accelerometer placed on the neck. We investigate an unsupervised data mining approach to uncovering latent information about voice misuse. We segment signals from over 253 days of data from 22 subjects into over a hundred million single glottal pulses (closures of the vocal folds), cluster segments into symbols, and use symbolic mismatch to uncover differences between patients and matched controls, and between patients pre- and post-treatment. Our results show significant behavioral differences between patients and controls, as well as between some pre- and post-treatment patients. Our proposed approach provides an objective basis for helping diagnose behavioral voice disorders, and is a first step towards a more data-driven understanding of the impact of voice therapy.

* Presented at 2016 Machine Learning and Healthcare Conference (MLHC 2016), Los Angeles, CA 
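
A simplified sketch of the symbolize-and-compare pipeline described above: cluster hypothetical per-pulse feature vectors into symbols with k-means, summarize each recording as a symbol histogram, and compare recordings with a plain L1 histogram distance standing in for the paper's symbolic-mismatch measure.

    import numpy as np
    from sklearn.cluster import KMeans

    def symbol_histogram(pulse_features, kmeans):
        # Assign each glottal-pulse feature vector to its nearest cluster ("symbol")
        # and summarize a recording as the normalized histogram of symbols.
        counts = np.bincount(kmeans.predict(pulse_features), minlength=kmeans.n_clusters)
        return counts / counts.sum()

    def histogram_mismatch(hist_a, hist_b):
        # Simple L1 distance between symbol histograms -- a stand-in for the
        # paper's symbolic-mismatch measure between subjects.
        return np.abs(hist_a - hist_b).sum()

    # Toy pulse features for a "patient" and a "control" (hypothetical, 8-D per pulse).
    rng = np.random.default_rng(0)
    patient = rng.normal(loc=0.5, size=(5000, 8))
    control = rng.normal(loc=0.0, size=(5000, 8))

    kmeans = KMeans(n_clusters=16, n_init=10, random_state=0).fit(np.vstack([patient, control]))
    mismatch = histogram_mismatch(symbol_histogram(patient, kmeans), symbol_histogram(control, kmeans))
    print(f"symbolic mismatch (toy): {mismatch:.3f}")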

Transferring Knowledge from Text to Predict Disease Onset

Aug 06, 2016
Yun Liu, Kun-Ta Chuang, Fu-Wen Liang, Huey-Jen Su, Collin M. Stultz, John V. Guttag

In many domains such as medicine, training data is in short supply. In such cases, external knowledge is often helpful in building predictive models. We propose a novel method to incorporate publicly available domain expertise to build accurate models. Specifically, we use word2vec models trained on a domain-specific corpus to estimate the relevance of each feature's text description to the prediction problem. We use these relevance estimates to rescale the features, causing more important features to experience weaker regularization. We apply our method to predict the onset of five chronic diseases in the next five years in two genders and two age groups. Our rescaling approach improves the accuracy of the model, particularly when there are few positive examples. Furthermore, our method selects 60% fewer features, easing interpretation by physicians. Our method is applicable to other domains where feature and outcome descriptions are available.

* Presented at 2016 Machine Learning and Healthcare Conference (MLHC 2016), Los Angeles, CA 
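
A minimal sketch of the rescaling idea, assuming precomputed word2vec-style embeddings for each feature description and for the outcome description (the embedding training itself is omitted): features are scaled by their cosine relevance before fitting an L1-penalized logistic regression, so relevant features are effectively regularized less.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def relevance_scaled_model(X, y, feature_embeddings, outcome_embedding, C=1.0):
        # Cosine similarity between each feature's description and the outcome description.
        sims = feature_embeddings @ outcome_embedding
        sims /= np.linalg.norm(feature_embeddings, axis=1) * np.linalg.norm(outcome_embedding)
        relevance = np.clip(sims, 1e-3, None)          # keep scales positive
        # Scaling a column up makes a given coefficient magnitude "cheaper" for that
        # feature under a uniform L1 penalty, i.e. weaker effective regularization.
        model = LogisticRegression(penalty="l1", solver="liblinear", C=C)
        model.fit(X * relevance, y)
        return model, relevance

    # Toy data: 200 patients, 30 features, 50-d description embeddings (all hypothetical).
    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(200, 30)), rng.integers(0, 2, size=200)
    feat_emb, outcome_emb = rng.normal(size=(30, 50)), rng.normal(size=50)
    model, relevance = relevance_scaled_model(X, y, feat_emb, outcome_emb)
    print(int((model.coef_ != 0).sum()), "features selected")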