Margaret Mitchell

Mitigating Unwanted Biases with Adversarial Learning

Jan 22, 2018

Brian Hu Zhang, Blake Lemoine, Margaret Mitchell

Abstract: Machine learning is a tool for building models that accurately represent input training data. When undesired biases concerning demographic groups are in the training data, well-trained models will reflect those biases. We present a framework for mitigating such biases by including a variable for the group of interest and simultaneously learning a predictor and an adversary. The input to the network X, here text or census data, produces a prediction Y, such as an analogy completion or income bracket, while the adversary tries to model a protected variable Z, here gender or zip code. The objective is to maximize the predictor's ability to predict Y while minimizing the adversary's ability to predict Z. Applied to analogy completion, this method results in accurate predictions that exhibit less evidence of stereotyping Z. When applied to a classification task using the UCI Adult (Census) Dataset, it results in a predictive model that does not lose much accuracy while achieving very close to equality of odds (Hardt et al., 2016). The method is flexible and applicable to multiple definitions of fairness as well as a wide range of gradient-based learning models, including both regression and classification tasks.
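The abstract states the objective only at a high level. The following is a minimal Python sketch, not the authors' code, of two pieces one might use to experiment with it. The hypothetical helper `debiased_predictor_step` combines the predictor gradient with the adversary gradient by removing the component of the predictor gradient along the adversary gradient and then pushing against the adversary; this projection-style rule is an assumption here, and the full paper should be consulted for the exact update. The hypothetical `equalized_odds_gaps` measures distance from equality of odds (Hardt et al., 2016), i.e. equal true-positive and false-positive rates across the groups of Z.

```python
from typing import List, Sequence, Tuple


def debiased_predictor_step(grad_p: Sequence[float],
                            grad_a: Sequence[float],
                            alpha: float) -> List[float]:
    """Hypothetical helper: combine gradients w.r.t. the predictor weights W.

    grad_p: gradient of the predictor loss L_P (predicting Y) w.r.t. W.
    grad_a: gradient of the adversary loss L_A (predicting Z) w.r.t. W.
    alpha:  strength of the push against the adversary (assumed hyperparameter).
    Returns a direction d; the predictor would then update W <- W - lr * d.
    """
    norm_sq = sum(a * a for a in grad_a)
    if norm_sq == 0.0:
        # No adversary signal, so there is nothing to project out.
        proj = [0.0] * len(grad_p)
    else:
        # Component of grad_p along grad_a; removing it means the predictor's
        # own step neither helps nor hurts the adversary to first order.
        scale = sum(p * a for p, a in zip(grad_p, grad_a)) / norm_sq
        proj = [scale * a for a in grad_a]
    # Subtracting alpha * grad_a makes the descent step increase L_A.
    return [p - q - alpha * a for p, q, a in zip(grad_p, proj, grad_a)]


def _gap(rates: List[float]) -> float:
    # Spread between the highest and lowest group rate; 0.0 if no group qualifies.
    return max(rates) - min(rates) if rates else 0.0


def equalized_odds_gaps(y_true: Sequence[int], y_pred: Sequence[int],
                        groups: Sequence[str]) -> Tuple[float, float]:
    """Return (TPR gap, FPR gap) of binary predictions across values of Z."""
    tprs: List[float] = []
    fprs: List[float] = []
    for g in sorted(set(groups)):
        idx = [i for i, z in enumerate(groups) if z == g]
        pos = [i for i in idx if y_true[i] == 1]
        neg = [i for i in idx if y_true[i] == 0]
        if pos:
            tprs.append(sum(y_pred[i] for i in pos) / len(pos))
        if neg:
            fprs.append(sum(y_pred[i] for i in neg) / len(neg))
    return _gap(tprs), _gap(fprs)
```

Equality of odds holds when both gaps are 0. In a full training loop the adversary would also take its own descent step on L_A; the sketch covers only the predictor side.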

Multi-Task Learning for Mental Health using Social Media Text

Dec 10, 2017

Adrian Benton, Margaret Mitchell, Dirk Hovy

Abstract: We introduce initial groundwork for estimating suicide risk and mental health in a deep learning framework. By modeling multiple conditions, the system learns to make predictions about suicide risk and mental health at a low false positive rate. Conditions are modeled as tasks in a multi-task learning (MTL) framework, with gender prediction as an additional auxiliary task. We demonstrate the effectiveness of multi-task learning by comparison to a well-tuned single-task baseline with the same number of parameters. Our best MTL model predicts potential suicide attempt, as well as the presence of atypical mental health, with AUC > 0.8. We also find additional large improvements using multi-task learning on mental health tasks with limited training data.

* Proceedings of the 15th Conference of the EACL (2017) 152-162
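To make the MTL setup concrete, here is a minimal sketch assuming hard parameter sharing: one shared hidden layer feeds a separate logistic output per task, and training minimizes a weighted sum of per-task losses, with gender as one more (auxiliary) task. The task names, layer sizes, and loss weights are hypothetical, not taken from the paper. The `auc` function computes the rank-based AUC reported in the abstract (the probability that a random positive example scores above a random negative one).

```python
import math
from typing import Dict, List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def shared_forward(W_shared: List[List[float]], x: Sequence[float]) -> List[float]:
    # One hidden ReLU layer whose weights are shared by every task.
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W_shared]


def multitask_loss(W_shared: List[List[float]], heads: Dict[str, List[float]],
                   x: Sequence[float], labels: Dict[str, int],
                   task_weights: Dict[str, float]) -> float:
    """Weighted sum of per-task binary cross-entropy losses for one example."""
    h = shared_forward(W_shared, x)
    total = 0.0
    for task, w_task in heads.items():
        # Task-specific logistic output layer on top of the shared layer.
        p = sigmoid(sum(w * hi for w, hi in zip(w_task, h)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep log() finite
        y = labels[task]
        total += task_weights[task] * -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Chance that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Because every task's gradient flows into `W_shared`, tasks with little data can benefit from the representation learned for the others, which is the intuition behind the gains reported for low-resource conditions.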

Memory-augmented Attention Modelling for Videos

Apr 24, 2017

Rasool Fakoor, Abdel-rahman Mohamed, Margaret Mitchell, Sing Bing Kang, Pushmeet Kohli

Abstract: We present a method to improve video description generation by modeling higher-order interactions between video frames and described concepts. By storing past visual attention in the video associated to previously generated words, the system is able to decide what to look at and describe in light of what it has already looked at and described. This enables not only more effective local attention, but tractable consideration of the video sequence while generating each word. Evaluation on the challenging and popular MSVD and Charades datasets demonstrates that the proposed architecture outperforms previous video description approaches without requiring external temporal video features.

* Revised version with minor changes; adds a link to the source code
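The abstract does not spell out the architecture, so the following is only a hypothetical illustration of the general idea of conditioning attention on what was already attended: standard dot-product soft attention over per-frame feature vectors, with an assumed penalty (strength `beta`) on frames that earlier words already attended to. It is not the memory model from the paper.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Sequence[float], frames: Sequence[Sequence[float]],
           past_attention: Sequence[float], beta: float = 1.0
           ) -> Tuple[List[float], List[float]]:
    """Dot-product attention over frames, minus a hypothetical memory penalty.

    past_attention[i] is the attention frame i received while generating
    earlier words; beta (assumed) controls how strongly it is down-weighted.
    Returns (attention weights, attended context vector).
    """
    scores = [sum(q * f for q, f in zip(query, frame)) - beta * past
              for frame, past in zip(frames, past_attention)]
    weights = softmax(scores)
    dim = len(frames[0])
    context = [sum(w * frame[d] for w, frame in zip(weights, frames))
               for d in range(dim)]
    return weights, context


def attend_sequence(queries: Sequence[Sequence[float]],
                    frames: Sequence[Sequence[float]],
                    beta: float = 1.0) -> List[List[float]]:
    """Attend once per generated word, accumulating attention as memory."""
    past = [0.0] * len(frames)
    history = []
    for q in queries:
        weights, _ = attend(q, frames, past, beta)
        history.append(weights)
        past = [p + w for p, w in zip(past, weights)]
    return history
```

With a large `beta`, a frame that dominated the first word's attention is pushed down for the next word, so later words tend to look elsewhere in the video.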

VQA: Visual Question Answering

Oct 27, 2016

Aishwarya Agrawal, Jiasen Lu, Stanislaw Antol, Margaret Mitchell, C. Lawrence Zitnick, Dhruv Batra, Devi Parikh

Abstract: We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image, the task is to provide an accurate natural language answer. Mirroring real-world scenarios, such as helping the visually impaired, both the questions and answers are open-ended. Visual questions selectively target different areas of an image, including background details and underlying context. As a result, a system that succeeds at VQA typically needs a more detailed understanding of the image and complex reasoning than a system producing generic image captions. Moreover, VQA is amenable to automatic evaluation, since many open-ended answers contain only a few words or a closed set of answers that can be provided in a multiple-choice format. We provide a dataset containing ~0.25M images, ~0.76M questions, and ~10M answers (www.visualqa.org), and discuss the information it provides. Numerous baselines and methods for VQA are provided and compared with human performance. Our VQA demo is available on CloudCV (http://cloudcv.org/vqa).

* The first three authors contributed equally. International Conference on Computer Vision (ICCV) 2015
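Automatic evaluation of open-ended answers on this benchmark uses a consensus accuracy over the ten human answers collected per question: a predicted answer scores min(number of annotators who gave it / 3, 1), averaged over the ten leave-one-out subsets of nine annotators. The sketch below assumes exactly that rule but only lowercases and strips answers; the official evaluation script applies further normalization (punctuation, articles, number words) that is omitted here.

```python
from typing import List, Sequence


def normalize(answer: str) -> str:
    # Simplified: the official evaluation also strips punctuation, articles, etc.
    return answer.strip().lower()


def vqa_accuracy(predicted: str, human_answers: Sequence[str]) -> float:
    """min(#matching humans / 3, 1), averaged over leave-one-out annotator subsets."""
    pred = normalize(predicted)
    answers = [normalize(a) for a in human_answers]
    if not answers:
        raise ValueError("need at least one human answer")
    scores: List[float] = []
    for left_out in range(len(answers)):
        others = answers[:left_out] + answers[left_out + 1:]
        matches = sum(1 for a in others if a == pred)
        scores.append(min(matches / 3.0, 1.0))
    return sum(scores) / len(scores)
```

For example, if 3 of 10 annotators answered "yes", predicting "Yes" scores 0.9: the three subsets missing a "yes" give 2/3 each, and the other seven give 1.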

Measuring Machine Intelligence Through Visual Question Answering

Aug 31, 2016

C. Lawrence Zitnick, Aishwarya Agrawal, Stanislaw Antol, Margaret Mitchell, Dhruv Batra, Devi Parikh

Abstract: As machines have become more intelligent, there has been a renewed interest in methods for measuring their intelligence. A common approach is to propose tasks at which humans excel but which machines find difficult. However, an ideal task should also be easy to evaluate and not be easily gameable. We begin with a case study exploring the recently popular task of image captioning and its limitations as a task for measuring machine intelligence. An alternative and more promising task is Visual Question Answering, which tests a machine's ability to reason about language and vision. We describe a dataset unprecedented in size created for the task that contains over 760,000 human-generated questions about images. Using around 10 million human-generated answers, machines may be easily evaluated.

* AI Magazine, 2016

Generating Natural Questions About an Image

Jun 09, 2016

Nasrin Mostafazadeh, Ishan Misra, Jacob Devlin, Margaret Mitchell, Xiaodong He, Lucy Vanderwende

Abstract: There has been an explosion of work in the vision & language community during the past few years from image captioning to video transcription, and answering questions about images. These tasks have focused on literal descriptions of the image. To move beyond the literal, we choose to explore how questions about an image are often directed at commonsense inference and the abstract events evoked by objects in the image. In this paper, we introduce the novel task of Visual Question Generation (VQG), where the system is tasked with asking a natural and engaging question when shown an image. We provide three datasets which cover a variety of images from object-centric to event-centric, with considerably more abstract training data than provided to state-of-the-art captioning systems thus far. We train and test several generative and retrieval models to tackle the task of VQG. Evaluation results show that while such models ask reasonable questions for a variety of images, there is still a wide gap with human performance which motivates further work on connecting images with commonsense knowledge and pragmatics. Our proposed task offers a new challenge to the community which we hope furthers interest in exploring deeper connections between vision & language.

* Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics
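The abstract does not describe the retrieval models. As a hypothetical illustration of what a retrieval approach to VQG can look like, the sketch below returns the question attached to the most similar training image under cosine similarity of image feature vectors. The feature vectors are assumed to be precomputed (e.g. by some CNN), and the toy questions are invented for the example.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve_question(query_feat: Sequence[float],
                      train_set: List[Tuple[List[float], str]]) -> Optional[str]:
    """Return the question of the nearest training image (None if train_set is empty)."""
    best_q: Optional[str] = None
    best_sim = -float("inf")
    for feat, question in train_set:
        sim = cosine(query_feat, feat)
        if sim > best_sim:
            best_q, best_sim = question, sim
    return best_q


# Toy usage with made-up two-dimensional features.
train = [([1.0, 0.0], "What happened to the car?"),
         ([0.0, 1.0], "Is the dog enjoying the party?")]
print(retrieve_question([0.9, 0.1], train))
```

A retrieval baseline can only reproduce questions already seen in training, which is one reason to compare it against generative models.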

Visual Storytelling

Apr 13, 2016

Ting-Hao Huang, Francis Ferraro, Nasrin Mostafazadeh, Ishan Misra, Aishwarya Agrawal, Jacob Devlin, Ross Girshick, Xiaodong He, Pushmeet Kohli (+6 more)

Abstract: We introduce the first dataset for sequential vision-to-language, and explore how this data may be used for the task of visual storytelling. The first release of this dataset, SIND v.1, includes 81,743 unique photos in 20,211 sequences, aligned to both descriptive (caption) and story language. We establish several strong baselines for the storytelling task, and motivate an automatic metric to benchmark progress. Modelling concrete description as well as figurative and social language, as provided in this dataset and the storytelling task, has the potential to move artificial intelligence from basic understandings of typical visual scenes towards more and more human-like understanding of grounded event structure and subjective expression.

* To appear in NAACL 2016

Seeing through the Human Reporting Bias: Visual Classifiers from Noisy Human-Centric Labels

Apr 12, 2016

Ishan Misra, C. Lawrence Zitnick, Margaret Mitchell, Ross Girshick

Abstract: When human annotators are given a choice about what to label in an image, they apply their own subjective judgments on what to ignore and what to mention. We refer to these noisy "human-centric" annotations as exhibiting human reporting bias. Examples of such annotations include image tags and keywords found on photo sharing sites, or in datasets containing image captions. In this paper, we use these noisy annotations for learning visually correct image classifiers. Such annotations do not use consistent vocabulary, and miss a significant amount of the information present in an image; however, we demonstrate that the noise in these annotations exhibits structure and can be modeled. We propose an algorithm to decouple the human reporting bias from the correct visually grounded labels. Our results are highly interpretable for reporting "what's in the image" versus "what's worth saying." We demonstrate the algorithm's efficacy along a variety of metrics and datasets, including MS COCO and Yahoo Flickr 100M. We show significant improvements over traditional algorithms for both image classification and image captioning, doubling the performance of existing methods in some cases.

* To appear in CVPR 2016
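One way to read "decoupling" is to treat visual presence as a latent variable v and the human-reported label h as depending on it, so that P(h=1 | I) = P(h=1 | v=1, I) P(v=1 | I) + P(h=1 | v=0, I) P(v=0 | I). The sketch below only evaluates this marginalization for given probabilities; the function and argument names are hypothetical, and the exact parameterization used in the paper may differ.

```python
def reported_probability(p_visual: float, p_say_if_present: float,
                         p_say_if_absent: float) -> float:
    """P(h=1|I) = P(h=1|v=1,I) P(v=1|I) + P(h=1|v=0,I) P(v=0|I).

    p_visual:         P(v=1|I), the concept is actually visible ("what's in the image").
    p_say_if_present: P(h=1|v=1,I), a visible concept gets mentioned.
    p_say_if_absent:  P(h=1|v=0,I), a non-visible concept is mentioned anyway.
    """
    for p in (p_visual, p_say_if_present, p_say_if_absent):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_say_if_present * p_visual + p_say_if_absent * (1.0 - p_visual)
```

For instance, a concept that is very likely visible (0.8) but mentioned only half the time when visible (0.5), and rarely otherwise (0.1), is reported with probability 0.42. Training only on h would therefore underestimate how often the concept is present, which is the gap that separating v from h is meant to close.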

Language Models for Image Captioning: The Quirks and What Works

Oct 14, 2015

Jacob Devlin, Hao Cheng, Hao Fang, Saurabh Gupta, Li Deng, Xiaodong He, Geoffrey Zweig, Margaret Mitchell

Abstract: Two recent approaches have achieved state-of-the-art results in image captioning. The first uses a pipelined process where a set of candidate words is generated by a convolutional neural network (CNN) trained on images, and then a maximum entropy (ME) language model is used to arrange these words into a coherent sentence. The second uses the penultimate activation layer of the CNN as input to a recurrent neural network (RNN) that then generates the caption sequence. In this paper, we compare the merits of these different language modeling approaches for the first time by using the same state-of-the-art CNN as input. We examine issues in the different approaches, including linguistic irregularities, caption repetition, and data set overlap. By combining key aspects of the ME and RNN methods, we achieve a new record performance over previously published results on the benchmark COCO dataset. However, the gains we see in BLEU do not translate to human judgments.

* See http://research.microsoft.com/en-us/projects/image_captioning for project information
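Since the abstract's closing point concerns BLEU, here is a minimal, unsmoothed sentence-level BLEU for reference: clipped n-gram precisions up to 4-grams, their geometric mean, and a brevity penalty. It is a simplified sketch, not the official COCO caption evaluation code, which differs in tokenization and aggregation.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: List[str], references: Sequence[List[str]],
                  max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU with uniform n-gram weights."""
    if not candidate or not references:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        # Each n-gram count is clipped to its maximum count in any one reference.
        max_ref: Counter = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        if clipped == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU = 0
        log_precisions.append(math.log(clipped / total))
    c = len(candidate)
    # Effective reference length: the one closest to c (ties go to the shorter).
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1.0 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)
```

BLEU rewards n-gram overlap with reference captions, so a system can raise it, for example by echoing phrasing common in the training captions, without producing captions that people judge better.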

A Survey of Current Datasets for Vision and Language Research

Aug 19, 2015

Francis Ferraro, Nasrin Mostafazadeh, Ting-Hao Huang, Lucy Vanderwende, Jacob Devlin, Michel Galley, Margaret Mitchell

Abstract: Integrating vision and language has long been a dream in work on artificial intelligence (AI). In the past two years, we have witnessed an explosion of work that brings together vision and language from images to videos and beyond. The available corpora have played a crucial role in advancing this area of research. In this paper, we propose a set of quality metrics for evaluating and analyzing the vision & language datasets and categorize them accordingly. Our analyses show that the most recent datasets have been using more complex language and more abstract concepts; however, each has different strengths and weaknesses.

* To appear in EMNLP 2015, short proceedings. Dataset analysis and discussion expanded, including an initial examination into reporting bias for one of them. F.F. and N.M. contributed equally to this work
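The abstract does not list the proposed quality metrics. Purely as a hypothetical illustration of simple dataset-level statistics one could compute over a caption corpus (vocabulary size, mean caption length, type-token ratio), and not the paper's own metric set:

```python
from typing import Dict, Sequence


def corpus_stats(captions: Sequence[str]) -> Dict[str, float]:
    """Simple lexical statistics over whitespace-tokenized, lowercased captions."""
    tokens = [tok for cap in captions for tok in cap.lower().split()]
    vocab = set(tokens)
    return {
        "vocab_size": float(len(vocab)),
        "mean_length": len(tokens) / len(captions) if captions else 0.0,
        "type_token_ratio": len(vocab) / len(tokens) if tokens else 0.0,
    }
```

Statistics like these are cheap to compare across corpora but say little about abstractness or syntactic complexity, which need richer analyses.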
