Kushal Kafle

An Investigation of Critical Issues in Bias Mitigation Techniques

Apr 01, 2021

Robik Shrestha, Kushal Kafle, Christopher Kanan

Abstract: A critical problem in deep learning is that systems learn inappropriate biases, resulting in their inability to perform well on minority groups. This has led to the creation of multiple algorithms that endeavor to mitigate bias. However, it is not clear how effective these methods are. This is because study protocols differ among papers, systems are tested on datasets that fail to test many forms of bias, and systems have access to hidden knowledge or are tuned specifically to the test set. To address this, we introduce an improved evaluation protocol, sensible metrics, and a new dataset, which enables us to ask and answer critical questions about bias mitigation algorithms. We evaluate seven state-of-the-art algorithms using the same network architecture and hyperparameter selection policy across three benchmark datasets. We introduce a new dataset called Biased MNIST that enables assessment of robustness to multiple bias sources. We use Biased MNIST and a visual question answering (VQA) benchmark to assess robustness to hidden biases. Rather than only tuning to the test set distribution, we study robustness across different tuning distributions, which is critical because for many applications the test distribution may not be known during development. We find that algorithms exploit hidden biases, are unable to scale to multiple forms of bias, and are highly sensitive to the choice of tuning set. Based on our findings, we implore the community to adopt more rigorous assessment of future bias mitigation methods. All data, code, and results are publicly available at: https://github.com/erobic/bias-mitigators.
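
To make "multiple bias sources" concrete, here is a minimal sketch of one common way to inject a spurious attribute into a labelled dataset: with probability p the attribute copies the class label, otherwise it takes a different value. The function names, the two bias sources, and the p values are illustrative assumptions, not the Biased MNIST generator released with the paper. Examples whose attribute disagrees with the label form the minority groups on which a biased model tends to fail.

```python
import random


def assign_bias_attribute(labels, num_classes, p, rng):
    """Return one bias attribute per example (hypothetical helper).

    With probability p the attribute copies the class label (a spurious
    shortcut); otherwise it is drawn uniformly from the other classes.
    """
    attrs = []
    for y in labels:
        if rng.random() < p:
            attrs.append(y)
        else:
            others = [c for c in range(num_classes) if c != y]
            attrs.append(rng.choice(others))
    return attrs


def bias_agreement(labels, attrs):
    """Fraction of examples whose bias attribute matches the label."""
    return sum(a == y for a, y in zip(attrs, labels)) / len(labels)


rng = random.Random(0)
labels = [rng.randrange(10) for _ in range(10_000)]
# Assumed setup: two independent bias sources, each correlated with the label.
color_bias = assign_bias_attribute(labels, 10, p=0.95, rng=rng)
texture_bias = assign_bias_attribute(labels, 10, p=0.90, rng=rng)
print(bias_agreement(labels, color_bias), bias_agreement(labels, texture_bias))
```

Stacking several such attributes, each with its own p, is what lets a benchmark ask whether a mitigation method copes with more than one shortcut at a time.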

On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law

May 19, 2020

Damien Teney, Kushal Kafle, Robik Shrestha, Ehsan Abbasnejad, Christopher Kanan, Anton van den Hengel

Abstract: Out-of-distribution (OOD) testing is increasingly popular for evaluating a machine learning system's ability to generalize beyond the biases of a training set. OOD benchmarks are designed to present a different joint distribution of data and labels between training and test time. VQA-CP has become the standard OOD benchmark for visual question answering, but we discovered three troubling practices in its current use. First, most published methods rely on explicit knowledge of the construction of the OOD splits. They often rely on "inverting" the distribution of labels, e.g. answering mostly 'yes' when the common training answer is 'no'. Second, the OOD test set is used for model selection. Third, a model's in-domain performance is assessed after retraining it on in-domain splits (VQA v2) that exhibit a more balanced distribution of labels. These three practices defeat the objective of evaluating generalization, and put into question the value of methods specifically designed for this dataset. We show that embarrassingly-simple methods, including one that generates answers at random, surpass the state of the art on some question types. We provide short- and long-term solutions to avoid these pitfalls and realize the benefits of OOD evaluation.
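
The first practice can be illustrated in a few lines. The sketch below is a hypothetical "inverted prior" baseline, not any published method: for each question type it predicts the least frequent training answer, which only pays off when the evaluator knows the test split was built so that per-question-type answer distributions differ from training. The question type and answer counts are made up.

```python
from collections import Counter, defaultdict


def fit_inverted_prior(train_pairs):
    """Map each question type to its LEAST frequent training answer.

    train_pairs: iterable of (question_type, answer) tuples.
    """
    counts = defaultdict(Counter)
    for qtype, answer in train_pairs:
        counts[qtype][answer] += 1
    # most_common() sorts by descending count, so the last entry is the rarest.
    return {qtype: c.most_common()[-1][0] for qtype, c in counts.items()}


train = [("is there", "no")] * 70 + [("is there", "yes")] * 30
inverted = fit_inverted_prior(train)
print(inverted["is there"])  # yes
```

Such a predictor encodes knowledge of how the split was constructed rather than any understanding of the image, which is why gains from it say little about generalization.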

Do We Need Fully Connected Output Layers in Convolutional Networks?

Apr 29, 2020

Zhongchao Qian, Tyler L. Hayes, Kushal Kafle, Christopher Kanan

Abstract: Traditionally, deep convolutional neural networks consist of a series of convolutional and pooling layers followed by one or more fully connected (FC) layers to perform the final classification. While this design has been successful, for datasets with a large number of categories, the fully connected layers often account for a large percentage of the network's parameters. For applications with memory constraints, such as mobile devices and embedded platforms, this is not ideal. Recently, a family of architectures that involve replacing the learned fully connected output layer with a fixed layer has been proposed as a way to achieve better efficiency. In this paper we examine this idea further and demonstrate that fixed classifiers offer no additional benefit compared to simply removing the output layer along with its parameters. We further demonstrate that the typical approach of having a fully connected final output layer is inefficient in terms of parameter count. We are able to achieve comparable performance to a traditionally learned fully connected classification output layer on the ImageNet-1K, CIFAR-100, Stanford Cars-196, and Oxford Flowers-102 datasets, while not having a fully connected output layer at all.
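
The parameter cost mentioned above is simple arithmetic: a fully connected layer from a d-dimensional feature to C classes has d * C weights plus C biases, so it grows linearly with the number of categories. The sketch also shows one assumed way to drop the output layer entirely: give the last convolutional layer exactly C channels and apply global average pooling, so each pooled channel serves as a class score. That construction is an illustration, not necessarily the exact one used in the paper, and the helper names are hypothetical.

```python
def fc_output_params(feature_dim, num_classes, bias=True):
    """Weights (plus optional biases) of a fully connected feature-to-logit layer."""
    return feature_dim * num_classes + (num_classes if bias else 0)


def global_average_pool(feature_map):
    """Average each channel of a C x H x W map (nested lists) to one score.

    If the last convolutional layer has exactly C channels, these C scores
    can be used directly as class logits, with no separate output layer.
    """
    return [sum(map(sum, channel)) / (len(channel) * len(channel[0]))
            for channel in feature_map]


# Illustrative sizes: a 2048-d pooled feature, as produced by ResNet-50.
print(fc_output_params(2048, 1000))   # 2049000
print(fc_output_params(2048, 10000))  # 20490000
```

Going from 1,000 to 10,000 classes multiplies the output layer's size by ten while the rest of the network is unchanged, which is why the share of parameters spent on classification grows with the label space.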

A negative case analysis of visual grounding methods for VQA

Apr 15, 2020

Robik Shrestha, Kushal Kafle, Christopher Kanan

Abstract: Existing Visual Question Answering (VQA) methods tend to exploit dataset biases and spurious statistical correlations, instead of producing right answers for the right reasons. To address this issue, recent bias mitigation methods for VQA propose to incorporate visual cues (e.g., human attention maps) to better ground the VQA models, showcasing impressive gains. However, we show that the performance improvements are not a result of improved visual grounding, but a regularization effect which prevents over-fitting to linguistic priors. For instance, we find that it is not actually necessary to provide proper, human-based cues; random, insensible cues also result in similar improvements. Based on this observation, we propose a simpler regularization scheme that does not require any external annotations and yet achieves near state-of-the-art performance on VQA-CPv2.

REMIND Your Neural Network to Prevent Catastrophic Forgetting

Oct 06, 2019

Tyler L. Hayes, Kushal Kafle, Robik Shrestha, Manoj Acharya, Christopher Kanan

Abstract: In lifelong machine learning, a robotic agent must be incrementally updated with new knowledge, instead of having distinct train and deployment phases. Conventional neural networks are often used for interpreting sensor data; however, if they are updated on non-stationary data streams, they suffer from catastrophic forgetting, with new learning overwriting past knowledge. A common remedy is replay, which involves mixing old examples with new ones. For incrementally training convolutional neural network models, prior work has enabled replay by storing raw images, but this is memory intensive and not ideal for embedded agents. Here, we propose REMIND, a tensor quantization approach that enables efficient replay with tensors. Unlike other methods, REMIND is trained in a streaming manner, meaning it learns one example at a time rather than in large batches containing multiple classes. Our approach achieves state-of-the-art results for incremental class learning on the ImageNet-1K dataset. We also probe REMIND's robustness to different data ordering schemes using the CORe50 streaming dataset. We demonstrate REMIND's generality by pioneering multi-modal incremental learning for visual question answering (VQA), which cannot be readily done with comparison models. We establish strong baselines on the CLEVR and TDIUC datasets for VQA. The generality of REMIND for multi-modal tasks can enable robotic agents to learn about their visual environment using natural language understanding in an interactive way.
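
The memory argument can be sketched with a toy replay buffer that stores compressed feature vectors instead of raw images and mixes a few reconstructed old examples with each new one. REMIND itself compresses mid-network tensors with product quantization; the sketch below swaps in simpler uniform 8-bit scalar quantization, assumes a known feature range, and uses random replacement when the buffer is full. All class and function names are hypothetical.

```python
import random


class QuantizedReplayBuffer:
    """Toy replay buffer storing feature vectors as 8-bit codes.

    Uniform scalar quantization stands in for REMIND's product quantization.
    """

    def __init__(self, lo, hi, capacity, seed=0):
        self.lo, self.hi = lo, hi          # assumed known feature range
        self.scale = (hi - lo) / 255.0     # width of one quantization step
        self.capacity = capacity
        self.items = []                    # list of (bytes, label) pairs
        self.rng = random.Random(seed)

    def encode(self, vec):
        """Clip each value to [lo, hi] and map it to an integer code in 0..255."""
        codes = []
        for v in vec:
            v = min(max(v, self.lo), self.hi)
            codes.append(round((v - self.lo) / self.scale))
        return bytes(codes)

    def decode(self, codes):
        """Approximately reconstruct the vector from its codes."""
        return [self.lo + c * self.scale for c in codes]

    def add(self, vec, label):
        item = (self.encode(vec), label)
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            # Random replacement when full (a simplifying assumption).
            self.items[self.rng.randrange(self.capacity)] = item

    def sample(self, k):
        chosen = self.rng.sample(self.items, min(k, len(self.items)))
        return [(self.decode(c), y) for c, y in chosen]


def streaming_step(buffer, new_vec, new_label, replay_k=4):
    """Build one mini-batch: the new example plus reconstructed old ones."""
    batch = [(new_vec, new_label)] + buffer.sample(replay_k)
    buffer.add(new_vec, new_label)
    return batch  # a real learner would take one gradient step on this batch
```

Each stored value costs one byte instead of a 4-byte float (and far less than a raw image), so the same memory budget holds many more past examples for replay.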

Answering Questions about Data Visualizations using Efficient Bimodal Fusion

Aug 05, 2019

Kushal Kafle, Robik Shrestha, Brian Price, Scott Cohen, Christopher Kanan

Abstract: Chart question answering (CQA) is a newly proposed visual question answering (VQA) task where an algorithm must answer questions about data visualizations, e.g. bar charts, pie charts, and line graphs. CQA requires capabilities that natural-image VQA algorithms lack: fine-grained measurements, optical character recognition, and handling out-of-vocabulary words in both questions and answers. Without modifications, state-of-the-art VQA algorithms perform poorly on this task. Here, we propose a novel CQA algorithm called parallel recurrent fusion of image and language (PReFIL). PReFIL first learns bimodal embeddings by fusing question and image features and then intelligently aggregates these learned embeddings to answer the given question. Despite its simplicity, PReFIL greatly surpasses state-of-the-art systems and human baselines on both the FigureQA and DVQA datasets. Additionally, we demonstrate that PReFIL can be used to reconstruct tables by asking a series of questions about a chart.

Challenges and Prospects in Vision and Language Research

May 24, 2019

Kushal Kafle, Robik Shrestha, Christopher Kanan

Abstract: Language grounded image understanding tasks have often been proposed as a method for evaluating progress in artificial intelligence. Ideally, these tasks should test a plethora of capabilities that integrate computer vision, reasoning, and natural language understanding. However, rather than behaving as visual Turing tests, recent studies have demonstrated that state-of-the-art systems achieve good performance by exploiting flaws in datasets and evaluation procedures. We review the current state of affairs and outline a path forward.

Answer Them All! Toward Universal Visual Question Answering Models

Apr 05, 2019

Robik Shrestha, Kushal Kafle, Christopher Kanan

Abstract: Visual Question Answering (VQA) research is split into two camps: the first focuses on VQA datasets that require natural image understanding and the second focuses on synthetic datasets that test reasoning. A good VQA algorithm should be capable of both, but only a few VQA algorithms are tested in this manner. We compare five state-of-the-art VQA algorithms across eight VQA datasets covering both domains. To make the comparison fair, all of the models are standardized as much as possible, e.g., they use the same visual features, answer vocabularies, etc. We find that methods do not generalize across the two domains. To address this problem, we propose a new VQA algorithm that rivals or exceeds the state of the art for both domains.

* 8 pages

TallyQA: Answering Complex Counting Questions

Oct 31, 2018

Manoj Acharya, Kushal Kafle, Christopher Kanan

Abstract: Most counting questions in visual question answering (VQA) datasets are simple and require no more than object detection. Here, we study algorithms for complex counting questions that involve relationships between objects, attribute identification, reasoning, and more. To do this, we created TallyQA, the world's largest dataset for open-ended counting. We propose a new algorithm for counting that uses relation networks with region proposals. Our method lets relation networks be efficiently used with high-resolution imagery. It yields state-of-the-art results compared to baseline and recent systems on both TallyQA and the HowMany-QA benchmark.

* To appear in AAAI 2019 (to download the dataset, please go to http://www.manojacharya.com/)
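
A relation network scores every pair of objects, so its cost grows with the square of the number of objects. One way to read the efficiency claim above: feeding a modest number of region proposals, rather than every cell of a high-resolution feature map, keeps the number of pairs small. The sketch shows the generic relation network form, f applied to the sum of g(o_i, o_j, q) over all ordered pairs, with hand-written g and f standing in for learned MLPs. It is not the paper's architecture, and the region list is made up.

```python
import math
from itertools import product


def relation_network(objects, question, g, f):
    """Generic relation network: f applied to the sum of g over all ordered pairs."""
    total = None
    for o_i, o_j in product(objects, repeat=2):  # N * N pairs
        r = g(o_i, o_j, question)
        total = r if total is None else [a + b for a, b in zip(total, r)]
    return f(total)


# Toy stand-ins for the learned MLPs (hypothetical, for illustration only).
def match(obj, q):
    label, confidence = obj
    return confidence if label == q else 0.0


def g(o_i, o_j, q):
    return [match(o_i, q) * match(o_j, q)]


def f(v):
    # The pairwise sum equals (sum of matches) ** 2, so its square root is a count.
    return math.sqrt(v[0])


# Hypothetical region proposals: (detected label, confidence).
regions = [("dog", 1.0), ("dog", 1.0), ("cat", 1.0), ("dog", 1.0)]
print(relation_network(regions, "dog", g, f))  # 3.0
```

With 4 proposals the network evaluates 16 pairs; treating each cell of, say, a 28 x 28 feature map as an object would mean 784 objects and 614,656 pairs.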

DVQA: Understanding Data Visualizations via Question Answering

Mar 29, 2018

Kushal Kafle, Brian Price, Scott Cohen, Christopher Kanan

Abstract: Bar charts are an effective way to convey numeric information, but today's algorithms cannot parse them. Existing methods fail when faced with even minor variations in appearance. Here, we present DVQA, a dataset that tests many aspects of bar chart understanding in a question answering framework. Unlike visual question answering (VQA), DVQA requires processing words and answers that are unique to a particular bar chart. State-of-the-art VQA algorithms perform poorly on DVQA, and we propose two strong baselines that perform considerably better. Our work will enable algorithms to automatically extract numeric and semantic information from vast quantities of bar charts found in scientific publications, Internet articles, business reports, and many other areas.

* CVPR 2018 Camera Ready Version
