Jonathan Krause

A Hierarchical Approach for Generating Descriptive Image Paragraphs

Apr 10, 2017

Jonathan Krause, Justin Johnson, Ranjay Krishna, Li Fei-Fei

Abstract: Recent progress on image captioning has made it possible to generate novel sentences describing images in natural language, but compressing an image into a single sentence can describe visual content in only coarse detail. While one new captioning approach, dense captioning, can potentially describe images in finer levels of detail by captioning many regions within an image, it in turn is unable to produce a coherent story for an image. In this paper we overcome these limitations by generating entire paragraphs for describing images, which can tell detailed, unified stories. We develop a model that decomposes both images and paragraphs into their constituent parts, detecting semantic regions in images and using a hierarchical recurrent neural network to reason about language. Linguistic analysis confirms the complexity of the paragraph generation task, and thorough experiments on a new dataset of image and paragraph pairs demonstrate the effectiveness of our approach.

* CVPR 2017 spotlight

Using Deep Learning and Google Street View to Estimate the Demographic Makeup of the US

Mar 02, 2017

Timnit Gebru, Jonathan Krause, Yilun Wang, Duyun Chen, Jia Deng, Erez Lieberman Aiden, Li Fei-Fei

Abstract: The United States spends more than $1B each year on initiatives such as the American Community Survey (ACS), a labor-intensive door-to-door study that measures statistics relating to race, gender, education, occupation, unemployment, and other demographic factors. Although a comprehensive source of data, the lag between demographic changes and their appearance in the ACS can exceed half a decade. As digital imagery becomes ubiquitous and machine vision techniques improve, automated data analysis may provide a cheaper and faster alternative. Here, we present a method that determines socioeconomic trends from 50 million images of street scenes, gathered in 200 American cities by Google Street View cars. Using deep learning-based computer vision techniques, we determined the make, model, and year of all motor vehicles encountered in particular neighborhoods. Data from this census of motor vehicles, which enumerated 22M automobiles in total (8% of all automobiles in the US), was used to accurately estimate income, race, education, and voting patterns, with single-precinct resolution. (The average US precinct contains approximately 1000 people.) The resulting associations are surprisingly simple and powerful. For instance, if the number of sedans encountered during a 15-minute drive through a city is higher than the number of pickup trucks, the city is likely to vote for a Democrat during the next Presidential election (88% chance); otherwise, it is likely to vote Republican (82%). Our results suggest that automated systems for monitoring demographic trends may effectively complement labor-intensive approaches, with the potential to detect trends with fine spatial resolution, in close to real time.

* 41 pages including supplementary material. Under review at PNAS
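
The sedan-versus-pickup rule quoted in the abstract above can be written as a one-line comparison. The following is only an illustrative sketch of that stated rule, not the authors' model: the function name `predict_city_vote` and its argument names are hypothetical, and the two probabilities are copied directly from the abstract.

```python
from typing import Tuple


def predict_city_vote(num_sedans: int, num_pickups: int) -> Tuple[str, float]:
    """Apply the rule of thumb stated in the abstract (hypothetical helper).

    The counts are assumed to come from a 15-minute drive through a city.
    Returns the predicted party and the chance quoted in the abstract.
    """
    if num_sedans > num_pickups:
        # More sedans than pickups: abstract reports an 88% chance of Democrat.
        return ("Democrat", 0.88)
    # Otherwise (fewer or equal sedans): abstract reports 82% Republican.
    return ("Republican", 0.82)


if __name__ == "__main__":
    print(predict_city_vote(120, 80))
    print(predict_city_vote(40, 95))
```

The abstract does not say how ties are handled; treating a tie as the "otherwise" case is an assumption of this sketch.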

The Unreasonable Effectiveness of Noisy Data for Fine-Grained Recognition

Oct 18, 2016

Jonathan Krause, Benjamin Sapp, Andrew Howard, Howard Zhou, Alexander Toshev, Tom Duerig, James Philbin, Li Fei-Fei

Abstract: Current approaches for fine-grained recognition do the following: First, recruit experts to annotate a dataset of images, optionally also collecting more structured data in the form of part annotations and bounding boxes. Second, train a model utilizing this data. Toward the goal of solving fine-grained recognition, we introduce an alternative approach, leveraging free, noisy data from the web and simple, generic methods of recognition. This approach has benefits in both performance and scalability. We demonstrate its efficacy on four fine-grained datasets, greatly exceeding existing state of the art without the manual collection of even a single label, and furthermore show first results at scaling to more than 10,000 fine-grained categories. Quantitatively, we achieve top-1 accuracies of 92.3% on CUB-200-2011, 85.4% on Birdsnap, 93.4% on FGVC-Aircraft, and 80.8% on Stanford Dogs without using their annotated training sets. We compare our approach to an active learning approach for expanding fine-grained datasets.

* ECCV 2016, data is released
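
For readers unfamiliar with the metric reported in the abstract above, top-1 accuracy is the fraction of images whose highest-scoring predicted class equals the true class. The sketch below shows that standard definition in plain Python; the function name `top1_accuracy` and the toy scores are illustrative assumptions and are unrelated to the paper's data or code.

```python
from typing import List, Sequence


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of examples whose arg-max class score matches the true label."""
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and equal in length")
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the highest score in this row (first index wins on ties).
        predicted = max(range(len(row)), key=lambda i: row[i])
        if predicted == label:
            correct += 1
    return correct / len(labels)


if __name__ == "__main__":
    # Toy example: three images, two classes (made-up numbers).
    toy_scores: List[List[float]] = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
    toy_labels = [1, 0, 0]
    print(top1_accuracy(toy_scores, toy_labels))
```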

ImageNet Large Scale Visual Recognition Challenge

Jan 30, 2015

Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein (+2 more)

Abstract: The ImageNet Large Scale Visual Recognition Challenge is a benchmark in object category classification and detection on hundreds of object categories and millions of images. The challenge has been run annually from 2010 to present, attracting participation from more than fifty institutions. This paper describes the creation of this benchmark dataset and the advances in object recognition that have been possible as a result. We discuss the challenges of collecting large-scale ground truth annotation, highlight key breakthroughs in categorical object recognition, provide a detailed analysis of the current state of the field of large-scale image classification and object detection, and compare the state-of-the-art computer vision accuracy with human accuracy. We conclude with lessons learned in the five years of the challenge, and propose future directions and improvements.

* 43 pages, 16 figures. v3 includes additional comparisons with PASCAL VOC (per-category comparisons in Table 3, distribution of localization difficulty in Fig 16), a list of queries used for obtaining object detection images (Appendix C), and some additional references
