Pietro Perona

Benchmarking Algorithmic Bias in Face Recognition: An Experimental Approach Using Synthetic Faces and Human Evaluation

Aug 10, 2023
Hao Liang, Pietro Perona, Guha Balakrishnan

We propose an experimental method for measuring bias in face recognition systems. Existing methods to measure bias depend on benchmark datasets that are collected in the wild and annotated for protected (e.g., race, gender) and non-protected (e.g., pose, lighting) attributes. Such observational datasets only permit correlational conclusions, e.g., "Algorithm A's accuracy is different on female and male faces in dataset X." By contrast, experimental methods manipulate attributes individually and thus permit causal conclusions, e.g., "Algorithm A's accuracy is affected by gender and skin color." Our method is based on generating synthetic faces using a neural face generator, where each attribute of interest is modified independently while leaving all other attributes constant. Human observers crucially provide the ground truth on perceptual identity similarity between synthetic image pairs. We validate our method quantitatively by evaluating race and gender biases of three research-grade face recognition models. Our synthetic pipeline reveals that for these algorithms, accuracy is lower for Black and East Asian population subgroups. Our method can also quantify how perceptual changes in attributes affect face identity distances reported by these models. Our large synthetic dataset, consisting of 48,000 synthetic face image pairs (10,200 unique synthetic faces) and 555,000 human annotations (individual attributes and pairwise identity comparisons), is available to researchers in this important area.
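
As a hedged illustration of the measurement itself (not the paper's code), the sketch below compares verification accuracy across synthetic pairs that differ in a single attribute. The embed() function is a hypothetical stand-in for any face recognition model's embedding; the cosine metric and the fixed threshold are assumptions for illustration.

    import numpy as np

    def identity_distance(embed, img_a, img_b):
        # Cosine distance between the embeddings of two face images.
        # embed() is assumed to return a 1-D numpy feature vector.
        ea, eb = embed(img_a), embed(img_b)
        ea, eb = ea / np.linalg.norm(ea), eb / np.linalg.norm(eb)
        return 1.0 - float(ea @ eb)

    def subgroup_accuracy(embed, pairs, same_identity, threshold):
        # pairs: list of (img_a, img_b) synthetic images that differ in one attribute.
        # same_identity: human judgments (1 = perceived same person), used as ground truth.
        preds = [identity_distance(embed, a, b) < threshold for a, b in pairs]
        return float(np.mean([p == bool(y) for p, y in zip(preds, same_identity)]))

Because each pair holds every attribute but one fixed, a gap in subgroup_accuracy between, say, pairs drawn from two skin-tone groups supports a causal rather than merely correlational reading of the bias.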

* Accepted to ICCV 2023; 18 figures 

Spatial Implicit Neural Representations for Global-Scale Species Mapping

Jun 05, 2023
Elijah Cole, Grant Van Horn, Christian Lange, Alexander Shepard, Patrick Leary, Pietro Perona, Scott Loarie, Oisin Mac Aodha


Estimating the geographical range of a species from sparse observations is a challenging and important geospatial prediction problem. Given a set of locations where a species has been observed, the goal is to build a model to predict whether the species is present or absent at any location. This problem has a long history in ecology, but traditional methods struggle to take advantage of emerging large-scale crowdsourced datasets which can include tens of millions of records for hundreds of thousands of species. In this work, we use Spatial Implicit Neural Representations (SINRs) to jointly estimate the geographical ranges of 47k species. We find that our approach scales gracefully, making increasingly better predictions as we increase the number of species and the amount of data per species when training. To make this problem accessible to machine learning researchers, we provide four new benchmarks that measure different aspects of species range estimation and spatial representation learning. Using these benchmarks, we demonstrate that noisy and biased crowdsourced data can be combined with implicit neural representations to approximate expert-developed range maps for many species.
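
As a minimal sketch of the idea (layer sizes and the sinusoidal location encoding are illustrative assumptions, not the authors' architecture), a spatial implicit neural representation is a small network mapping an encoded location to per-species presence probabilities:

    import math
    import torch
    import torch.nn as nn

    class SINRSketch(nn.Module):
        def __init__(self, num_species, hidden=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(4, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, num_species),   # one logit per species
            )

        def forward(self, lonlat):
            # lonlat: (batch, 2) with longitude/latitude scaled to [-1, 1]
            enc = torch.cat([torch.sin(math.pi * lonlat),
                             torch.cos(math.pi * lonlat)], dim=-1)
            return torch.sigmoid(self.net(enc))   # presence probability for every species

    model = SINRSketch(num_species=47_000)
    probs = model(torch.tensor([[0.1, -0.4]]))    # (1, 47000) probabilities at one location

Sharing one network across all species is what allows the approach to improve as both the number of species and the data per species grow.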

* ICML 2023 

Confidence Intervals for Error Rates in Matching Tasks: Critical Review and Recommendations

Jun 01, 2023
Riccardo Fogliato, Pratik Patil, Pietro Perona


Matching algorithms are commonly used to predict matches between items in a collection. For example, in 1:1 face verification, a matching algorithm predicts whether two face images depict the same person. Accurately assessing the uncertainty of the error rates of such algorithms can be challenging when data are dependent and error rates are low, two aspects that have often been overlooked in the literature. In this work, we review methods for constructing confidence intervals for error rates in matching tasks such as 1:1 face verification. We derive and examine the statistical properties of these methods and demonstrate how coverage and interval width vary with sample size, error rates, and degree of data dependence using both synthetic and real-world datasets. Based on our findings, we provide recommendations for best practices for constructing confidence intervals for error rates in matching tasks.
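
One family of methods in this setting can be sketched as a subject-level (cluster) bootstrap that resamples whole identities rather than individual pairs, so the interval reflects the dependence between pairs sharing a subject. The data layout and the percentile interval below are illustrative assumptions, not a recommendation from the paper.

    import numpy as np

    def cluster_bootstrap_ci(errors, subject_ids, n_boot=2000, alpha=0.05, seed=0):
        # errors: 0/1 per-pair errors; subject_ids: identity each pair belongs to.
        # Resamples whole subjects so within-subject dependence is preserved.
        errors, subject_ids = np.asarray(errors), np.asarray(subject_ids)
        rng = np.random.default_rng(seed)
        subjects = np.unique(subject_ids)
        by_subject = {s: errors[subject_ids == s] for s in subjects}
        stats = []
        for _ in range(n_boot):
            sample = rng.choice(subjects, size=len(subjects), replace=True)
            stats.append(np.concatenate([by_subject[s] for s in sample]).mean())
        lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
        return float(lo), float(hi)

When error rates are very low, many bootstrap replicates can contain zero errors, which is one of the failure modes such a review has to account for.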

* 32 pages, 8 figures 

Understanding Label Bias in Single Positive Multi-Label Learning

May 24, 2023
Julio Arroyo, Pietro Perona, Elijah Cole


Annotating data for multi-label classification is prohibitively expensive because every category of interest must be confirmed to be present or absent. Recent work on single positive multi-label (SPML) learning shows that it is possible to train effective multi-label classifiers using only one positive label per image. However, the standard benchmarks for SPML are derived from traditional multi-label classification datasets by retaining one positive label for each training example (chosen uniformly at random) and discarding all other labels. In realistic settings it is not likely that positive labels are chosen uniformly at random. This work introduces protocols for studying label bias in SPML and provides new empirical results.
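
The standard SPML benchmark construction the paper revisits can be sketched in a few lines; the optional weights argument is a hypothetical hook for injecting the kind of non-uniform positive selection (label bias) the proposed protocols are designed to study.

    import numpy as np

    def to_single_positive(label_matrix, weights=None, seed=0):
        # label_matrix: (n_images, n_classes) binary array of full annotations,
        # with at least one positive per image. weights: optional per-class
        # selection weights; None reproduces the uniform-at-random benchmark.
        rng = np.random.default_rng(seed)
        out = np.zeros_like(label_matrix)
        for i in range(label_matrix.shape[0]):
            pos = np.flatnonzero(label_matrix[i])
            p = None
            if weights is not None:
                w = np.asarray(weights)[pos]
                p = w / w.sum()
            keep = rng.choice(pos, p=p)   # keep exactly one positive per image
            out[i, keep] = 1
        return out

All entries other than the retained positive are then treated as unannotated rather than negative, which is what makes the learning problem "single positive".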

* ICLR 2023, Tiny Papers Track 

BKinD-3D: Self-Supervised 3D Keypoint Discovery from Multi-View Videos

Dec 14, 2022
Jennifer J. Sun, Pierre Karashchuk, Amil Dravid, Serim Ryou, Sonia Fereidooni, John Tuthill, Aggelos Katsaggelos, Bingni W. Brunton, Georgia Gkioxari, Ann Kennedy, Yisong Yue, Pietro Perona


Quantifying motion in 3D is important for studying the behavior of humans and other animals, but manual pose annotations are expensive and time-consuming to obtain. Self-supervised keypoint discovery is a promising strategy for estimating 3D poses without annotations. However, current keypoint discovery approaches commonly process single 2D views and do not operate in the 3D space. We propose a new method to perform self-supervised keypoint discovery in 3D from multi-view videos of behaving agents, without any keypoint or bounding box supervision in 2D or 3D. Our method uses an encoder-decoder architecture with a 3D volumetric heatmap, trained to reconstruct spatiotemporal differences across multiple views, in addition to joint length constraints on a learned 3D skeleton of the subject. In this way, we discover keypoints without requiring manual supervision in videos of humans and rats, demonstrating the potential of 3D keypoint discovery for studying behavior.
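
Two of the ingredients named above can be sketched as follows (tensor shapes and the skeleton edge list are illustrative assumptions, not the authors' implementation): a soft-argmax that converts a 3D volumetric heatmap into keypoint coordinates, and a joint-length consistency term over a learned skeleton.

    import torch

    def soft_argmax_3d(heatmap, grid):
        # heatmap: (K, D, H, W) per-keypoint volumes; grid: (D, H, W, 3) voxel centers.
        K = heatmap.shape[0]
        w = torch.softmax(heatmap.reshape(K, -1), dim=-1)   # (K, D*H*W)
        return w @ grid.reshape(-1, 3)                      # (K, 3) expected coordinates

    def bone_length_loss(kpts_t, kpts_tp1, edges):
        # Penalize changes in limb length between frames for a fixed set of skeleton edges.
        loss = 0.0
        for a, b in edges:
            len_t = torch.norm(kpts_t[a] - kpts_t[b])
            len_tp1 = torch.norm(kpts_tp1[a] - kpts_tp1[b])
            loss = loss + (len_t - len_tp1) ** 2
        return loss / len(edges)

In the full method these pieces are driven by a reconstruction objective on spatiotemporal differences across views, so no 2D or 3D keypoint supervision is needed.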


Visual Knowledge Tracing

Jul 22, 2022
Neehar Kondapaneni, Pietro Perona, Oisin Mac Aodha


Each year, thousands of people learn new visual categorization tasks -- radiologists learn to recognize tumors, birdwatchers learn to distinguish similar species, and crowd workers learn how to annotate valuable data for applications like autonomous driving. As humans learn, their brains update the visual features they extract and attend to, which ultimately informs their final classification decisions. In this work, we propose a novel task of tracing the evolving classification behavior of human learners as they engage in challenging visual classification tasks. We propose models that jointly extract the visual features used by learners and predict the classification functions they utilize. We collect three challenging new datasets from real human learners in order to evaluate the performance of different visual knowledge tracing methods. Our results show that our recurrent models are able to predict the classification behavior of human learners on three challenging medical image and species identification tasks.
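
A minimal sketch of the recurrent formulation (architecture and shapes are assumptions for illustration, not the paper's exact models): a GRU consumes the visual features of the stimuli shown to a learner together with the learner's past responses, and predicts their next classification.

    import torch
    import torch.nn as nn

    class KnowledgeTracer(nn.Module):
        def __init__(self, feat_dim, n_classes, hidden=128):
            super().__init__()
            self.rnn = nn.GRU(feat_dim + n_classes, hidden, batch_first=True)
            self.head = nn.Linear(hidden, n_classes)

        def forward(self, feats, prev_responses):
            # feats: (B, T, feat_dim) visual features of the stimuli shown to the learner
            # prev_responses: (B, T, n_classes) one-hot labels the learner gave previously
            h, _ = self.rnn(torch.cat([feats, prev_responses], dim=-1))
            return self.head(h)   # (B, T, n_classes) logits for the learner's next answer

Training such a model against recorded learner responses is what "tracing" refers to: the hidden state plays the role of the learner's evolving knowledge.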

* 14 pages, 4 figures, 14 supplemental pages, 11 supplemental figures, accepted to European Conference on Computer Vision (ECCV) 2022 

The MABe22 Benchmarks for Representation Learning of Multi-Agent Behavior

Jul 21, 2022
Jennifer J. Sun, Andrew Ulmer, Dipam Chakraborty, Brian Geuther, Edward Hayes, Heng Jia, Vivek Kumar, Zachary Partridge, Alice Robie, Catherine E. Schretter, Chao Sun, Keith Sheppard, Param Uttarwar, Pietro Perona, Yisong Yue, Kristin Branson, Ann Kennedy


Real-world behavior is often shaped by complex interactions between multiple agents. To scalably study multi-agent behavior, advances in unsupervised and self-supervised learning have enabled a variety of behavioral representations to be learned from trajectory data. To date, there does not exist a unified set of benchmarks that enables comparing methods quantitatively and systematically across a broad set of behavior analysis settings. We aim to address this by introducing a large-scale, multi-agent trajectory dataset from real-world behavioral neuroscience experiments that covers a range of behavior analysis tasks. Our dataset consists of trajectory data from common model organisms, with 9.6 million frames of mouse data and 4.4 million frames of fly data, in a variety of experimental settings, such as different strains, lengths of interaction, and optogenetic stimulation. A subset of the frames also includes expert-annotated behavior labels. Improvements on our dataset correspond to behavioral representations that work across multiple organisms and are able to capture differences relevant to common behavior analysis tasks.
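
Benchmarks of this kind are commonly scored by freezing the learned representation and fitting a lightweight readout on the expert-annotated frames. The sketch below assumes a hypothetical embed_trajectory function and a linear probe, which are illustrative choices rather than the benchmark's exact protocol.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def evaluate_representation(embed_trajectory, train_frames, train_labels,
                                test_frames, test_labels):
        # embed_trajectory: frozen candidate representation, frame window -> feature vector
        X_tr = np.stack([embed_trajectory(f) for f in train_frames])
        X_te = np.stack([embed_trajectory(f) for f in test_frames])
        probe = LogisticRegression(max_iter=1000).fit(X_tr, train_labels)
        return probe.score(X_te, test_labels)   # frame-level behavior classification accuracy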

* Project website: https://sites.google.com/view/computational-behavior/our-datasets/mabe2022-dataset 

On Label Granularity and Object Localization

Jul 20, 2022
Elijah Cole, Kimberly Wilber, Grant Van Horn, Xuan Yang, Marco Fornoni, Pietro Perona, Serge Belongie, Andrew Howard, Oisin Mac Aodha


Weakly supervised object localization (WSOL) aims to learn representations that encode object location using only image-level category labels. However, many objects can be labeled at different levels of granularity. Is it an animal, a bird, or a great horned owl? Which image-level labels should we use? In this paper we study the role of label granularity in WSOL. To facilitate this investigation we introduce iNatLoc500, a new large-scale fine-grained benchmark dataset for WSOL. Surprisingly, we find that choosing the right training label granularity provides a much larger performance boost than choosing the best WSOL algorithm. We also show that changing the label granularity can significantly improve data efficiency.
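
The knob under study can be sketched as a relabeling step applied before training any WSOL method: climb a taxonomy from the leaf label toward the root by a chosen number of levels. The parent-map representation and the example below are illustrative assumptions, not the iNatLoc500 taxonomy.

    def coarsen_labels(labels, parent, level):
        # labels: list of leaf category names for the training images.
        # parent: dict mapping each category to its parent (the root maps to None).
        # level: how many steps to climb toward the root.
        out = []
        for name in labels:
            for _ in range(level):
                if parent.get(name) is None:
                    break
                name = parent[name]
            out.append(name)
        return out

For instance, climbing two levels might map "great horned owl" to "bird"; the finding above is that choosing this level well can matter more than choosing the WSOL algorithm.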

* ECCV 2022 

The Caltech Fish Counting Dataset: A Benchmark for Multiple-Object Tracking and Counting

Jul 19, 2022
Justin Kay, Peter Kulits, Suzanne Stathatos, Siqi Deng, Erik Young, Sara Beery, Grant Van Horn, Pietro Perona


We present the Caltech Fish Counting Dataset (CFC), a large-scale dataset for detecting, tracking, and counting fish in sonar videos. We identify sonar videos as a rich source of data for advancing low signal-to-noise computer vision applications and tackling domain generalization in multiple-object tracking (MOT) and counting. In comparison to existing MOT and counting datasets, which are largely restricted to videos of people and vehicles in cities, CFC is sourced from a natural-world domain where targets are not easily resolvable and appearance features cannot be easily leveraged for target re-identification. With over half a million annotations in over 1,500 videos sourced from seven different sonar cameras, CFC allows researchers to train MOT and counting algorithms and evaluate generalization performance at unseen test locations. We perform extensive baseline experiments and identify key challenges and opportunities for advancing the state of the art in generalization in MOT and counting.

* ECCV 2022. 33 pages, 12 figures 

Near Perfect GAN Inversion

Feb 23, 2022
Qianli Feng, Viraj Shah, Raghudeep Gadde, Pietro Perona, Aleix Martinez


To edit a real photo using Generative Adversarial Networks (GANs), we need a GAN inversion algorithm to identify the latent vector that perfectly reproduces it. Unfortunately, whereas existing inversion algorithms can synthesize images similar to real photos, they cannot generate the identical clones needed in most applications. Here, we derive an algorithm that achieves near perfect reconstructions of photos. Rather than relying on encoder- or optimization-based methods to find an inverse mapping on a fixed generator $G(\cdot)$, we derive an approach to locally adjust $G(\cdot)$ to better represent the photos we wish to synthesize. This is done by locally tweaking the learned mapping $G(\cdot)$ s.t. $\| {\bf x} - G({\bf z}) \|<\epsilon$, with ${\bf x}$ the photo we wish to reproduce, ${\bf z}$ the latent vector, $\|\cdot\|$ an appropriate metric, and $\epsilon > 0$ a small scalar. We show that this approach can not only produce synthetic images that are indistinguishable from the real photos we wish to replicate, but that these images are readily editable. We demonstrate the effectiveness of the derived algorithm on a variety of datasets including human faces, animals, and cars, and discuss its importance for diversity and inclusion.
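
A hedged sketch of the stated idea (optimizer settings and the plain L2 metric are assumptions, not the paper's exact procedure): instead of only searching for ${\bf z}$ under a fixed generator, locally fine-tune the generator's weights until the reconstruction error drops below $\epsilon$.

    import torch

    def locally_tune_generator(G, z, x, eps=1e-3, lr=1e-4, max_steps=500):
        # G: a generator nn.Module; z: latent code (e.g., from an encoder or a prior
        # optimization); x: the real photo to reproduce. Returns the locally tuned G.
        z = z.detach()
        opt = torch.optim.Adam(G.parameters(), lr=lr)
        for _ in range(max_steps):
            recon = G(z)
            loss = torch.norm(x - recon)      # || x - G(z) || with an L2 metric
            if loss.item() < eps:
                break
            opt.zero_grad()
            loss.backward()
            opt.step()
        return G

Because the adjustment is local to the neighborhood of ${\bf z}$, the tuned generator can still be used for editing the reconstructed photo in latent space.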
