Olga Russakovsky

Unseen Image Synthesis with Diffusion Models

Oct 13, 2023

Ye Zhu, Yu Wu, Zhiwei Deng, Olga Russakovsky, Yan Yan

Abstract: While the current trend in the generative field is scaling up towards larger models and more training data for generalized domain representations, we go in the opposite direction in this work by synthesizing unseen domain images without additional training. We do so via latent sampling and geometric optimization using pre-trained and frozen Denoising Diffusion Probabilistic Models (DDPMs) on single-domain datasets. Our key observation is that DDPMs pre-trained even just on single-domain images are already equipped with sufficient representation abilities to reconstruct arbitrary images from the inverted latent encoding following bi-directional deterministic diffusion and denoising trajectories. This motivates us to investigate the statistical and geometric behaviors of the Out-Of-Distribution (OOD) samples from unseen image domains in the latent spaces along the denoising chain. Notably, we theoretically and empirically show that the inverted OOD samples also establish Gaussians that are distinguishable from the original In-Domain (ID) samples in the intermediate latent spaces, which allows us to sample from them directly. Domain-specific and model-dependent geometric information about the unseen subspace (e.g., sample-wise distances and angles) is used to further optimize the sampled OOD latent encodings from the estimated Gaussian prior. We conduct extensive analysis and experiments using pre-trained diffusion models (DDPM, iDDPM) on different datasets (AFHQ, CelebA-HQ, LSUN-Church, and LSUN-Bedroom), demonstrating the effectiveness of this novel perspective for exploring and rethinking the data synthesis generalization ability of diffusion models.

* 28 pages including appendices
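
The Gaussian-prior idea in this abstract can be illustrated with a toy sketch. It assumes the inverted OOD latents are already available as plain vectors (the DDPM inversion itself is not shown), fits a diagonal Gaussian to them, draws new latents from it, and computes sample-wise distances and angles as simple geometric statistics. All function names are hypothetical, and the paper's actual geometric optimization is not reproduced here.

```python
import math
import random
from itertools import combinations
from typing import List, Sequence, Tuple

Vector = List[float]


def fit_diagonal_gaussian(latents: Sequence[Sequence[float]]) -> Tuple[Vector, Vector]:
    """Estimate a per-dimension mean and standard deviation from inverted latents."""
    n = len(latents)
    dim = len(latents[0])
    mean = [sum(z[d] for z in latents) / n for d in range(dim)]
    std = [math.sqrt(sum((z[d] - mean[d]) ** 2 for z in latents) / n) for d in range(dim)]
    return mean, std


def sample_latents(mean: Vector, std: Vector, k: int, rng: random.Random) -> List[Vector]:
    """Draw k new latent encodings from the estimated diagonal Gaussian."""
    return [[rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(k)]


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def angle(a: Sequence[float], b: Sequence[float]) -> float:
    """Angle in radians between two latent vectors (assumes both are non-zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def mean_pairwise_geometry(latents: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Average pairwise L2 distance and angle: a crude summary of subspace geometry."""
    pairs = list(combinations(latents, 2))
    dist = sum(l2_distance(a, b) for a, b in pairs) / len(pairs)
    ang = sum(angle(a, b) for a, b in pairs) / len(pairs)
    return dist, ang
```

In a fuller version, these geometric statistics would constrain or refine the sampled latents rather than merely describe them.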

ImageNet-OOD: Deciphering Modern Out-of-Distribution Detection Algorithms

Oct 03, 2023

William Yang, Byron Zhang, Olga Russakovsky

Abstract: The task of out-of-distribution (OOD) detection is notoriously ill-defined. Earlier works focused on new-class detection, aiming to identify label-altering data distribution shifts, also known as "semantic shift." However, recent works argue for a focus on failure detection, expanding the OOD evaluation framework to account for label-preserving data distribution shifts, also known as "covariate shift." Intriguingly, under this new framework, complex OOD detectors that were previously considered state-of-the-art now perform similarly to, or even worse than, the simple maximum softmax probability baseline. This raises the question: what are the latest OOD detectors actually detecting? Deciphering the behavior of OOD detection algorithms requires evaluation datasets that decouple semantic shift and covariate shift. To aid our investigations, we present ImageNet-OOD, a clean semantic shift dataset that minimizes the interference of covariate shift. Through comprehensive experiments, we show that OOD detectors are more sensitive to covariate shift than to semantic shift, and that the benefits of recent OOD detection algorithms on semantic shift detection are minimal. Our dataset and analyses provide important insights for guiding the design of future OOD detectors.

* 28 pages, 11 figures
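
The maximum softmax probability (MSP) baseline this abstract refers to is simple enough to sketch directly. Here a sample's MSP score is the largest softmax probability of its logits (higher suggests in-distribution), and AUROC is computed with the pairwise rank statistic to measure how well the scores separate ID from OOD samples. This is an illustrative sketch on hypothetical toy logits, not the ImageNet-OOD evaluation code.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max logit before exponentiating)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msp_score(logits: Sequence[float]) -> float:
    """Maximum softmax probability; low values are treated as likely OOD."""
    return max(softmax(logits))


def auroc(id_scores: Sequence[float], ood_scores: Sequence[float]) -> float:
    """AUROC with ID as the positive class.

    Equals P(id > ood) + 0.5 * P(id == ood) over all ID/OOD score pairs.
    """
    wins = 0.0
    for s_id in id_scores:
        for s_ood in ood_scores:
            if s_id > s_ood:
                wins += 1.0
            elif s_id == s_ood:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))
```

The pairwise loop is quadratic; for large evaluation sets a sort-based rank computation would be preferable, but the quadratic form makes the definition explicit.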

Multimodal Dataset Distillation for Image-Text Retrieval

Aug 15, 2023

Xindi Wu, Zhiwei Deng, Olga Russakovsky

Abstract: Dataset distillation methods offer the promise of reducing a large-scale dataset down to a significantly smaller set of (potentially synthetic) training examples that preserve sufficient information for training a new model from scratch. So far, dataset distillation methods have been developed for image classification. However, with the rise in capabilities of vision-language models, and especially given the scale of datasets necessary to train these models, the time is ripe to expand dataset distillation methods beyond image classification. In this work, we take the first steps towards this goal by expanding on the idea of trajectory matching to create a distillation method for vision-language datasets. The key challenge is that vision-language datasets do not have a set of discrete classes. To overcome this, our proposed multimodal dataset distillation method jointly distills the images and their corresponding language descriptions in a contrastive formulation. Since there are no existing baselines, we compare our approach to three coreset selection methods (strategic subsampling of the training dataset), which we adapt to the vision-language setting. We demonstrate significant improvements on the challenging Flickr30K and COCO retrieval benchmarks: the best coreset selection method, which selects 1000 image-text pairs for training, achieves only 5.6% image-to-text retrieval accuracy (recall@1); in contrast, our dataset distillation approach almost doubles that with just 100 (an order of magnitude fewer) training pairs.

* 28 pages, 11 figures
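
The recall@1 figure quoted in this abstract is the fraction of image queries whose top-ranked caption is a correct match. A minimal sketch, assuming a precomputed image-to-caption similarity matrix and a set of correct caption indices per image (the distillation method itself is not shown, and the function name is hypothetical):

```python
from typing import Sequence, Set


def recall_at_k(similarity: Sequence[Sequence[float]],
                ground_truth: Sequence[Set[int]],
                k: int = 1) -> float:
    """Image-to-text Recall@K.

    similarity[i][j] is the score between image i and caption j;
    ground_truth[i] is the set of caption indices that correctly describe image i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Rank all captions for image i from most to least similar.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if any(j in ground_truth[i] for j in ranked[:k]):
            hits += 1
    return hits / len(similarity)
```

Using a set per image accommodates benchmarks such as Flickr30K and COCO, where each image has several reference captions.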

Art and the science of generative AI: A deeper dive

Jun 07, 2023

Ziv Epstein, Aaron Hertzmann, Laura Herman, Robert Mahari, Morgan R. Frank, Matthew Groh, Hope Schroeder, Amy Smith, Memo Akten, Jessica Fjeld (+4 more)

Abstract: A new class of tools, colloquially called generative AI, can produce high-quality artistic media for visual arts, concept art, music, fiction, literature, video, and animation. The generative capabilities of these tools are likely to fundamentally alter the creative processes by which creators formulate ideas and put them into production. As creativity is reimagined, so too may be many sectors of society. Understanding the impact of generative AI, and making policy decisions around it, requires new interdisciplinary scientific inquiry into culture, economics, law, algorithms, and the interaction of technology and creativity. We argue that generative AI is not the harbinger of art's demise, but rather a new medium with its own distinct affordances. In this vein, we consider the impacts of this new medium on creators across four themes: aesthetics and culture, legal questions of ownership and credit, the future of creative work, and impacts on the contemporary media ecosystem. Across these themes, we highlight key research questions and directions to inform policy and beneficial uses of the technology.

* This white paper is an expanded version of Epstein et al. 2023, published in Science Perspectives on July 16, 2023, which you can find at the following DOI: 10.1126/science.adh4451

ICON$^2$: Reliably Benchmarking Predictive Inequity in Object Detection

Jun 07, 2023

Sruthi Sudhakar, Viraj Prabhu, Olga Russakovsky, Judy Hoffman

Abstract: As computer vision systems are increasingly deployed at scale in high-stakes applications like autonomous driving, concerns about social bias in these systems are rising. Analysis of fairness in real-world vision systems, such as object detection in driving scenes, has been limited to observing predictive inequity across attributes such as pedestrian skin tone, and lacks a consistent methodology to disentangle the role of confounding variables. For example, does my model perform worse for a certain skin tone, or are such scenes in my dataset more challenging due to occlusion and crowds? In this work, we introduce ICON$^2$, a framework for robustly answering this question. ICON$^2$ leverages prior knowledge of the deficiencies of object detection systems to identify performance discrepancies across sub-populations, compute correlations between these potential confounders and a given sensitive attribute, and control for the most likely confounders to obtain a more reliable estimate of model bias. Using our approach, we conduct an in-depth study of object detection performance with respect to income in the BDD100K driving dataset, revealing useful insights.

* Accepted to CVPR 2023 SSAD Workshop
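
The confounder-control idea in this abstract can be sketched with a stratified comparison. This toy version assumes each detection record has hypothetical fields `group` (a binary sensitive attribute), `stratum` (a binned confounder such as an occlusion level), and `score` (a per-instance performance measure). It measures the correlation between confounder and attribute, then contrasts the naive group gap with the gap measured within confounder strata. It is a simplified stand-in, not the actual ICON$^2$ procedure.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation, e.g. between a confounder and a sensitive attribute."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def _mean(values: List[float]) -> float:
    return sum(values) / len(values)


def naive_gap(records: Sequence[Dict]) -> float:
    """Mean score of group 1 minus group 0, ignoring any confounder."""
    g1 = [r["score"] for r in records if r["group"] == 1]
    g0 = [r["score"] for r in records if r["group"] == 0]
    return _mean(g1) - _mean(g0)


def stratified_gap(records: Sequence[Dict]) -> float:
    """Group gap computed within each confounder stratum, then averaged over strata."""
    strata = defaultdict(lambda: {0: [], 1: []})
    for r in records:
        strata[r["stratum"]][r["group"]].append(r["score"])
    # Only strata containing both groups contribute a within-stratum gap.
    gaps = [_mean(s[1]) - _mean(s[0]) for s in strata.values() if s[0] and s[1]]
    return _mean(gaps) if gaps else float("nan")
```

When group 1 is over-represented in the harder stratum, the naive gap can suggest bias that disappears once the comparison is made within strata.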

Humans, AI, and Context: Understanding End-Users' Trust in a Real-World Computer Vision Application

May 15, 2023

Sunnie S. Y. Kim, Elizabeth Anne Watkins, Olga Russakovsky, Ruth Fong, Andrés Monroy-Hernández

Abstract: Trust is an important factor in people's interactions with AI systems. However, there is a lack of empirical studies examining how real end-users trust or distrust the AI systems they interact with. Most research investigates one aspect of trust in lab settings with hypothetical end-users. In this paper, we provide a holistic and nuanced understanding of trust in AI through a qualitative case study of a real-world computer vision application. We report findings from interviews with 20 end-users of a popular, AI-based bird identification app, in which we asked about their trust in the app from many angles. We find that participants perceived the app as trustworthy and trusted it, but selectively accepted app outputs after engaging in verification behaviors, and decided against app adoption in certain high-stakes scenarios. We also find that domain knowledge and context are important factors for trust-related assessment and decision-making. We discuss the implications of our findings and provide recommendations for future research on trust in AI.

* FAccT 2023

UFO: A unified method for controlling Understandability and Faithfulness Objectives in concept-based explanations for CNNs

Mar 27, 2023

Vikram V. Ramaswamy, Sunnie S. Y. Kim, Ruth Fong, Olga Russakovsky

Abstract: Concept-based explanations for convolutional neural networks (CNNs) aim to explain model behavior and outputs using a pre-defined set of semantic concepts (e.g., the model recognizes the scene class "bedroom" based on the presence of the concepts "bed" and "pillow"). However, they often do not faithfully (i.e., accurately) characterize the model's behavior and can be too complex for people to understand. Further, little is known about how faithful and understandable different explanation methods are, and how to control these two properties. In this work, we propose UFO, a unified method for controlling Understandability and Faithfulness Objectives in concept-based explanations. UFO formalizes understandability and faithfulness as mathematical objectives and unifies most existing concept-based explanation methods for CNNs. Using UFO, we systematically investigate how explanations change as we turn the knobs of faithfulness and understandability. Our experiments demonstrate a faithfulness-vs-understandability tradeoff: increasing understandability reduces faithfulness. We also provide insights into the "disagreement problem" in explainable machine learning by analyzing when and how concept-based explanations disagree with each other.
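
A toy illustration of the faithfulness-vs-understandability tradeoff, under assumptions that are not UFO's exact formulation: the explanation is a linear combination of concept activations approximating a model output, faithfulness loss is the mean squared error against the model's outputs, and understandability cost is the number of concepts with non-zero weight. The trade-off weight `lam` and all function names are hypothetical.

```python
from typing import List, Sequence


def explain(weights: Sequence[float], concept_acts: Sequence[Sequence[float]]) -> List[float]:
    """Linear concept-based explanation: weighted sum of concept activations per input."""
    return [sum(w * a for w, a in zip(weights, acts)) for acts in concept_acts]


def faithfulness_loss(weights, concept_acts, model_outputs) -> float:
    """Mean squared error between the explanation and the model's actual outputs."""
    preds = explain(weights, concept_acts)
    return sum((p - y) ** 2 for p, y in zip(preds, model_outputs)) / len(model_outputs)


def understandability_cost(weights: Sequence[float], tol: float = 1e-8) -> int:
    """Number of concepts the explanation uses; fewer is easier to understand."""
    return sum(1 for w in weights if abs(w) > tol)


def objective(weights, concept_acts, model_outputs, lam: float) -> float:
    """Lower is better; larger lam favors sparser (more understandable) explanations."""
    return faithfulness_loss(weights, concept_acts, model_outputs) + lam * understandability_cost(weights)
```

With a large `lam`, a sparse but less faithful explanation scores better; with a small `lam`, the fully faithful one wins, mirroring the tradeoff described in the abstract.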

Overcoming Bias in Pretrained Models by Manipulating the Finetuning Dataset

Mar 10, 2023

Angelina Wang, Olga Russakovsky

Abstract: Transfer learning is beneficial because it allows the expressive features of models pretrained on large-scale datasets to be finetuned for a target task on smaller, more domain-specific datasets. However, there is a concern that these pretrained models may come with their own biases, which would propagate into the finetuned model. In this work, we investigate bias conceptualized both as spurious correlations between the target task and a sensitive attribute and as underrepresentation of a particular group in the dataset. Under both notions of bias, we find that (1) models finetuned on top of pretrained models can indeed inherit their biases, but (2) this bias can be corrected through relatively minor interventions to the finetuning dataset, often with a negligible impact on performance. Our findings imply that careful curation of the finetuning dataset is important for reducing biases on a downstream task, and doing so can even compensate for bias in the pretrained model.
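
One kind of minor intervention to a finetuning dataset is rebalancing it so that the target label and the sensitive attribute are no longer correlated. The sketch below downsamples every (label, attribute) cell to the size of the smallest cell; the field names `label` and `attr` are hypothetical, and this illustrates the general idea rather than necessarily the intervention used in the paper.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def balance_by_attribute(examples: Sequence[Dict], label_key: str = "label",
                         attr_key: str = "attr", seed: int = 0) -> List[Dict]:
    """Downsample each (label, attribute) cell to the smallest cell's size.

    Cells that never occur in the data are not created, so this assumes every
    combination of label and attribute is present at least once.
    """
    cells = defaultdict(list)
    for ex in examples:
        cells[(ex[label_key], ex[attr_key])].append(ex)
    smallest = min(len(v) for v in cells.values())
    rng = random.Random(seed)
    balanced: List[Dict] = []
    for key in sorted(cells):
        balanced.extend(rng.sample(cells[key], smallest))
    return balanced


def attribute_rate(examples: Sequence[Dict], label, attr_value,
                   label_key: str = "label", attr_key: str = "attr") -> float:
    """P(attribute == attr_value | label): a simple gauge of the spurious correlation."""
    with_label = [ex for ex in examples if ex[label_key] == label]
    return sum(1 for ex in with_label if ex[attr_key] == attr_value) / len(with_label)
```

Downsampling discards data; reweighting or targeted upsampling of the under-represented cells are alternatives with the same goal.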

Boundary Guided Mixing Trajectory for Semantic Control with Diffusion Models

Feb 16, 2023

Ye Zhu, Yu Wu, Zhiwei Deng, Olga Russakovsky, Yan Yan

Abstract: Applying powerful generative denoising diffusion models (DDMs) to downstream tasks such as image semantic editing usually requires either fine-tuning pre-trained DDMs or learning auxiliary editing networks. In this work, we achieve SOTA semantic control performance in various application settings by optimizing the denoising trajectory solely via frozen DDMs. As one of the first optimization-based diffusion editing works, we start by seeking a more comprehensive understanding of the intermediate high-dimensional latent spaces, theoretically and empirically analyzing their probabilistic and geometric behaviors in the Markov chain. We then propose to further explore the critical step in the denoising trajectory that characterizes the convergence of a pre-trained DDM. Last but not least, we present our method to search for semantic subspace boundaries for controllable manipulation, guiding the denoising trajectory towards the targeted boundary at the critical convergent step. We conduct extensive experiments on various DPM architectures (DDPM, iDDPM) and datasets (CelebA, CelebA-HQ, LSUN-church, LSUN-bedroom, AFHQ-dog) with different resolutions (64, 256) as empirical demonstrations.

* 24 pages including appendices, code will be available at https://github.com/L-YeZhu/BoundaryDiffusion
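
The geometric step of guiding a latent toward a semantic boundary can be sketched under a simplifying assumption: the boundary is a hyperplane w·x + b = 0 (for instance, from a linear classifier fitted to latents), and a latent is moved along the unit normal until its signed distance reaches a chosen target. This does not reproduce the paper's boundary search or its critical-step selection, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def signed_distance(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Signed distance from x to the hyperplane w.x + b = 0 (positive on the w side)."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def move_to_distance(x: Sequence[float], w: Sequence[float], b: float,
                     target: float) -> List[float]:
    """Shift x along the unit normal so its signed distance equals `target`.

    A negative target places the latent on the opposite side of the boundary,
    which under this linear assumption flips the associated semantic attribute.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    step = target - signed_distance(x, w, b)
    return [xi + step * wi / norm for xi, wi in zip(x, w)]
```

Moving only along the normal changes the latent's position relative to the boundary while leaving its components parallel to the hyperplane untouched.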

Beyond web-scraping: Crowd-sourcing a geographically diverse image dataset

Jan 05, 2023

Vikram V. Ramaswamy, Sing Yu Lin, Dora Zhao, Aaron B. Adcock, Laurens van der Maaten, Deepti Ghadiyaram, Olga Russakovsky

Abstract: Current dataset collection methods typically scrape large amounts of data from the web. While this technique is extremely scalable, data collected in this way tends to reinforce stereotypical biases, can contain personally identifiable information, and typically originates from Europe and North America. In this work, we rethink the dataset collection paradigm and introduce GeoDE, a geographically diverse dataset of 61,940 images from 40 classes and 6 world regions, with no personally identifiable information, collected through crowd-sourcing. We analyse GeoDE to understand how images collected in this manner differ from web-scraped images. Despite the smaller size of this dataset, we demonstrate its use as both an evaluation and a training dataset, highlight shortcomings in current models, and show improved performance when even small amounts of GeoDE (1000-2000 images per region) are added to a training dataset. We release the full dataset and code at https://geodiverse-data-collection.cs.princeton.edu/
