John P. Dickerson

RecRec: Algorithmic Recourse for Recommender Systems

Aug 28, 2023
Sahil Verma, Ashudeep Singh, Varich Boonsanong, John P. Dickerson, Chirag Shah

Recommender systems play an essential role in the choices people make in domains such as entertainment, shopping, food, news, employment, and education. The machine learning models underlying these recommender systems are often enormously large and effectively black boxes to users, content providers, and system developers alike. It is often crucial for all stakeholders to understand the model's rationale behind certain predictions and recommendations. This is especially true for content providers whose livelihoods depend on the recommender system. Motivated by this practitioner need, we propose RecRec, a recourse framework for recommender systems targeted at content providers. Algorithmic recourse in the recommendation setting is a set of actions that, if executed, would modify the recommendations (or ranking) of an item in the desired manner. A recourse suggests actions of the form: "if a feature changes from X to Y, then the ranking of that item for a set of users will change to Z." Through an empirical evaluation of recommender systems trained on three real-world datasets, we demonstrate that RecRec is highly effective in generating valid, sparse, and actionable recourses. To the best of our knowledge, this work is the first to conceptualize and empirically test a generalized framework for generating recourses for recommender systems.
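
To make the notion of a recourse concrete, here is a minimal toy sketch (not RecRec itself): the linear content-based scorer and the candidate edit values are assumptions chosen only to illustrate what a valid, sparse recourse of the form "set feature f to value v" looks like.

```python
# Toy sketch (not RecRec itself): search for a sparse, actionable edit to one
# item's feature vector that improves its ranking for a set of users. The
# linear content-based scorer and candidate edit values are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, n_feats = 50, 200, 8
user_w = rng.normal(size=(n_users, n_feats))   # per-user preference weights
item_x = rng.random(size=(n_items, n_feats))   # item features (e.g. price, tags)

def avg_rank(item, users):
    """Average rank of `item` over the given users (1 = recommended first)."""
    scores = user_w[users] @ item_x.T                      # (|users|, n_items)
    return ((scores > scores[:, [item]]).sum(axis=1) + 1).mean()

target_item, target_users = 3, np.arange(10)
baseline = avg_rank(target_item, target_users)

# One-feature-at-a-time search keeps the recourse sparse and easy to act on.
best = (None, None, baseline)
for f in range(n_feats):
    for new_val in (0.0, 0.5, 1.0):
        old_val = item_x[target_item, f]
        item_x[target_item, f] = new_val
        r = avg_rank(target_item, target_users)
        item_x[target_item, f] = old_val
        if r < best[2]:
            best = (f, new_val, r)

print(f'Recourse: "set feature {best[0]} to {best[1]}" '
      f"-> average rank improves from {baseline:.1f} to {best[2]:.1f}")
```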

* Accepted as a short paper at CIKM 2023 

Diffused Redundancy in Pre-trained Representations

May 31, 2023
Vedant Nanda, Till Speicher, John P. Dickerson, Soheil Feizi, Krishna P. Gummadi, Adrian Weller

Representations learned by pre-training a neural network on a large dataset are increasingly used successfully to perform a variety of downstream tasks. In this work, we take a closer look at how features are encoded in such pre-trained representations. We find that learned representations in a given layer exhibit a degree of diffuse redundancy, i.e., any randomly chosen subset of neurons in the layer that is larger than a threshold size shares a large degree of similarity with the full layer and is able to perform similarly to the whole layer on a variety of downstream tasks. For example, a linear probe trained on $20\%$ of randomly picked neurons from a ResNet50 pre-trained on ImageNet1k achieves an accuracy within $5\%$ of a linear probe trained on the full layer of neurons for downstream CIFAR10 classification. We conduct experiments on different neural architectures (including CNNs and Transformers) pre-trained on both ImageNet1k and ImageNet21k and evaluate a variety of downstream tasks taken from the VTAB benchmark. We find that the loss and dataset used during pre-training largely govern the degree of diffuse redundancy, and that the "critical mass" of neurons needed often depends on the downstream task, suggesting that there is a task-inherent redundancy-performance Pareto frontier. Our findings shed light on the nature of representations learned by pre-trained deep neural networks and suggest that entire layers might not be necessary to perform many downstream tasks. We investigate the potential for exploiting this redundancy to achieve efficient generalization for downstream tasks, and we also draw caution to certain possible unintended consequences.
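
As a rough illustration of the probe comparison described above, the sketch below trains linear probes on a random 20% subset of features versus the full feature set; the synthetic redundant features stand in for frozen pre-trained representations, so the numbers are not the paper's.

```python
# Minimal sketch of the diffused-redundancy probe comparison. Real experiments
# use frozen ResNet50/ViT features on VTAB tasks; synthetic features stand in
# here so the snippet is self-contained.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, d, n_classes = 2000, 512, 10

# Redundant features: a low-dimensional signal spread diffusely across neurons.
latent = rng.normal(size=(n, 16))
feats = latent @ rng.normal(size=(16, d)) + 0.1 * rng.normal(size=(n, d))
labels = (latent @ rng.normal(size=(16, n_classes))).argmax(axis=1)

Xtr, Xte, ytr, yte = train_test_split(feats, labels, random_state=0)

def probe_acc(idx):
    """Train a linear probe on the selected neurons and report test accuracy."""
    clf = LogisticRegression(max_iter=1000).fit(Xtr[:, idx], ytr)
    return clf.score(Xte[:, idx], yte)

full = probe_acc(np.arange(d))
subset = probe_acc(rng.choice(d, size=int(0.2 * d), replace=False))
print(f"full layer: {full:.3f}  |  random 20% of neurons: {subset:.3f}")
```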

* Under review 

Who's Thinking? A Push for Human-Centered Evaluation of LLMs using the XAI Playbook

Mar 10, 2023
Teresa Datta, John P. Dickerson

Deployed artificial intelligence (AI) often impacts humans, and there is no one-size-fits-all metric to evaluate these tools. Human-centered evaluation of AI-based systems combines quantitative and qualitative analysis with human input. It has been explored to some depth in the explainable AI (XAI) and human-computer interaction (HCI) communities. Gaps remain, but the basic understanding that humans interact with AI and its accompanying explanations, and that humans' needs -- complete with their cognitive biases and quirks -- should be held front and center, is accepted by the community. In this paper, we draw parallels between the relatively mature field of XAI and the rapidly evolving research boom around large language models (LLMs). Accepted evaluative metrics for LLMs are not human-centered. We argue that many of the same paths the XAI community has tread over the past decade will be retread when discussing LLMs. Specifically, we argue that humans' tendencies -- again, complete with their cognitive biases and quirks -- should rest front and center when evaluating deployed LLMs. We outline three developed focus areas of human-centered evaluation of XAI -- mental models, use case utility, and cognitive engagement -- and we highlight the importance of exploring each of these concepts for LLMs. Our goal is to jumpstart human-centered LLM evaluation.

* Accepted to CHI 2023 workshop on Generative AI and HCI 

Tensions Between the Proxies of Human Values in AI

Dec 14, 2022
Teresa Datta, Daniel Nissani, Max Cembalest, Akash Khanna, Haley Massa, John P. Dickerson

Motivated by the goal of mitigating potentially harmful impacts of technologies, the AI community has formulated and accepted mathematical definitions for certain pillars of accountability: e.g., privacy, fairness, and model transparency. Yet we argue this is fundamentally misguided, because these definitions are imperfect, siloed constructions of the human values they hope to proxy, while creating the impression that those values are sufficiently embedded in our technologies. Under popularized methods, tensions arise when practitioners attempt to achieve each pillar of fairness, privacy, and transparency in isolation or simultaneously. In this position paper, we push for redirection. We argue that the AI community needs to consider all the consequences of choosing certain formulations of these pillars -- not just the technical incompatibilities, but also the effects within the context of deployment. We point toward sociotechnical research for frameworks addressing the latter, but push for broader efforts to implement these in practice.

* Contributed Talk, NeurIPS 2022 Workshop on Algorithmic Fairness through the Lens of Causality and Privacy; To be published in 2023 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML) 

Networked Restless Bandits with Positive Externalities

Dec 09, 2022
Christine Herlihy, John P. Dickerson

Restless multi-armed bandits are often used to model budget-constrained resource allocation tasks where receipt of the resource is associated with an increased probability of a favorable state transition. Prior work assumes that individual arms only benefit if they receive the resource directly. However, many allocation tasks occur within communities and can be characterized by positive externalities that allow arms to derive partial benefit when their neighbor(s) receive the resource. We thus introduce networked restless bandits, a novel multi-armed bandit setting in which arms are both restless and embedded within a directed graph. We then present Greta, a graph-aware, Whittle index-based heuristic algorithm that can be used to efficiently construct a constrained reward-maximizing action vector at each timestep. Our empirical results demonstrate that Greta outperforms comparison policies across a range of hyperparameter values and graph topologies.
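
Below is a hedged sketch of the kind of graph-aware, Whittle-index-based selection the paper describes; this is not the Greta algorithm itself, and the index values, adjacency matrix, and externality discount `alpha` are illustrative placeholders.

```python
# Hedged sketch (not the paper's Greta algorithm): a graph-aware, Whittle-style
# selection rule that credits arms for the partial benefit their out-neighbors
# derive. The indices, graph, and discount `alpha` are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
n_arms, budget, alpha = 8, 3, 0.3
whittle = rng.random(n_arms)                  # stand-in per-arm Whittle indices
adj = (rng.random((n_arms, n_arms)) < 0.2).astype(float)  # directed edges i -> j
np.fill_diagonal(adj, 0.0)

# Augment each arm's index with a discounted share of its neighbors' indices.
augmented = whittle + alpha * adj @ whittle

# Act on the top-`budget` arms to satisfy the per-timestep budget constraint.
action = np.zeros(n_arms, dtype=int)
action[np.argsort(-augmented)[:budget]] = 1
print("pull arms:", np.flatnonzero(action))
```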

* Accepted to AAAI 2023 

Robustness Disparities in Face Detection

Nov 29, 2022
Samuel Dooley, George Z. Wei, Tom Goldstein, John P. Dickerson

Facial analysis systems have been deployed by large companies and critiqued by scholars and activists for the past decade. Many existing algorithmic audits examine the performance of these systems on later-stage elements of facial analysis pipelines, such as facial recognition and age, emotion, or perceived gender prediction; however, a core component of these systems has been vastly understudied from a fairness perspective: face detection, sometimes called face localization. Since face detection is a prerequisite step in facial analysis systems, the bias we observe in face detection will flow downstream to the other components, such as facial recognition and emotion prediction. Additionally, no prior work has focused on the robustness of these systems under various perturbations and corruptions, which leaves open the question of how different people are impacted by these phenomena. We present a first-of-its-kind detailed benchmark of face detection systems, specifically examining the robustness to noise of commercial and academic models. We use both standard and recently released academic facial datasets to quantitatively analyze trends in face detection robustness. Across all the datasets and systems, we generally find that photos of individuals who are masculine presenting, older, of darker skin type, or photographed in dim lighting are more susceptible to errors than their counterparts in other identities.
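
For readers who want to see the shape of such an audit, here is an illustrative harness (not the paper's benchmark code): it applies synthetic noise at increasing severities and compares per-group detection error; `detect_faces`, the toy dataset, and the group labels are hypothetical stand-ins.

```python
# Illustrative audit harness (not the paper's benchmark code): apply synthetic
# noise at increasing severity and compare per-group detection error rates.
# `detect_faces`, the dataset, and the group labels are hypothetical stand-ins.
import numpy as np

rng = np.random.default_rng(0)

def detect_faces(image):
    # Hypothetical detector: "succeeds" while the image is not too noisy.
    return image.std() < 0.25

def add_gaussian_noise(image, severity):
    noisy = image + rng.normal(scale=0.1 * severity, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)

# Toy dataset of (image, group) pairs; a real audit uses labeled face datasets.
dataset = [(rng.random((32, 32)) * 0.5 + 0.3, group) for group in ["A", "B"] * 100]

for severity in range(4):
    errors = {"A": [], "B": []}
    for image, group in dataset:
        ok = detect_faces(add_gaussian_noise(image, severity))
        errors[group].append(0.0 if ok else 1.0)
    err_a, err_b = np.mean(errors["A"]), np.mean(errors["B"])
    print(f"severity {severity}: error A={err_a:.2f}  B={err_b:.2f}  "
          f"gap={abs(err_a - err_b):.2f}")
```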

* NeurIPS Datasets & Benchmarks Track 2022 

Interpretable Deep Reinforcement Learning for Green Security Games with Real-Time Information

Nov 09, 2022
Vishnu Dutt Sharma, John P. Dickerson, Pratap Tokekar

Green Security Games with real-time information (GSG-I) add real-time information about the agents' movement to the typical GSG formulation. Prior works on GSG-I have used deep reinforcement learning (DRL) to learn the best policy for the agent in such an environment without any need to store the huge number of state representations for GSG-I. However, the decision-making process of DRL methods is largely opaque, resulting in a lack of trust in their predictions. To tackle this issue, we present an interpretable DRL method for GSG-I that generates visualizations to explain the decisions taken by the DRL algorithm. We also show that this approach performs better, and works well with a simpler training regimen, compared to the existing method.
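
As a hedged illustration of explanation-by-visualization for a DRL policy (not necessarily the paper's exact method), the sketch below computes a gradient-saliency map over a toy grid observation, showing which cells most influence the chosen action's score; the tiny policy network and the 5x5 observation are assumptions.

```python
# Hedged sketch of explanation-by-visualization for a DRL policy (not the
# paper's exact method): a gradient-saliency map over a toy grid observation.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Flatten(), nn.Linear(5 * 5, 32), nn.ReLU(), nn.Linear(32, 4))

obs = torch.rand(1, 5, 5, requires_grad=True)   # toy GSG-I-style grid observation
logits = policy(obs)                            # one score per action
action = int(logits.argmax(dim=1))
logits[0, action].backward()                    # d(chosen action's score) / d(obs)

saliency = obs.grad.abs().squeeze(0)            # which cells drove the decision
print(f"chosen action: {action}")
print(saliency)
```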

On the Importance of Architectures and Hyperparameters for Fairness in Face Recognition

Oct 18, 2022
Rhea Sukthanker, Samuel Dooley, John P. Dickerson, Colin White, Frank Hutter, Micah Goldblum

Face recognition systems are deployed across the world by government agencies and contractors for sensitive and impactful tasks, such as surveillance and database matching. Despite their widespread use, these systems are known to exhibit bias across a range of sociodemographic dimensions, such as gender and race. Nonetheless, an array of works proposing pre-processing, training, and post-processing methods have failed to close these gaps. Here, we take a very different approach to this problem, identifying that both architectures and hyperparameters of neural networks are instrumental in reducing bias. We first run a large-scale analysis of the impact of architectures and training hyperparameters on several common fairness metrics and show that the implicit convention of choosing high-accuracy architectures may be suboptimal for fairness. Motivated by our findings, we run the first neural architecture search for fairness, jointly with a search for hyperparameters. We output a suite of models which Pareto-dominate all other competitive architectures in terms of accuracy and fairness. Furthermore, we show that these models transfer well to other face recognition datasets with similar and distinct protected attributes. We release our code and raw result files so that researchers and practitioners can replace our fairness metrics with a bias measure of their choice.
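
The Pareto-dominance criterion used to compare models on accuracy and fairness can be stated in a few lines; the architecture names and scores below are illustrative placeholders, not the paper's results.

```python
# Minimal sketch of the Pareto-dominance criterion on (accuracy, fairness).
# The candidate names and scores are illustrative placeholders.
def dominates(a, b):
    """a = (accuracy, fairness_gap); higher accuracy and lower gap are better."""
    return (a[0] >= b[0] and a[1] <= b[1]) and (a[0] > b[0] or a[1] < b[1])

candidates = {
    "arch_A": (0.91, 0.12),
    "arch_B": (0.93, 0.08),   # dominates arch_A outright
    "arch_C": (0.89, 0.05),   # on the frontier: lower accuracy, better fairness
}

frontier = [name for name, score in candidates.items()
            if not any(dominates(other, score)
                       for other_name, other in candidates.items()
                       if other_name != name)]
print("Pareto frontier:", frontier)
```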

Equalizing Credit Opportunity in Algorithms: Aligning Algorithmic Fairness Research with U.S. Fair Lending Regulation

Oct 05, 2022
I. Elizabeth Kumar, Keegan E. Hines, John P. Dickerson

Credit is an essential component of financial wellbeing in America, and unequal access to it is a large factor in the economic disparities between demographic groups that exist today. Machine learning algorithms, sometimes trained on alternative data, are increasingly being used to determine access to credit, yet research has shown that machine learning can encode many different versions of "unfairness," raising the concern that banks and other financial institutions could -- potentially unwittingly -- engage in illegal discrimination through the use of this technology. In the US, there are laws in place to prevent discrimination in lending, as well as agencies charged with enforcing them. However, conversations around fair credit models in computer science and in policy are often misaligned: fair machine learning research often lacks legal and practical considerations specific to existing fair lending policy, and regulators have yet to issue new guidance on how, if at all, credit risk models should utilize practices and techniques from the research community. This paper aims to better align these sides of the conversation. We describe the current state of credit discrimination regulation in the United States, contextualize results from fair ML research to identify the specific fairness concerns raised by the use of machine learning in lending, and discuss regulatory opportunities to address these concerns.

* AIES '22: Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society  

Measuring Representational Robustness of Neural Networks Through Shared Invariances

Jun 23, 2022
Vedant Nanda, Till Speicher, Camila Kolling, John P. Dickerson, Krishna P. Gummadi, Adrian Weller

A major challenge in studying robustness in deep learning is defining the set of "meaningless" perturbations to which a given Neural Network (NN) should be invariant. Most work on robustness implicitly uses a human as the reference model to define such perturbations. Our work offers a new view on robustness by using another reference NN to define the set of perturbations a given NN should be invariant to, thus generalizing the reliance on a reference "human NN" to any NN. This makes measuring robustness equivalent to measuring the extent to which two NNs share invariances, for which we propose a measure called STIR. STIR re-purposes existing representation similarity measures to make them suitable for measuring shared invariances. Using our measure, we are able to gain insights into how shared invariances vary with changes in weight initialization, architecture, loss functions, and training dataset. Our implementation is available at https://github.com/nvedant07/STIR
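
A rough sketch of the underlying idea follows (see the linked repository for the actual STIR implementation): take input pairs that leave model 1's representation essentially unchanged, then score how similar model 2's representations are on those same pairs, here using linear CKA as a stand-in similarity measure; the toy feature extractors and the perturbation rule are assumptions.

```python
# Rough sketch of the shared-invariance idea (not the actual STIR code): if
# model 1 treats x and x_pert as (nearly) the same, does model 2 as well?
# Linear CKA stands in for the representation similarity measure.
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two (n_samples, n_features) representation matrices."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    return (np.linalg.norm(Y.T @ X, "fro") ** 2
            / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")))

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(20, 10)), rng.normal(size=(20, 10))

def model1(x): return np.maximum(x @ W1, 0.0)   # stand-in feature extractor 1
def model2(x): return np.maximum(x @ W2, 0.0)   # stand-in feature extractor 2

x = rng.normal(size=(100, 20))
# Perturbations model 1 barely "sees": small noise that leaves its reps close.
x_pert = x + 0.01 * rng.normal(size=x.shape)
assert np.allclose(model1(x), model1(x_pert), atol=0.5)

# High similarity on (x, x_pert) for model 2 suggests the invariance is shared.
shared = linear_cka(model2(x), model2(x_pert))
print(f"shared-invariance score (CKA of model 2 on invariant pairs): {shared:.3f}")
```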

* Accepted for oral presentation at ICML 2022 