
Shengjia Zhao

Online Distribution Shift Detection via Recency Prediction

Nov 17, 2022
Rachel Luo, Rohan Sinha, Ali Hindy, Shengjia Zhao, Silvio Savarese, Edward Schmerling, Marco Pavone

When deploying modern machine-learning-enabled robotic systems in high-stakes applications, detecting distribution shift is critical. However, most existing methods for detecting distribution shift are not well-suited to robotics settings, where data often arrives in a streaming fashion and may be very high-dimensional. In this work, we present an online method for detecting distribution shift with guarantees on the false positive rate - i.e., when there is no distribution shift, our system is very unlikely (with probability $< \epsilon$) to falsely issue an alert; any alerts that are issued should therefore be heeded. Our method is specifically designed for efficient detection even with high-dimensional data, and it empirically achieves up to 11x faster detection in realistic robotics settings than prior work while maintaining a low false negative rate in practice (whenever there is a distribution shift in our experiments, our method indeed emits an alert).
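
The abstract leaves the detector unspecified, so the snippet below is only a minimal sketch of the general recency-prediction idea, assuming a classifier two-sample test with a Hoeffding threshold: train a classifier to tell a recent window of observations apart from reference data, and alert only when its held-out accuracy beats chance by a margin calibrated so that, absent shift, false alarms occur with probability below $\epsilon$. The function names and the specific bound are illustrative assumptions, not the paper's procedure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def recency_shift_alert(reference, recent, eps=0.01, seed=0):
    """Hedged sketch of a classifier two-sample test in the spirit of
    recency prediction: label reference data 0 and recent data 1, fit a
    classifier on one split, and alert only if its held-out accuracy beats
    chance by a Hoeffding margin calibrated to a false-alarm rate < eps.
    (Illustrative only -- not the procedure from the paper.)"""
    rng = np.random.default_rng(seed)
    X = np.vstack([reference, recent])
    y = np.r_[np.zeros(len(reference)), np.ones(len(recent))]
    idx = rng.permutation(len(X))
    half = len(X) // 2
    tr, te = idx[:half], idx[half:]

    clf = LogisticRegression(max_iter=1000).fit(X[tr], y[tr])
    acc = clf.score(X[te], y[te])

    # Under "no shift", held-out accuracy concentrates around 1/2, so
    # exceeding 1/2 + sqrt(log(1/eps) / (2 n)) happens w.p. < eps (Hoeffding).
    n = len(te)
    threshold = 0.5 + np.sqrt(np.log(1.0 / eps) / (2.0 * n))
    return acc > threshold, acc, threshold

# toy usage: no shift vs. a mean shift
ref = np.random.randn(500, 16)
same = np.random.randn(500, 16)
shifted = np.random.randn(500, 16) + 0.8
print(recency_shift_alert(ref, same))     # usually no alert
print(recency_shift_alert(ref, shifted))  # usually alerts
```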

Generalizing Bayesian Optimization with Decision-theoretic Entropies

Oct 04, 2022
Willie Neiswanger, Lantao Yu, Shengjia Zhao, Chenlin Meng, Stefano Ermon

Bayesian optimization (BO) is a popular method for efficiently inferring optima of an expensive black-box function via a sequence of queries. Existing information-theoretic BO procedures aim to make queries that most reduce the uncertainty about optima, where the uncertainty is captured by Shannon entropy. However, an optimal measure of uncertainty would, ideally, factor in how we intend to use the inferred quantity in some downstream procedure. In this paper, we instead consider a generalization of Shannon entropy from work in statistical decision theory (DeGroot 1962, Rao 1984), which contains a broad class of uncertainty measures parameterized by a problem-specific loss function corresponding to a downstream task. We first show that special cases of this entropy lead to popular acquisition functions used in BO procedures such as knowledge gradient, expected improvement, and entropy search. We then show how alternative choices for the loss yield a flexible family of acquisition functions that can be customized for use in novel optimization settings. Additionally, we develop gradient-based methods to efficiently optimize our proposed family of acquisition functions, and demonstrate strong empirical performance on a diverse set of sequential decision making tasks, including variants of top-$k$ optimization, multi-level set estimation, and sequence search.
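
The DeGroot-style generalized entropy underlying this framework, $H_\ell(p) = \inf_a \mathbb{E}_{x\sim p}[\ell(x,a)]$, is easy to compute by Monte Carlo. The toy sketch below only illustrates that quantity (with squared loss it recovers the posterior variance, the uncertainty behind knowledge-gradient-style acquisitions); it is not the paper's acquisition function, and the names are mine.

```python
import numpy as np

def generalized_entropy(samples, loss, actions):
    """DeGroot-style generalized entropy of a belief represented by samples:
    H_l(p) = min over actions a of E_{x ~ p}[ loss(x, a) ].
    `loss(samples, a)` must broadcast over the sample array."""
    return min(float(np.mean(loss(samples, a))) for a in actions)

# With the squared loss the minimizing action is the posterior mean, so the
# generalized entropy reduces to the posterior variance.
rng = np.random.default_rng(0)
post_samples = rng.normal(loc=2.0, scale=0.5, size=10_000)
actions = np.linspace(0.0, 4.0, 401)
sq_loss = lambda x, a: (x - a) ** 2

print(generalized_entropy(post_samples, sq_loss, actions))  # ~0.25
print(post_samples.var())                                   # ~0.25
```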

* Appears in Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS 2022) 

Modular Conformal Calibration

Jul 05, 2022
Charles Marx, Shengjia Zhao, Willie Neiswanger, Stefano Ermon

Uncertainty estimates must be calibrated (i.e., accurate) and sharp (i.e., informative) in order to be useful. This has motivated a variety of methods for recalibration, which use held-out data to turn an uncalibrated model into a calibrated model. However, the applicability of existing methods is limited due to their assumption that the original model is also a probabilistic model. We introduce a versatile class of algorithms for recalibration in regression that we call Modular Conformal Calibration (MCC). This framework allows one to transform any regression model into a calibrated probabilistic model. The modular design of MCC allows us to make simple adjustments to existing algorithms that enable well-behaved distribution predictions. We also provide finite-sample calibration guarantees for MCC algorithms. Our framework recovers isotonic recalibration, conformal calibration, and conformal interval prediction, implying that our theoretical results apply to those methods as well. Finally, we conduct an empirical study of MCC on 17 regression datasets. Our results show that new algorithms designed in our framework achieve near-perfect calibration and improve sharpness relative to existing methods.
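
Since conformal interval prediction is listed as one of the special cases recovered by MCC, here is a minimal split-conformal sketch of that special case; the general MCC recipe for producing full distribution predictions is more involved and is not reproduced here. The function and variable names are illustrative.

```python
import numpy as np

def split_conformal_interval(predict, X_cal, y_cal, X_test, alpha=0.1):
    """Split conformal prediction intervals around an arbitrary point
    predictor -- one of the special cases (conformal interval prediction)
    that the MCC framework recovers.  Not the general MCC algorithm."""
    scores = np.abs(y_cal - predict(X_cal))        # nonconformity scores
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))        # finite-sample quantile index
    q = np.sort(scores)[min(k, n) - 1]
    mu = predict(X_test)
    return mu - q, mu + q                          # ~(1 - alpha) coverage

# toy usage with a stand-in for any point-prediction model
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=2000)
y = np.sin(X) + 0.3 * rng.standard_normal(2000)
predict = np.sin
lo, hi = split_conformal_interval(predict, X[:1000], y[:1000], X[1000:], alpha=0.1)
print(np.mean((y[1000:] >= lo) & (y[1000:] <= hi)))   # ~0.9
```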

Low-Degree Multicalibration

Mar 02, 2022
Parikshit Gopalan, Michael P. Kim, Mihir Singhal, Shengjia Zhao

Introduced as a notion of algorithmic fairness, multicalibration has proved to be a powerful and versatile concept with implications far beyond its original intent. This stringent notion -- that predictions be well-calibrated across a rich class of intersecting subpopulations -- provides its strong guarantees at a cost: the computational and sample complexity of learning multicalibrated predictors are high, and grow exponentially with the number of class labels. In contrast, the relaxed notion of multiaccuracy can be achieved more efficiently, yet many of the most desirable properties of multicalibration cannot be guaranteed assuming multiaccuracy alone. This tension raises a key question: Can we learn predictors with multicalibration-style guarantees at a cost commensurate with multiaccuracy? In this work, we define and initiate the study of Low-Degree Multicalibration. Low-Degree Multicalibration defines a hierarchy of increasingly powerful multi-group fairness notions that spans multiaccuracy and the original formulation of multicalibration at the extremes. Our main technical contribution demonstrates that key properties of multicalibration, related to fairness and accuracy, actually manifest as low-degree properties. Importantly, we show that low-degree multicalibration can be significantly more efficient than full multicalibration. In the multi-class setting, the sample complexity to achieve low-degree multicalibration improves exponentially (in the number of classes) over full multicalibration. Our work presents compelling evidence that low-degree multicalibration represents a sweet spot, pairing computational and sample efficiency with strong fairness and accuracy guarantees.
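
The hierarchy itself is defined in the paper; purely as an illustration of its flavor, the sketch below measures a single weighted-calibration constraint of the form $|\mathbb{E}[c(x)\, w(f(x))\, (y - f(x))]|$, where $c$ is a subgroup indicator and $w$ is a simple weight on the prediction. A constant weight gives a multiaccuracy-style check, while richer weights of the prediction move toward full multicalibration. The exact weight families and degrees used in the paper are not reproduced here.

```python
import numpy as np

def weighted_calibration_violation(f, y, group, weight):
    """Empirical violation of one weighted-calibration constraint:
    | E[ group(x) * weight(f(x)) * (y - f(x)) ] |.
    (Illustrative of the flavor only -- not the paper's exact hierarchy.)"""
    return abs(np.mean(group * weight(f) * (y - f)))

rng = np.random.default_rng(0)
group = rng.integers(0, 2, size=20_000)               # subgroup indicator c(x)
p_true = np.where(group == 1, 0.7, 0.3)               # true label probabilities
y = rng.binomial(1, p_true)
f = np.full_like(p_true, 0.5)                         # predictor ignores the subgroup

const_w = lambda v: np.ones_like(v)                   # multiaccuracy-style weight
linear_w = lambda v: v                                # weight depending on the prediction
print(weighted_calibration_violation(f, y, group, const_w))   # ~0.10
print(weighted_calibration_violation(f, y, group, linear_w))  # ~0.05
```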

Sample-Efficient Safety Assurances using Conformal Prediction

Sep 28, 2021
Rachel Luo, Shengjia Zhao, Jonathan Kuck, Boris Ivanovic, Silvio Savarese, Edward Schmerling, Marco Pavone

When deploying machine learning models in high-stakes robotics applications, the ability to detect unsafe situations is crucial. Early warning systems can provide alerts when an unsafe situation is imminent (in the absence of corrective action). To reliably improve safety, these warning systems should have a provable false negative rate; i.e., of the situations that are unsafe, fewer than an $\epsilon$ fraction will occur without an alert. In this work, we present a framework that combines a statistical inference technique known as conformal prediction with a simulator of robot/environment dynamics, in order to tune warning systems to provably achieve an $\epsilon$ false negative rate using as few as $1/\epsilon$ data points. We apply our framework to a driver warning system and a robotic grasping application, and empirically demonstrate a guaranteed false negative rate and a low false detection (positive) rate using very little data.
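
As a concrete illustration of how few samples such a conformal guarantee needs, here is a hedged sketch of threshold tuning for a scalar warning score (higher meaning more alarming), assuming scores on unsafe episodes are exchangeable with future ones. It is the textbook order-statistic argument, not necessarily the exact procedure in the paper, and all names are illustrative.

```python
import numpy as np

def tune_alert_threshold(unsafe_scores, eps):
    """Pick an alert threshold from warning-signal scores recorded on episodes
    that turned out unsafe, so that a NEW unsafe episode fails to trigger an
    alert with probability at most eps.  By exchangeability,
    P(new score < k-th smallest of n) <= k / (n + 1), so we need
    k / (n + 1) <= eps; with n >= 1/eps - 1 even k = 1 (the minimum) works."""
    n = len(unsafe_scores)
    k = int(np.floor(eps * (n + 1)))
    if k < 1:
        raise ValueError(f"need at least {int(np.ceil(1 / eps)) - 1} unsafe episodes")
    return np.sort(unsafe_scores)[k - 1]        # alert whenever score >= threshold

# toy usage: eps = 0.05 needs only ~19 unsafe calibration episodes
rng = np.random.default_rng(0)
cal_scores = rng.normal(2.0, 1.0, size=19)
thr = tune_alert_threshold(cal_scores, eps=0.05)
new_unsafe = rng.normal(2.0, 1.0, size=100_000)
print(np.mean(new_unsafe < thr))                # ~eps on average over calibration draws
```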

Calibrating Predictions to Decisions: A Novel Approach to Multi-Class Calibration

Jul 12, 2021
Shengjia Zhao, Michael P. Kim, Roshni Sahoo, Tengyu Ma, Stefano Ermon

When facing uncertainty, decision-makers want predictions they can trust. A machine learning provider can convey confidence to decision-makers by guaranteeing their predictions are distribution calibrated -- amongst the inputs that receive a predicted class-probability vector $q$, the actual distribution over classes is $q$. For multi-class prediction problems, however, achieving distribution calibration tends to be infeasible, requiring sample complexity exponential in the number of classes $C$. In this work, we introduce a new notion -- decision calibration -- that requires the predicted distribution and true distribution to be "indistinguishable" to a set of downstream decision-makers. When all possible decision-makers are under consideration, decision calibration is the same as distribution calibration. However, when we only consider decision-makers choosing between a bounded number of actions (e.g. polynomial in $C$), our main result shows that decision calibration becomes feasible -- we design a recalibration algorithm that requires sample complexity polynomial in the number of actions and the number of classes. We validate our recalibration algorithm empirically: compared to existing methods, decision calibration improves decision-making on skin lesion and ImageNet classification with modern neural network predictors.
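
As a hedged illustration of what decision calibration asks for (not the paper's recalibration algorithm), the sketch below audits a single loss-minimizing decision-maker: among inputs where the Bayes action under the predicted probabilities is $a$, the loss the prediction claims to incur should match the loss actually incurred. The loss matrix and all names are made up for the example.

```python
import numpy as np

def decision_calibration_gap(probs, labels, loss_matrix):
    """Hedged diagnostic in the spirit of decision calibration for ONE
    loss-minimizing decision-maker: among inputs whose Bayes action under the
    predicted probabilities is `a`, compare the expected loss the prediction
    claims with the loss actually incurred.  Not the paper's algorithm."""
    # probs: (n, C) predicted class probabilities; labels: (n,) true classes
    # loss_matrix: (A, C) loss of action a when the true class is c
    pred_losses = probs @ loss_matrix.T            # (n, A) predicted expected loss
    actions = pred_losses.argmin(axis=1)           # Bayes action per input
    gaps = []
    for a in range(loss_matrix.shape[0]):
        mask = actions == a
        if mask.any():
            claimed = pred_losses[mask, a].mean()
            realized = loss_matrix[a, labels[mask]].mean()
            gaps.append(abs(claimed - realized))
    return max(gaps)

# toy usage: 3 classes, 2 actions, an over-confident predictor
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=5_000)
probs = 0.8 * np.eye(3)[labels] + 0.2 / 3          # sharper than reality warrants
probs = probs[rng.permutation(5_000)]              # break the label link -> miscalibrated
L = np.array([[0.0, 1.0, 1.0],                     # action 0: bet on class 0
              [0.5, 0.5, 0.5]])                    # action 1: abstain
print(decision_calibration_gap(probs, labels, L))  # large gap for the betting action
```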

Improved Autoregressive Modeling with Distribution Smoothing

Mar 28, 2021
Chenlin Meng, Jiaming Song, Yang Song, Shengjia Zhao, Stefano Ermon

While autoregressive models excel at image compression, their sample quality is often lacking. Generated images, although not realistic, often have high likelihood under the model, resembling the case of adversarial examples. Inspired by a successful adversarial defense method, we incorporate randomized smoothing into autoregressive generative modeling. We first model a smoothed version of the data distribution, and then reverse the smoothing process to recover the original data distribution. This procedure drastically improves the sample quality of existing autoregressive models on several synthetic and real-world image datasets while obtaining competitive likelihoods on synthetic datasets.
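
To make the two-stage recipe concrete, here is a 1-D toy sketch of "smooth, then reverse the smoothing", with simple stand-ins (a kernel density model and a nearest-neighbour denoiser) in place of the paper's autoregressive networks; the structure, not the particular models, is the point.

```python
import numpy as np
from sklearn.neighbors import KernelDensity, KNeighborsRegressor

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 0.2, 5_000), rng.normal(2, 0.2, 5_000)])
sigma = 0.5
x_tilde = x + sigma * rng.standard_normal(x.shape)        # randomized smoothing

# Stage 1: model the smoothed (easier) distribution.
smooth_model = KernelDensity(bandwidth=0.2).fit(x_tilde[:, None])
# Stage 2: learn to reverse the smoothing, i.e. predict x from x_tilde.
denoiser = KNeighborsRegressor(n_neighbors=50).fit(x_tilde[:, None], x)
resid_std = np.std(x - denoiser.predict(x_tilde[:, None]))

# Sampling: draw from the smoothed model, then undo the smoothing.
z = smooth_model.sample(10_000, random_state=0).ravel()
samples = denoiser.predict(z[:, None]) + resid_std * rng.standard_normal(10_000)

near_modes = lambda v: np.mean(np.abs(np.abs(v) - 2.0) < 0.5)
print(near_modes(z), near_modes(samples))   # reversing the smoothing sharpens the modes
```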

* ICLR 2021 (Oral) 

Localized Calibration: Metrics and Recalibration

Feb 22, 2021
Rachel Luo, Aadyot Bhatnagar, Huan Wang, Caiming Xiong, Silvio Savarese, Yu Bai, Shengjia Zhao, Stefano Ermon

Probabilistic classifiers output confidence scores along with their predictions, and these confidence scores must be well-calibrated (i.e., reflect the true probability of an event) to be meaningful and useful for downstream tasks. However, existing metrics for measuring calibration are insufficient. Commonly used metrics such as the expected calibration error (ECE) only measure global trends, making them ineffective for measuring the calibration of a particular sample or subgroup. At the other end of the spectrum, a fully individualized calibration error is in general intractable to estimate from finite samples. In this work, we propose the local calibration error (LCE), a fine-grained calibration metric that spans the gap between fully global and fully individualized calibration. The LCE leverages learned features to automatically capture rich subgroups, and it measures the calibration error around each individual example via a similarity function. We then introduce a localized recalibration method, LoRe, that improves the LCE more than existing recalibration methods do. Finally, we show that applying our recalibration method improves decision-making on downstream tasks.
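
A rough sketch of the idea behind a localized calibration metric is below: weight every held-out example by feature similarity to a query point and compute a weighted, binned calibration error around it. The RBF kernel, the binning scheme, and the names are assumptions for illustration, not the paper's exact estimator.

```python
import numpy as np

def local_calibration_error(query_feat, feats, confs, correct, bandwidth=1.0, n_bins=10):
    """Hedged sketch of a kernel-weighted, binned calibration error around one
    example: weight every labeled point by feature similarity to the query,
    then compute a weighted ECE.  Illustrative of the LCE idea only."""
    w = np.exp(-np.sum((feats - query_feat) ** 2, axis=1) / (2 * bandwidth ** 2))
    bins = np.clip((confs * n_bins).astype(int), 0, n_bins - 1)
    err = 0.0
    for b in range(n_bins):
        wb = w * (bins == b)
        if wb.sum() > 0:
            gap = abs(np.average(correct, weights=wb) - np.average(confs, weights=wb))
            err += wb.sum() / w.sum() * gap
    return err

# toy usage: one feature region is overconfident, the other is not
rng = np.random.default_rng(0)
feats = rng.standard_normal((5_000, 4))
confs = np.full(5_000, 0.8)
acc = np.where(feats[:, 0] > 0, 0.6, 0.8)
correct = rng.binomial(1, acc)
print(local_calibration_error(feats[feats[:, 0] > 2][0], feats, confs, correct))   # large
print(local_calibration_error(feats[feats[:, 0] < -2][0], feats, confs, correct))  # small
```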

Right Decisions from Wrong Predictions: A Mechanism Design Alternative to Individual Calibration

Nov 15, 2020
Shengjia Zhao, Stefano Ermon

Decision makers often need to rely on imperfect probabilistic forecasts. While average performance metrics are typically available, it is difficult to assess the quality of individual forecasts and the corresponding utilities. To convey confidence about individual predictions to decision-makers, we propose a compensation mechanism ensuring that the forecasted utility matches the actually accrued utility. While a naive scheme to compensate decision-makers for prediction errors can be exploited and might not be sustainable in the long run, we propose a mechanism based on fair bets and online learning that provably cannot be exploited. We demonstrate an application showing how passengers could confidently optimize individual travel plans based on flight delay probabilities estimated by an airline.

Privacy Preserving Recalibration under Domain Shift

Aug 21, 2020
Rachel Luo, Shengjia Zhao, Jiaming Song, Jonathan Kuck, Stefano Ermon, Silvio Savarese

Classifiers deployed in high-stakes real-world applications must output calibrated confidence scores, i.e. their predicted probabilities should reflect empirical frequencies. Recalibration algorithms can greatly improve a model's probability estimates; however, existing algorithms are not applicable in real-world situations where the test data follows a different distribution from the training data, and privacy preservation is paramount (e.g. protecting patient records). We introduce a framework that abstracts out the properties of recalibration problems under differential privacy constraints. This framework allows us to adapt existing recalibration algorithms to satisfy differential privacy while remaining effective for domain-shift situations. Guided by our framework, we also design a novel recalibration algorithm, accuracy temperature scaling, that outperforms prior work on private datasets. In an extensive empirical study, we find that our algorithm improves calibration on domain-shift benchmarks under the constraints of differential privacy. On the 15 highest severity perturbations of the ImageNet-C dataset, our method achieves a median ECE of 0.029, over 2x better than the next best recalibration method and almost 5x better than without recalibration.
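
Purely as a hedged reading of what an "accuracy temperature scaling" style recalibrator could look like (the paper defines the actual algorithm), the sketch below noises the correct-prediction count with the Laplace mechanism and then picks the temperature at which the mean top-class confidence matches that private accuracy estimate. Only the count is privatized here; a real deployment would need a complete privacy accounting, and all names are assumptions.

```python
import numpy as np

def softmax(z, T):
    z = z / T
    z = z - z.max(axis=1, keepdims=True)
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

def private_accuracy_temperature(logits, labels, dp_eps=1.0, seed=0):
    """Hedged sketch of an accuracy-matching temperature scaler: estimate
    accuracy with Laplace noise (the correct-count has sensitivity 1, so the
    count release is dp_eps-DP), then bisect for the temperature T at which
    the average top-class confidence equals that private accuracy estimate.
    NOTE: only the accuracy count is privatized here; a full DP treatment
    would have to account for every statistic computed from the private set."""
    rng = np.random.default_rng(seed)
    n = len(labels)
    correct = (logits.argmax(axis=1) == labels).sum()
    noisy_acc = (correct + rng.laplace(scale=1.0 / dp_eps)) / n
    noisy_acc = float(np.clip(noisy_acc, 1.0 / logits.shape[1], 1.0))

    lo, hi = 0.05, 50.0                  # mean confidence decreases as T grows
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        conf = softmax(logits, mid).max(axis=1).mean()
        lo, hi = (mid, hi) if conf > noisy_acc else (lo, mid)
    return 0.5 * (lo + hi)

# toy usage: a deliberately over-confident 5-class classifier
rng = np.random.default_rng(1)
labels = rng.integers(0, 5, size=2_000)
logits = rng.standard_normal((2_000, 5))
logits[np.arange(2_000), labels] += 4.0 * (rng.random(2_000) < 0.6)
T = private_accuracy_temperature(logits, labels)
print(T, softmax(logits, T).max(axis=1).mean())   # mean confidence tracks the private accuracy
```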
