Yuekai Sun

Conditional independence testing under model misspecification

Jul 05, 2023
Felipe Maia Polo, Yuekai Sun, Moulinath Banerjee

Conditional independence (CI) testing is fundamental and challenging in modern statistics and machine learning. Many modern methods for CI testing rely on powerful supervised learning methods to learn regression functions or Bayes predictors as an intermediate step. Although these methods are guaranteed to control Type-I error when the supervised learning methods accurately estimate the regression functions or Bayes predictors, their behavior is less understood when the estimates are inaccurate due to model misspecification. In a broader sense, model misspecification can arise even when universal approximators (e.g., deep neural nets) are employed. Motivated by this, we study the performance of regression-based CI tests under model misspecification. Specifically, we propose new approximations or upper bounds for the testing errors of three regression-based tests that depend on misspecification errors. Moreover, we introduce the Rao-Blackwellized Predictor Test (RBPT), a novel regression-based CI test that is robust against model misspecification. Finally, we conduct experiments with artificial and real data, showcasing the usefulness of our theory and methods.
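
As a hedged illustration of the regression-based paradigm discussed above (a generic comparison of predictors with and without X, not the RBPT itself), the sketch below fits one model on (X, Z) and one on Z alone and compares out-of-sample losses: under H0: Y ⊥ X | Z, the extra covariate should not improve the Bayes predictor. The model classes, loss, and paired test are illustrative assumptions.

```python
# Generic regression-based CI test sketch: does X improve prediction of Y beyond Z?
# Hedged illustration only -- model classes, loss, and the paired test are arbitrary choices.
import numpy as np
from scipy import stats
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
Z = rng.normal(size=(n, 1))
X = Z + 0.5 * rng.normal(size=(n, 1))            # X depends on Z
Y = (Z ** 2).ravel() + 0.3 * rng.normal(size=n)  # Y depends only on Z -> H0 is true

idx_tr, idx_te = train_test_split(np.arange(n), test_size=0.5, random_state=0)
XZ = np.hstack([X, Z])

full = GradientBoostingRegressor().fit(XZ[idx_tr], Y[idx_tr])
reduced = GradientBoostingRegressor().fit(Z[idx_tr], Y[idx_tr])

err_full = (Y[idx_te] - full.predict(XZ[idx_te])) ** 2
err_reduced = (Y[idx_te] - reduced.predict(Z[idx_te])) ** 2

# One-sided paired test: reject CI only if the full model is significantly better.
t_stat, p_two_sided = stats.ttest_rel(err_reduced, err_full)
p_value = p_two_sided / 2 if t_stat > 0 else 1 - p_two_sided / 2
print(f"p-value = {p_value:.3f}")
```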

ISAAC Newton: Input-based Approximate Curvature for Newton's Method

May 01, 2023
Felix Petersen, Tobias Sutter, Christian Borgelt, Dongsung Huh, Hilde Kuehne, Yuekai Sun, Oliver Deussen

We present ISAAC (Input-baSed ApproximAte Curvature), a novel method that conditions the gradient using selected second-order information and has an asymptotically vanishing computational overhead, assuming a batch size smaller than the number of neurons. We show that it is possible to compute a good conditioner based on only the input to a respective layer without a substantial computational overhead. The proposed method allows effective training even in small-batch stochastic regimes, which makes it competitive with both first-order and second-order methods.
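
The sketch below illustrates the general idea of conditioning a layer's weight gradient with curvature estimated from that layer's input alone, using a ridge-regularized input second-moment matrix and the Woodbury identity so the cost scales with the batch size rather than the layer width. It is a hedged approximation of the idea, not ISAAC's exact update; see the linked code for the real method.

```python
# Input-based preconditioning sketch (not ISAAC's exact update; see the authors' code).
# Precondition the weight gradient of a linear layer with (lam*I + X^T X / B)^{-1},
# computed via the Woodbury identity so the cost is O(B^2 d) when batch B << width d.
import numpy as np

rng = np.random.default_rng(0)
B, d_in, d_out = 16, 512, 256           # small batch, wide layer
lam = 1e-2                              # ridge regularization (assumed)

X = rng.normal(size=(B, d_in))          # layer input
delta = rng.normal(size=(B, d_out))     # backpropagated output gradient
G = delta.T @ X / B                     # standard weight gradient, shape (d_out, d_in)

# Woodbury: (lam*I_d + X^T X / B)^{-1} = (1/lam) * (I_d - X^T (lam*B*I_B + X X^T)^{-1} X)
small = np.linalg.solve(lam * B * np.eye(B) + X @ X.T, X)   # shape (B, d_in)

def precondition(M):
    return (M - (M @ X.T) @ small) / lam

G_cond = precondition(G)                # conditioned gradient, same shape as G

# Sanity check against the direct d_in x d_in inverse.
C_inv = np.linalg.inv(lam * np.eye(d_in) + X.T @ X / B)
assert np.allclose(G_cond, G @ C_inv, atol=1e-6)
```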

* Published at ICLR 2023, Code @ https://github.com/Felix-Petersen/isaac, Video @ https://youtu.be/7RKRX-MdwqM 

Simple Disentanglement of Style and Content in Visual Representations

Feb 20, 2023
Lilian Ngweta, Subha Maity, Alex Gittens, Yuekai Sun, Mikhail Yurochkin

Learning visual representations with interpretable features, i.e., disentangled representations, remains a challenging problem. Existing methods demonstrate some success but are hard to apply to large-scale vision datasets like ImageNet. In this work, we propose a simple post-processing framework to disentangle content and style in learned representations from pre-trained vision models. We model the pre-trained features probabilistically as linearly entangled combinations of the latent content and style factors and develop a simple disentanglement algorithm based on the probabilistic model. We show that the method provably disentangles content and style features and verify its efficacy empirically. Our post-processed features yield significant domain generalization performance improvements when the distribution shift occurs due to style changes or style-related spurious correlations.
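
To make the modeling assumption concrete, the toy snippet below simulates features that are linear mixtures of latent content and style factors and applies a generic linear unmixing step (FastICA) as a stand-in post-processing step. This only illustrates the "linearly entangled" assumption; it is not the disentanglement algorithm proposed in the paper.

```python
# Toy illustration of linearly entangled content/style features and a generic
# linear unmixing post-processing step. NOT the paper's algorithm.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
n, d_content, d_style, d_feat = 5000, 3, 2, 16

content = rng.laplace(size=(n, d_content))       # non-Gaussian latent content
style = rng.laplace(size=(n, d_style))           # non-Gaussian latent style
A = rng.normal(size=(d_content + d_style, d_feat))

features = np.hstack([content, style]) @ A       # "pre-trained" features: linear mixture

ica = FastICA(n_components=d_content + d_style, random_state=0)
recovered = ica.fit_transform(features)          # unmixed factors (up to permutation/scale)

# Correlate recovered components with the true latents to see which are content vs. style.
corr = np.abs(np.corrcoef(recovered.T, np.hstack([content, style]).T))[:5, 5:]
print(np.round(corr, 2))
```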

Calibrated Data-Dependent Constraints with Exact Satisfaction Guarantees

Jan 15, 2023
Songkai Xue, Yuekai Sun, Mikhail Yurochkin

We consider the task of training machine learning models with data-dependent constraints. Such constraints often arise as empirical versions of expected value constraints that enforce fairness or stability goals. We reformulate data-dependent constraints so that they are calibrated: enforcing the reformulated constraints guarantees that their expected value counterparts are satisfied with a user-prescribed probability. The resulting optimization problem is amenable to standard stochastic optimization algorithms, and we demonstrate the efficacy of our method on a fairness-sensitive classification task where we wish to guarantee the classifier's fairness (at test time).
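
As a hedged sketch of the calibration idea (using a generic Hoeffding-style tightening rather than the paper's procedure), the snippet below tightens an empirical expected-value constraint by a slack chosen so that the population constraint holds with probability at least 1 - delta, then solves the tightened problem.

```python
# Sketch: tighten an empirical constraint so the population constraint holds w.h.p.
# The Hoeffding-style slack below is a generic stand-in, not the paper's calibration.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, delta = 500, 0.05
xi = rng.uniform(0.0, 1.0, size=n)      # data entering the constraint, bounded in [0, 1]

def objective(theta):
    return (theta - 2.0) ** 2           # toy objective: push theta toward 2

def g_empirical(theta):
    # Empirical version of E[g(theta; xi)] <= 0 with g(theta; xi) = theta * xi - 1.
    return np.mean(theta * xi) - 1.0

# Hoeffding slack for a constraint bounded in a range of 2: holds w.p. >= 1 - delta.
slack = 2.0 * np.sqrt(np.log(1.0 / delta) / (2.0 * n))

res = minimize(
    lambda t: objective(t[0]),
    x0=[0.0],
    constraints=[{"type": "ineq", "fun": lambda t: -(g_empirical(t[0]) + slack)}],
    method="SLSQP",
)
print("calibrated solution:", res.x[0], "empirical constraint:", g_empirical(res.x[0]))
```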

* In Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS) 2022 

How does overparametrization affect performance on minority groups?

Jun 07, 2022
Subha Maity, Saptarshi Roy, Songkai Xue, Mikhail Yurochkin, Yuekai Sun

The benefits of overparameterization for the overall performance of modern machine learning (ML) models are well known. However, the effect of overparameterization at the more granular level of data subgroups is less understood. Recent empirical studies demonstrate encouraging results: (i) when groups are not known, overparameterized models trained with empirical risk minimization (ERM) perform better on minority groups; (ii) when groups are known, ERM on data subsampled to equalize group sizes yields state-of-the-art worst-group accuracy in the overparameterized regime. In this paper, we complement these empirical studies with a theoretical investigation of the risk of overparameterized random feature models on minority groups. In a setting in which the regression functions for the majority and minority groups are different, we show that overparameterization always improves minority group performance.
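
The toy simulation below is one hedged way to probe the setting described: majority and minority groups with different regression functions, a random-feature model of growing width, and the minority-group test risk tracked as the width grows past the training set size. The data-generating choices and hyperparameters are illustrative, not the paper's.

```python
# Toy simulation: minority-group test risk of a random-feature model vs. width.
# Data-generating process and hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, n_maj, n_min, n_test = 20, 400, 40, 1000
w_maj, w_min = rng.normal(size=d), rng.normal(size=d)   # different regression functions

def make_group(n, w):
    X = rng.normal(size=(n, d))
    y = X @ w + 0.1 * rng.normal(size=n)
    return X, y

X_maj, y_maj = make_group(n_maj, w_maj)
X_min, y_min = make_group(n_min, w_min)
X_tr, y_tr = np.vstack([X_maj, X_min]), np.concatenate([y_maj, y_min])
X_te, y_te = make_group(n_test, w_min)                  # minority-group test set

for width in [50, 200, 440, 1000, 4000]:                # n_train = 440
    W = rng.normal(size=(d, width)) / np.sqrt(d)        # random feature weights
    phi = lambda X: np.maximum(X @ W, 0.0)              # ReLU random features
    coef = np.linalg.lstsq(phi(X_tr), y_tr, rcond=None)[0]  # min-norm least squares
    risk = np.mean((phi(X_te) @ coef - y_te) ** 2)
    print(f"width={width:5d}  minority test risk={risk:.3f}")
```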

Understanding new tasks through the lens of training data via exponential tilting

May 26, 2022
Subha Maity, Mikhail Yurochkin, Moulinath Banerjee, Yuekai Sun

Deploying machine learning models to new tasks is a major challenge despite the large size of modern training datasets. However, it is conceivable that the training data can be reweighted to be more representative of the new (target) task. We consider the problem of reweighting the training samples to gain insights into the distribution of the target task. Specifically, we formulate a distribution shift model based on the exponential tilt assumption and learn training data importance weights by minimizing the KL divergence between the labeled training and unlabeled target datasets. The learned weights can then be used for downstream tasks such as target performance evaluation, fine-tuning, and model selection. We demonstrate the efficacy of our method on the Waterbirds and Breeds benchmarks.
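
A hedged sketch of the exponential-tilt idea in feature space is given below: weights of the form w(x) ∝ exp(θᵀφ(x)) are fit by minimizing a standard convex objective whose optimum matches the tilted source distribution to the target in feature means. The feature map, objective, and optimizer are illustrative stand-ins for the paper's formulation.

```python
# Sketch: fit exponential-tilt importance weights w(x) ~ exp(theta . phi(x)) so that
# the reweighted source matches the target in feature space. Illustrative objective.
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

rng = np.random.default_rng(0)
phi = lambda X: np.hstack([X, X ** 2])                 # assumed feature map

X_src = rng.normal(loc=0.0, size=(2000, 2))            # source/training inputs
X_tgt = rng.normal(loc=0.7, size=(1000, 2))            # shifted target inputs

F_src, F_tgt = phi(X_src), phi(X_tgt)

def objective(theta):
    # log E_src[exp(theta . phi)] - E_tgt[theta . phi]; convex in theta,
    # minimized when the tilted source matches the target feature means.
    return logsumexp(F_src @ theta) - np.log(len(F_src)) - F_tgt.mean(axis=0) @ theta

theta = minimize(objective, x0=np.zeros(F_src.shape[1])).x

log_w = F_src @ theta
weights = np.exp(log_w - logsumexp(log_w))             # normalized importance weights
print("effective sample size:", 1.0 / np.sum(weights ** 2))
```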

Predictor-corrector algorithms for stochastic optimization under gradual distribution shift

May 26, 2022
Subha Maity, Debarghya Mukherjee, Moulinath Banerjee, Yuekai Sun

Time-varying stochastic optimization problems frequently arise in machine learning practice (e.g., gradual domain shift, object tracking, strategic classification). Although most problems are solved in discrete time, the underlying process is often continuous in nature. We exploit this underlying continuity by developing predictor-corrector algorithms for time-varying stochastic optimization. We provide error bounds for the iterates, both with exact and with noisy access to queries of the relevant derivatives of the loss function. Furthermore, we show (theoretically and empirically in several examples) that our method outperforms non-predictor-corrector methods that do not exploit the underlying continuous process.
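
As a hedged toy illustration of the predictor-corrector template (for a time-varying quadratic with a drifting minimizer; the paper's algorithms and error bounds are more general), the predictor step extrapolates the iterate along the estimated drift of the minimizer, and the corrector step applies gradient descent at the new time.

```python
# Predictor-corrector tracking of a drifting minimizer, toy quadratic version.
# f(theta, t) = 0.5 * ||theta - c(t)||^2 with c(t) moving over time (assumed setup).
import numpy as np

def c(t):                                  # time-varying minimizer
    return np.array([np.cos(t), np.sin(t)])

def grad(theta, t):
    return theta - c(t)

dt, eta, T = 0.1, 0.5, 100
theta_pc = c(0.0).copy()                   # predictor-corrector iterate
theta_gd = c(0.0).copy()                   # plain (corrector-only) gradient descent

for k in range(T):
    t_now, t_next = k * dt, (k + 1) * dt
    # Predictor: for this quadratic the Hessian is I, so the drift of the minimizer
    # is estimated by a finite difference of the gradient in time.
    drift = -(grad(theta_pc, t_next) - grad(theta_pc, t_now))
    theta_pc = theta_pc + drift
    # Corrector: a gradient step at the new time, for both methods.
    theta_pc = theta_pc - eta * grad(theta_pc, t_next)
    theta_gd = theta_gd - eta * grad(theta_gd, t_next)

print("tracking error, predictor-corrector:", np.linalg.norm(theta_pc - c(T * dt)))
print("tracking error, gradient-only      :", np.linalg.norm(theta_gd - c(T * dt)))
```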

Domain Adaptation meets Individual Fairness. And they get along

May 01, 2022
Debarghya Mukherjee, Felix Petersen, Mikhail Yurochkin, Yuekai Sun

Many instances of algorithmic bias are caused by distributional shifts. For example, machine learning (ML) models often perform worse on demographic groups that are underrepresented in the training data. In this paper, we leverage this connection between algorithmic fairness and distribution shifts to show that algorithmic fairness interventions can help ML models overcome distribution shifts, and that domain adaptation methods (for overcoming distribution shifts) can mitigate algorithmic biases. In particular, we show that (i) enforcing suitable notions of individual fairness (IF) can improve the out-of-distribution accuracy of ML models, and that (ii) it is possible to adapt representation alignment methods for domain adaptation to enforce (individual) fairness. The former is unexpected because IF interventions were not developed with distribution shifts in mind. The latter is also unexpected because representation alignment is not a common approach in the IF literature.

Achieving Representative Data via Convex Hull Feasibility Sampling Algorithms

Apr 13, 2022
Laura Niss, Yuekai Sun, Ambuj Tewari

Sampling biases in training data are a major source of algorithmic biases in machine learning systems. Although there are many methods that attempt to mitigate such algorithmic biases during training, the most direct and obvious way is simply collecting more representative training data. In this paper, we consider the task of assembling a training dataset in which minority groups are adequately represented from a given set of data sources. In essence, this is an adaptive sampling problem to determine if a given point lies in the convex hull of the means of a set of unknown distributions. We present adaptive sampling methods to determine, with high confidence, whether it is possible to assemble a representative dataset from the given data sources. We also demonstrate the efficacy of our policies in simulations in Bernoulli and multinomial settings.
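
The core geometric question, whether a target point lies in the convex hull of the (unknown) source means, can be checked for empirical means with a small quadratic program, as in the hedged sketch below. The adaptive sampling policies and confidence guarantees that make this decision reliable are the paper's contribution and are not shown.

```python
# Sketch: distance from a target point to the convex hull of empirical source means.
# Only the plug-in feasibility check; the adaptive sampling policy is not shown.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
K, d, n_per_source = 5, 2, 200
true_means = rng.normal(size=(K, d))
samples = [m + rng.normal(size=(n_per_source, d)) for m in true_means]
M = np.array([s.mean(axis=0) for s in samples])        # empirical means, shape (K, d)
target = true_means.mean(axis=0)                       # desired mixture point

def dist_sq(w):
    return np.sum((w @ M - target) ** 2)

res = minimize(
    dist_sq,
    x0=np.full(K, 1.0 / K),
    bounds=[(0.0, 1.0)] * K,
    constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
    method="SLSQP",
)
print("distance to hull of empirical means:", np.sqrt(res.fun))
print("mixing weights:", np.round(res.x, 3))
```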

On sensitivity of meta-learning to support data

Oct 26, 2021
Mayank Agarwal, Mikhail Yurochkin, Yuekai Sun

Meta-learning algorithms are widely used for few-shot learning, for example, in image recognition systems that readily adapt to unseen classes after seeing only a few labeled examples. Despite their success, we show that modern meta-learning algorithms are extremely sensitive to the data used for adaptation, i.e., the support data. In particular, we demonstrate the existence of (unaltered, in-distribution, natural) images that, when used for adaptation, yield accuracy as low as 4% or as high as 95% on standard few-shot image classification benchmarks. We explain our empirical findings in terms of class margins, which in turn suggests that robust and safe meta-learning requires larger margins than supervised learning.
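
One hedged way to see this kind of support-set sensitivity, with a nearest-centroid few-shot classifier on synthetic embeddings standing in for a trained meta-learner, is sketched below: the same query set is evaluated over many resampled support sets and the spread of accuracies is reported. The embedding model and classifier are illustrative, not those used in the paper.

```python
# Sketch: spread of few-shot accuracy over resampled support sets, using a
# nearest-centroid classifier on synthetic class embeddings (stand-in for a meta-learner).
import numpy as np

rng = np.random.default_rng(0)
n_way, k_shot, dim, n_query, n_trials = 5, 5, 64, 100, 500

class_means = rng.normal(size=(n_way, dim))

def sample(cls, n):                        # noisy embeddings for one class
    return class_means[cls] + 1.5 * rng.normal(size=(n, dim))

query_X = np.vstack([sample(c, n_query) for c in range(n_way)])
query_y = np.repeat(np.arange(n_way), n_query)

accs = []
for _ in range(n_trials):
    support = np.stack([sample(c, k_shot).mean(axis=0) for c in range(n_way)])
    pred = np.argmin(
        ((query_X[:, None, :] - support[None, :, :]) ** 2).sum(-1), axis=1
    )
    accs.append((pred == query_y).mean())

accs = np.array(accs)
print(f"accuracy over support sets: min={accs.min():.2f}, "
      f"mean={accs.mean():.2f}, max={accs.max():.2f}")
```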

* Accepted at NeurIPS 2021 