Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Debarghya Ghoshdastidar

LRR-TUM

Decision Trees for Interpretable Clusters in Mixture Models and Deep Representations

Nov 03, 2024

Maximilian Fleissner, Maedeh Zarvandi, Debarghya Ghoshdastidar

Figure 1 for Decision Trees for Interpretable Clusters in Mixture Models and Deep Representations

Figure 2 for Decision Trees for Interpretable Clusters in Mixture Models and Deep Representations

Figure 3 for Decision Trees for Interpretable Clusters in Mixture Models and Deep Representations

Figure 4 for Decision Trees for Interpretable Clusters in Mixture Models and Deep Representations

Abstract:Decision Trees are one of the backbones of explainable machine learning, and often serve as interpretable alternatives to black-box models. Traditionally utilized in the supervised setting, there has recently also been a surge of interest in decision trees for unsupervised learning. While several works with worst-case guarantees on the clustering cost have appeared, these results are distribution-agnostic, and do not give insight into when decision trees can actually recover the underlying distribution of the data (up to some small error). In this paper, we therefore introduce the notion of an explainability-to-noise ratio for mixture models, formalizing the intuition that well-clustered data can indeed be explained well using a decision tree. We propose an algorithm that takes as input a mixture model and constructs a suitable tree in data-independent time. Assuming sub-Gaussianity of the mixture components, we prove upper and lower bounds on the error rate of the resulting decision tree. In addition, we demonstrate how concept activation vectors can be used to extend explainable clustering to neural networks. We empirically demonstrate the efficacy of our approach on standard tabular and image datasets.

Via

Access Paper or Ask Questions

Provable Robustness of (Graph) Neural Networks Against Data Poisoning and Backdoor Attacks

Jul 15, 2024

Lukas Gosch, Mahalakshmi Sabanayagam, Debarghya Ghoshdastidar, Stephan Günnemann

Figure 1 for Provable Robustness of (Graph) Neural Networks Against Data Poisoning and Backdoor Attacks

Figure 2 for Provable Robustness of (Graph) Neural Networks Against Data Poisoning and Backdoor Attacks

Figure 3 for Provable Robustness of (Graph) Neural Networks Against Data Poisoning and Backdoor Attacks

Figure 4 for Provable Robustness of (Graph) Neural Networks Against Data Poisoning and Backdoor Attacks

Abstract:Generalization of machine learning models can be severely compromised by data poisoning, where adversarial changes are applied to the training data, as well as backdoor attacks that additionally manipulate the test data. These vulnerabilities have led to interest in certifying (i.e., proving) that such changes up to a certain magnitude do not affect test predictions. We, for the first time, certify Graph Neural Networks (GNNs) against poisoning and backdoor attacks targeting the node features of a given graph. Our certificates are white-box and based upon $(i)$ the neural tangent kernel, which characterizes the training dynamics of sufficiently wide networks; and $(ii)$ a novel reformulation of the bilevel optimization problem describing poisoning as a mixed-integer linear program. Consequently, we leverage our framework to provide fundamental insights into the role of graph structure and its connectivity on the worst-case robustness behavior of convolution-based and PageRank-based GNNs. We note that our framework is more general and constitutes the first approach to derive white-box poisoning certificates for NNs, which can be of independent interest beyond graph-related tasks.

Via

Access Paper or Ask Questions

When can we Approximate Wide Contrastive Models with Neural Tangent Kernels and Principal Component Analysis?

Mar 13, 2024

Gautham Govind Anil, Pascal Esser, Debarghya Ghoshdastidar

Figure 1 for When can we Approximate Wide Contrastive Models with Neural Tangent Kernels and Principal Component Analysis?

Figure 2 for When can we Approximate Wide Contrastive Models with Neural Tangent Kernels and Principal Component Analysis?

Figure 3 for When can we Approximate Wide Contrastive Models with Neural Tangent Kernels and Principal Component Analysis?

Figure 4 for When can we Approximate Wide Contrastive Models with Neural Tangent Kernels and Principal Component Analysis?

Abstract:Contrastive learning is a paradigm for learning representations from unlabelled data that has been highly successful for image and text data. Several recent works have examined contrastive losses to claim that contrastive models effectively learn spectral embeddings, while few works show relations between (wide) contrastive models and kernel principal component analysis (PCA). However, it is not known if trained contrastive models indeed correspond to kernel methods or PCA. In this work, we analyze the training dynamics of two-layer contrastive models, with non-linear activation, and answer when these models are close to PCA or kernel methods. It is well known in the supervised setting that neural networks are equivalent to neural tangent kernel (NTK) machines, and that the NTK of infinitely wide networks remains constant during training. We provide the first convergence results of NTK for contrastive losses, and present a nuanced picture: NTK of wide networks remains almost constant for cosine similarity based contrastive losses, but not for losses based on dot product similarity. We further study the training dynamics of contrastive models with orthogonality constraints on output layer, which is implicitly assumed in works relating contrastive learning to spectral embedding. Our deviation bounds suggest that representations learned by contrastive models are close to the principal components of a certain matrix computed from random features. We empirically show that our theoretical results possibly hold beyond two-layer networks.

Via

Access Paper or Ask Questions

On the Stability of Gradient Descent for Large Learning Rate

Feb 20, 2024

Alexandru Crăciun, Debarghya Ghoshdastidar

Figure 1 for On the Stability of Gradient Descent for Large Learning Rate

Figure 2 for On the Stability of Gradient Descent for Large Learning Rate

Figure 3 for On the Stability of Gradient Descent for Large Learning Rate

Figure 4 for On the Stability of Gradient Descent for Large Learning Rate

Abstract:There currently is a significant interest in understanding the Edge of Stability (EoS) phenomenon, which has been observed in neural networks training, characterized by a non-monotonic decrease of the loss function over epochs, while the sharpness of the loss (spectral norm of the Hessian) progressively approaches and stabilizes around 2/(learning rate). Reasons for the existence of EoS when training using gradient descent have recently been proposed -- a lack of flat minima near the gradient descent trajectory together with the presence of compact forward-invariant sets. In this paper, we show that linear neural networks optimized under a quadratic loss function satisfy the first assumption and also a necessary condition for the second assumption. More precisely, we prove that the gradient descent map is non-singular, the set of global minimizers of the loss function forms a smooth manifold, and the stable minima form a bounded subset in parameter space. Additionally, we prove that if the step-size is too big, then the set of initializations from which gradient descent converges to a critical point has measure zero.

Via

Access Paper or Ask Questions

Explaining Kernel Clustering via Decision Trees

Feb 15, 2024

Maximilian Fleissner, Leena Chennuru Vankadara, Debarghya Ghoshdastidar

Abstract:Despite the growing popularity of explainable and interpretable machine learning, there is still surprisingly limited work on inherently interpretable clustering methods. Recently, there has been a surge of interest in explaining the classic k-means algorithm, leading to efficient algorithms that approximate k-means clusters using axis-aligned decision trees. However, interpretable variants of k-means have limited applicability in practice, where more flexible clustering methods are often needed to obtain useful partitions of the data. In this work, we investigate interpretable kernel clustering, and propose algorithms that construct decision trees to approximate the partitions induced by kernel k-means, a nonlinear extension of k-means. We further build on previous work on explainable k-means and demonstrate how a suitable choice of features allows preserving interpretability without sacrificing approximation guarantees on the interpretable model.

Via

Access Paper or Ask Questions

Representation Learning Dynamics of Self-Supervised Models

Sep 05, 2023

Pascal Esser, Satyaki Mukherjee, Debarghya Ghoshdastidar

Figure 1 for Representation Learning Dynamics of Self-Supervised Models

Figure 2 for Representation Learning Dynamics of Self-Supervised Models

Figure 3 for Representation Learning Dynamics of Self-Supervised Models

Figure 4 for Representation Learning Dynamics of Self-Supervised Models

Abstract:Self-Supervised Learning (SSL) is an important paradigm for learning representations from unlabelled data, and SSL with neural networks has been highly successful in practice. However current theoretical analysis of SSL is mostly restricted to generalisation error bounds. In contrast, learning dynamics often provide a precise characterisation of the behaviour of neural networks based models but, so far, are mainly known in supervised settings. In this paper, we study the learning dynamics of SSL models, specifically representations obtained by minimising contrastive and non-contrastive losses. We show that a naive extension of the dymanics of multivariate regression to SSL leads to learning trivial scalar representations that demonstrates dimension collapse in SSL. Consequently, we formulate SSL objectives with orthogonality constraints on the weights, and derive the exact (network width independent) learning dynamics of the SSL models trained using gradient descent on the Grassmannian manifold. We also argue that the infinite width approximation of SSL models significantly deviate from the neural tangent kernel approximations of supervised models. We numerically illustrate the validity of our theoretical findings, and discuss how the presented results provide a framework for further theoretical analysis of contrastive and non-contrastive SSL.

Via

Access Paper or Ask Questions

Non-Parametric Representation Learning with Kernels

Sep 05, 2023

Pascal Esser, Maximilian Fleissner, Debarghya Ghoshdastidar

Figure 1 for Non-Parametric Representation Learning with Kernels

Figure 2 for Non-Parametric Representation Learning with Kernels

Figure 3 for Non-Parametric Representation Learning with Kernels

Figure 4 for Non-Parametric Representation Learning with Kernels

Abstract:Unsupervised and self-supervised representation learning has become popular in recent years for learning useful features from unlabelled data. Representation learning has been mostly developed in the neural network literature, and other models for representation learning are surprisingly unexplored. In this work, we introduce and analyze several kernel-based representation learning approaches: Firstly, we define two kernel Self-Supervised Learning (SSL) models using contrastive loss functions and secondly, a Kernel Autoencoder (AE) model based on the idea of embedding and reconstructing data. We argue that the classical representer theorems for supervised kernel machines are not always applicable for (self-supervised) representation learning, and present new representer theorems, which show that the representations learned by our kernel models can be expressed in terms of kernel matrices. We further derive generalisation error bounds for representation learning with kernel SSL and AE, and empirically evaluate the performance of these methods in both small data regimes as well as in comparison with neural network based models.

Via

Access Paper or Ask Questions

Fast Adaptive Test-Time Defense with Robust Features

Jul 21, 2023

Anurag Singh, Mahalakshmi Sabanayagam, Krikamol Muandet, Debarghya Ghoshdastidar

Figure 1 for Fast Adaptive Test-Time Defense with Robust Features

Figure 2 for Fast Adaptive Test-Time Defense with Robust Features

Figure 3 for Fast Adaptive Test-Time Defense with Robust Features

Figure 4 for Fast Adaptive Test-Time Defense with Robust Features

Abstract:Adaptive test-time defenses are used to improve the robustness of deep neural networks to adversarial examples. However, existing methods significantly increase the inference time due to additional optimization on the model parameters or the input at test time. In this work, we propose a novel adaptive test-time defense strategy that is easy to integrate with any existing (robust) training procedure without additional test-time computation. Based on the notion of robustness of features that we present, the key idea is to project the trained models to the most robust feature space, thereby reducing the vulnerability to adversarial attacks in non-robust directions. We theoretically show that the top eigenspace of the feature matrix are more robust for a generalized additive model and support our argument for a large width neural network with the Neural Tangent Kernel (NTK) equivalence. We conduct extensive experiments on CIFAR-10 and CIFAR-100 datasets for several robustness benchmarks, including the state-of-the-art methods in RobustBench, and observe that the proposed method outperforms existing adaptive test-time defenses at much lower computation costs.

Via

Access Paper or Ask Questions

Wasserstein Projection Pursuit of Non-Gaussian Signals

Feb 24, 2023

Satyaki Mukherjee, Soumendu Sundar Mukherjee, Debarghya Ghoshdastidar

Abstract:We consider the general dimensionality reduction problem of locating in a high-dimensional data cloud, a $k$-dimensional non-Gaussian subspace of interesting features. We use a projection pursuit approach -- we search for mutually orthogonal unit directions which maximise the 2-Wasserstein distance of the empirical distribution of data-projections along these directions from a standard Gaussian. Under a generative model, where there is a underlying (unknown) low-dimensional non-Gaussian subspace, we prove rigorous statistical guarantees on the accuracy of approximating this unknown subspace by the directions found by our projection pursuit approach. Our results operate in the regime where the data dimensionality is comparable to the sample size, and thus supplement the recent literature on the non-feasibility of locating interesting directions via projection pursuit in the complementary regime where the data dimensionality is much larger than the sample size.

Via

Access Paper or Ask Questions

Improved Representation Learning Through Tensorized Autoencoders

Dec 02, 2022

Pascal Mattia Esser, Satyaki Mukherjee, Mahalakshmi Sabanayagam, Debarghya Ghoshdastidar

Figure 1 for Improved Representation Learning Through Tensorized Autoencoders

Figure 2 for Improved Representation Learning Through Tensorized Autoencoders

Figure 3 for Improved Representation Learning Through Tensorized Autoencoders

Figure 4 for Improved Representation Learning Through Tensorized Autoencoders

Abstract:The central question in representation learning is what constitutes a good or meaningful representation. In this work we argue that if we consider data with inherent cluster structures, where clusters can be characterized through different means and covariances, those data structures should be represented in the embedding as well. While Autoencoders (AE) are widely used in practice for unsupervised representation learning, they do not fulfil the above condition on the embedding as they obtain a single representation of the data. To overcome this we propose a meta-algorithm that can be used to extend an arbitrary AE architecture to a tensorized version (TAE) that allows for learning cluster-specific embeddings while simultaneously learning the cluster assignment. For the linear setting we prove that TAE can recover the principle components of the different clusters in contrast to principle component of the entire data recovered by a standard AE. We validated this on planted models and for general, non-linear and convolutional AEs we empirically illustrate that tensorizing the AE is beneficial in clustering and de-noising tasks.

Via

Access Paper or Ask Questions