Kayhan Batmanghelich

Dividing and Conquering a BlackBox to a Mixture of Interpretable Models: Route, Interpret, Repeat

Jul 12, 2023
Shantanu Ghosh, Ke Yu, Forough Arabshahi, Kayhan Batmanghelich

ML model design either starts with an interpretable model or with a Blackbox and explains it post hoc. Blackbox models are flexible but difficult to explain, while interpretable models are inherently explainable. Yet, interpretable models require extensive ML knowledge and tend to be less flexible and to underperform their Blackbox variants. This paper aims to blur the distinction between a post hoc explanation of a Blackbox and constructing interpretable models. Beginning with a Blackbox, we iteratively carve out a mixture of interpretable experts (MoIE) and a residual network. Each interpretable model specializes in a subset of samples and explains them using First Order Logic (FOL), providing basic reasoning on concepts from the Blackbox. We route the remaining samples through a flexible residual. We repeat the method on the residual network until all the interpretable models explain the desired proportion of data. Our extensive experiments show that our route, interpret, and repeat approach (1) identifies a diverse set of instance-specific concepts with high concept completeness via MoIE without compromising performance, (2) identifies the relatively "harder" samples to explain via residuals, (3) outperforms interpretable-by-design models by significant margins during test-time interventions, and (4) fixes the shortcuts learned by the original Blackbox. The code for MoIE is publicly available at: https://github.com/batmanlab/ICML-2023-Route-interpret-repeat
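
Below is a minimal, illustrative sketch of the route-interpret-repeat loop described above, written in PyTorch. The component names (Selector, InterpretableExpert) and the omitted training steps are assumptions for exposition, not the released MoIE implementation.

```python
# Illustrative sketch only: component names and the (omitted) training steps
# are assumptions, not the released MoIE code.
import torch
import torch.nn as nn

class Selector(nn.Module):
    """Gate g_k: decides which samples the k-th expert should cover."""
    def __init__(self, n_concepts):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_concepts, 1), nn.Sigmoid())

    def forward(self, concepts):
        return self.net(concepts)                       # coverage probability per sample

class InterpretableExpert(nn.Module):
    """Shallow model over concepts; its weights can be read off as FOL-style rules."""
    def __init__(self, n_concepts, n_classes):
        super().__init__()
        self.linear = nn.Linear(n_concepts, n_classes)

    def forward(self, concepts):
        return self.linear(concepts)

def route_interpret_repeat(concepts, labels, n_experts=3, stop_frac=0.2):
    """Iteratively carve experts out of the covered data; the rest goes to a residual."""
    unexplained = torch.ones(len(labels), dtype=torch.bool)   # samples still on the residual
    experts, selectors = [], []
    for _ in range(n_experts):
        g = Selector(concepts.shape[1])
        expert = InterpretableExpert(concepts.shape[1], int(labels.max()) + 1)
        # ... here g and expert would be trained on the `unexplained` samples so the
        # expert mimics the Blackbox on the subset it selects (training omitted) ...
        with torch.no_grad():
            selected = (g(concepts).squeeze(-1) > 0.5) & unexplained
        unexplained &= ~selected                        # route covered samples away from the residual
        experts.append(expert)
        selectors.append(g)
        if unexplained.float().mean() < stop_frac:      # residual now explains little enough data
            break
    return experts, selectors, unexplained

experts, selectors, leftover = route_interpret_repeat(torch.rand(500, 20), torch.randint(0, 4, (500,)))
```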

* Proceedings of the 40th International Conference on Machine Learning, PMLR 202:11360-11397, 2023  
* Appeared as v5 of arXiv:2302.10289, which was replaced in error and drifted into a different work; accepted at ICML 2023 

Exploring the Lottery Ticket Hypothesis with Explainability Methods: Insights into Sparse Network Performance

Jul 07, 2023
Shantanu Ghosh, Kayhan Batmanghelich

Discovering a high-performing sparse network within a massive neural network is advantageous for deployment on devices with limited storage, such as mobile phones. Additionally, model explainability is essential to fostering trust in AI. The Lottery Ticket Hypothesis (LTH) finds a network within a deep network with performance comparable or superior to the original model. However, little work has examined the success or failure of LTH in terms of explainability. In this work, we examine why the performance of the pruned networks gradually increases or decreases. Using Grad-CAM and Post-hoc Concept Bottleneck Models (PCBMs), we investigate the explainability of pruned networks in terms of pixels and high-level concepts, respectively. We perform extensive experiments across vision and medical imaging datasets. As more weights are pruned, the performance of the network degrades. The concepts and pixels discovered from the pruned networks are inconsistent with those of the original network -- a possible reason for the drop in performance.
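
As a rough illustration of this kind of experiment, the sketch below runs iterative magnitude pruning with torch.nn.utils.prune and tracks how a simple input-gradient saliency map drifts from the dense model's map; the paper itself uses Grad-CAM and PCBMs, for which input gradients are substituted here for brevity.

```python
# Rough illustration: iterative magnitude pruning with a saliency-drift check.
# Input gradients stand in for Grad-CAM; no claim that this mirrors the paper's code.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

def magnitude_prune(model: nn.Module, amount: float = 0.2):
    """Prune the smallest-magnitude weights in every Linear/Conv2d layer."""
    for module in model.modules():
        if isinstance(module, (nn.Linear, nn.Conv2d)):
            prune.l1_unstructured(module, name="weight", amount=amount)

def saliency(model: nn.Module, x: torch.Tensor, target: int) -> torch.Tensor:
    """Pixel-level attribution via input gradients (a stand-in for Grad-CAM)."""
    x = x.clone().requires_grad_(True)
    model(x)[0, target].backward()
    return x.grad.abs()

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.Flatten(), nn.Linear(8 * 30 * 30, 10))
x = torch.randn(1, 3, 32, 32)                           # 3x3 conv on 32x32 -> 30x30 maps
dense_map = saliency(model, x, target=0)                # explanation of the unpruned model
for round_ in range(3):                                 # prune 20% of weights per round
    magnitude_prune(model, amount=0.2)
    drift = (dense_map - saliency(model, x, target=0)).abs().mean()
    print(f"round {round_}: mean saliency drift vs. dense model = {drift:.4f}")
```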

Distilling BlackBox to Interpretable models for Efficient Transfer Learning

Jun 10, 2023
Shantanu Ghosh, Ke Yu, Kayhan Batmanghelich

Building generalizable AI models is one of the primary challenges in the healthcare domain. While radiologists rely on generalizable descriptive rules of abnormality, Neural Network (NN) models suffer from even a slight shift in input distribution (e.g., scanner type). Fine-tuning a model to transfer knowledge from one domain to another requires a significant amount of labeled data in the target domain. In this paper, we develop an interpretable model that can be efficiently fine-tuned to an unseen target domain with minimal computational cost. We assume the interpretable component of the NN to be approximately domain-invariant. However, interpretable models typically underperform compared to their Blackbox (BB) variants. We start with a BB in the source domain and distill it into a mixture of shallow interpretable models using human-understandable concepts. As each interpretable model covers a subset of the data, the mixture of interpretable models achieves performance comparable to the BB. Further, we use the pseudo-labeling technique from semi-supervised learning (SSL) to learn the concept classifier in the target domain, followed by fine-tuning the interpretable models on the target domain. We evaluate our model using a real-life, large-scale chest X-ray (CXR) classification dataset. The code is available at: https://github.com/batmanlab/MICCAI-2023-Route-interpret-repeat-CXRs
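
A hedged sketch of the pseudo-labeling step for the target-domain concept classifier is shown below; the threshold, optimizer, and the two concept networks are illustrative assumptions, not the paper's exact recipe.

```python
# Hedged sketch of pseudo-labeling for the target-domain concept classifier;
# all names and hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def pseudo_label_concepts(source_concept_net: nn.Module,
                          target_concept_net: nn.Module,
                          target_images: torch.Tensor,
                          threshold: float = 0.9,
                          steps: int = 100,
                          lr: float = 1e-3) -> nn.Module:
    """Fit the target-domain concept classifier on confident source-model predictions."""
    opt = torch.optim.Adam(target_concept_net.parameters(), lr=lr)
    with torch.no_grad():
        probs = torch.sigmoid(source_concept_net(target_images))   # source model on target data
    confident = (probs > threshold) | (probs < 1 - threshold)      # keep confident concepts only
    pseudo = (probs > 0.5).float()                                  # binarized pseudo-labels
    for _ in range(steps):
        logits = target_concept_net(target_images)
        loss = F.binary_cross_entropy_with_logits(logits[confident], pseudo[confident])
        opt.zero_grad()
        loss.backward()
        opt.step()
    return target_concept_net
```

The resulting concept classifier would then feed the mixture of interpretable models, which are fine-tuned on the target domain in the same spirit.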

* MICCAI 2023, early accept 

From Characters to Words: Hierarchical Pre-trained Language Model for Open-vocabulary Language Understanding

May 23, 2023
Li Sun, Florian Luisier, Kayhan Batmanghelich, Dinei Florencio, Cha Zhang

Current state-of-the-art models for natural language understanding require a preprocessing step to convert raw text into discrete tokens. This process, known as tokenization, relies on a pre-built vocabulary of words or sub-word morphemes. This fixed vocabulary limits the model's robustness to spelling errors and its capacity to adapt to new domains. In this work, we introduce a novel open-vocabulary language model that adopts a hierarchical two-level approach: one at the word level and another at the sequence level. Concretely, we design an intra-word module that uses a shallow Transformer architecture to learn word representations from their characters, and a deep inter-word Transformer module that contextualizes each word representation by attending to the entire word sequence. Our model thus directly operates on character sequences with explicit awareness of word boundaries, but without a biased sub-word or word-level vocabulary. Experiments on various downstream tasks show that our method outperforms strong baselines. We also demonstrate that our hierarchical model is robust to textual corruption and domain shift.
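
A toy PyTorch version of the two-level design follows: a shallow character-level Transformer pools each word into a vector, and a deeper word-level Transformer contextualizes the word sequence. Depths, dimensions, and the mean-pooling choice are illustrative only.

```python
# Toy two-level encoder: shallow intra-word Transformer + deep inter-word Transformer.
# Sizes and pooling are illustrative, not the paper's configuration.
import torch
import torch.nn as nn

class HierCharWordEncoder(nn.Module):
    def __init__(self, n_chars=256, d=128, intra_layers=2, inter_layers=6, heads=4):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, d)
        intra = nn.TransformerEncoderLayer(d, heads, 4 * d, batch_first=True)
        inter = nn.TransformerEncoderLayer(d, heads, 4 * d, batch_first=True)
        self.intra_word = nn.TransformerEncoder(intra, intra_layers)   # shallow, per word
        self.inter_word = nn.TransformerEncoder(inter, inter_layers)   # deep, across words

    def forward(self, char_ids):
        # char_ids: (batch, n_words, chars_per_word) character indices
        b, w, c = char_ids.shape
        x = self.char_emb(char_ids).reshape(b * w, c, -1)
        word_vecs = self.intra_word(x).mean(dim=1)        # pool characters into a word vector
        word_vecs = word_vecs.reshape(b, w, -1)
        return self.inter_word(word_vecs)                 # contextualize across the sentence

enc = HierCharWordEncoder()
chars = torch.randint(0, 256, (2, 10, 12))                # 2 sentences, 10 words, 12 chars each
print(enc(chars).shape)                                   # torch.Size([2, 10, 128])
```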

* Accepted to ACL 2023 Main Conference 

DrasCLR: A Self-supervised Framework of Learning Disease-related and Anatomy-specific Representation for 3D Medical Images

Mar 15, 2023
Ke Yu, Li Sun, Junxiang Chen, Max Reynolds, Tigmanshu Chaudhary, Kayhan Batmanghelich

Large-scale volumetric medical images with annotation are rare, costly, and time-prohibitive to acquire. Self-supervised learning (SSL) offers a promising pre-training and feature-extraction solution for many downstream tasks, as it uses only unlabeled data. Recently, SSL methods based on instance discrimination have gained popularity in the medical imaging domain. However, SSL pre-trained encoders may use many clues in the image to discriminate an instance that are not necessarily disease-related. Moreover, pathological patterns are often subtle and heterogeneous, requiring the desired method to represent anatomy-specific features that are sensitive to abnormal changes in different body parts. In this work, we present a novel SSL framework, named DrasCLR, for 3D medical imaging that overcomes these challenges. We propose two domain-specific contrastive learning strategies: one aims to capture subtle disease patterns inside a local anatomical region, and the other aims to represent severe disease patterns that span larger regions. We formulate the encoder using a conditional hyper-parameterized network, in which the parameters depend on the anatomical location, to extract anatomically sensitive features. Extensive experiments on large-scale computed tomography (CT) datasets of lung images show that our method improves the performance of many downstream prediction and segmentation tasks. The patient-level representation improves the performance of the patient survival prediction task. We show how our method can detect emphysema subtypes via dense prediction. We demonstrate that fine-tuning the pre-trained model can significantly reduce annotation effort without sacrificing emphysema detection accuracy. Our ablation study highlights the importance of incorporating anatomical context into the SSL framework.
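
The sketch below illustrates the conditionally parameterized encoder idea with a tiny hyper-network that maps an anatomical-location code to the weights of a projection head, combined with a standard InfoNCE loss; shapes and names are assumptions unrelated to the DrasCLR release.

```python
# Toy location-conditioned head + InfoNCE loss; shapes and names are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocationConditionedHead(nn.Module):
    def __init__(self, feat_dim=64, loc_dim=3, out_dim=32):
        super().__init__()
        self.feat_dim, self.out_dim = feat_dim, out_dim
        # hyper-network: anatomical location -> projection weights and bias
        self.hyper = nn.Linear(loc_dim, feat_dim * out_dim + out_dim)

    def forward(self, feats, loc):
        params = self.hyper(loc)                              # (batch, feat*out + out)
        w = params[:, : self.feat_dim * self.out_dim].view(-1, self.out_dim, self.feat_dim)
        b = params[:, self.feat_dim * self.out_dim :]
        return torch.bmm(w, feats.unsqueeze(-1)).squeeze(-1) + b

def info_nce(q, k, temperature=0.1):
    """Standard InfoNCE over a batch: positives are matching rows of q and k."""
    logits = F.normalize(q, dim=1) @ F.normalize(k, dim=1).t() / temperature
    return F.cross_entropy(logits, torch.arange(q.shape[0]))

head = LocationConditionedHead()
feats_a, feats_b = torch.randn(8, 64), torch.randn(8, 64)     # two views of the same patches
loc = torch.rand(8, 3)                                        # normalized anatomical coordinates
loss = info_nce(head(feats_a, loc), head(feats_b, loc))
```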

* Added some recent references 

Route, Interpret, Repeat: Blurring the Line Between Post hoc Explainability and Interpretable Models

Feb 20, 2023
Shantanu Ghosh, Ke Yu, Forough Arabshahi, Kayhan Batmanghelich

The current approach to ML model design is either to choose a flexible Blackbox model and explain it post hoc or to start with an interpretable model. Blackbox models are flexible but difficult to explain, whereas interpretable models are designed to be explainable. However, developing interpretable models necessitates extensive ML knowledge, and the resulting models tend to be less flexible, offering potentially subpar performance compared to their Blackbox equivalents. This paper aims to blur the distinction between a post hoc explanation of a Blackbox and constructing interpretable models. We propose beginning with a flexible Blackbox model and gradually carving out a mixture of interpretable models and a residual network. Our design identifies a subset of samples and routes them through the interpretable models. The remaining samples are routed through a flexible residual network. We adopt First Order Logic (FOL) as the interpretable models' backbone, which provides basic reasoning on concepts retrieved from the Blackbox model. On the residual network, we repeat the method until the proportion of data explained by the residual network falls below a desired threshold. Our approach offers several advantages. First, the mixture of interpretable models and flexible residual networks results in almost no compromise in performance. Second, the route, interpret, and repeat approach yields a highly flexible interpretable model. Our extensive experiments demonstrate the performance of the model on various datasets. We show that by editing the FOL model, we can fix the shortcuts learned by the original Blackbox model. Finally, our method provides a framework for a hybrid symbolic-connectionist network that is simple to train and adaptable to many applications.
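
As a simplified illustration (not the paper's FOL machinery), the snippet below shows how a linear expert over binary concepts can be read out as a rule: keep the highest-magnitude concept weights for a class and report their signs as a conjunction. Concept names and weights are made up for the example.

```python
# Simplified illustration of reading a linear concept expert as a rule;
# not the paper's FOL extraction procedure.
import torch

def rule_for_class(weight_row: torch.Tensor, concept_names, k: int = 3) -> str:
    top = torch.topk(weight_row.abs(), k).indices
    terms = [concept_names[i] if weight_row[i] > 0 else f"NOT {concept_names[i]}"
             for i in top.tolist()]
    return " AND ".join(terms)

weights = torch.tensor([[1.2, -0.8, 0.1, 0.9],      # class 0
                        [-0.4, 1.5, 0.7, -1.1]])    # class 1
concepts = ["wing_color", "beak_shape", "eye_color", "leg_length"]
for c in range(weights.shape[0]):
    print(f"class {c}: {rule_for_class(weights[c], concepts)}")
```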

Shortcut Learning Through the Lens of Early Training Dynamics

Feb 18, 2023
Nihal Murali, Aahlad Manas Puli, Ke Yu, Rajesh Ranganath, Kayhan Batmanghelich

Deep Neural Networks (DNNs) are prone to learning shortcut patterns that damage their generalization during deployment. Shortcut learning is concerning, particularly when DNNs are applied to safety-critical domains. This paper aims to better understand shortcut learning through the lens of the learning dynamics of the internal neurons during the training process. More specifically, we make the following observations: (1) While previous works treat shortcuts as synonymous with spurious correlations, we emphasize that not all spurious correlations are shortcuts. We show that shortcuts are only those spurious features that are "easier" than the core features. (2) We build upon this premise and use instance-difficulty methods (like Prediction Depth) to quantify "easy" and to identify this behavior during the training phase. (3) We empirically show that shortcut learning can be detected by observing the learning dynamics of the DNN's early layers, irrespective of the network architecture used. In other words, easy features learned by the initial layers of a DNN early during training are potential shortcuts. We verify our claims on simulated and real medical imaging data and justify the empirical success of our hypothesis by showing the theoretical connections between Prediction Depth and information-theoretic concepts like V-usable information. Lastly, our experiments show the insufficiency of monitoring only accuracy plots during training (as is common in machine learning pipelines), and we highlight the need to monitor early training dynamics using example-difficulty metrics.
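
The sketch below is a simplified stand-in for Prediction Depth: a nearest-centroid probe is fit on the features of every layer, and each example's depth is the earliest layer from which the probes already agree with the final-layer prediction. The original work uses k-NN probes; the probe and the toy model here are assumptions for illustration.

```python
# Simplified stand-in for Prediction Depth using nearest-centroid probes
# (the original uses k-NN probes); purely illustrative.
import torch
import torch.nn as nn

def layer_features(model: nn.Sequential, x: torch.Tensor):
    feats, h = [], x
    for layer in model:
        h = layer(h)
        feats.append(h.flatten(1))                      # one (N, d_l) feature matrix per layer
    return feats

def centroid_probe(feats: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Predict the class whose feature centroid is nearest."""
    centroids = torch.stack([feats[labels == c].mean(0) for c in labels.unique()])
    return torch.cdist(feats, centroids).argmin(dim=1)

def prediction_depth(model: nn.Sequential, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    feats = layer_features(model, x)
    final_pred = centroid_probe(feats[-1], y)           # proxy for the network's final prediction
    depth = torch.full((len(y),), len(feats) - 1)
    agrees_onward = torch.ones(len(y), dtype=torch.bool)
    for ell in reversed(range(len(feats))):             # earliest layer from which probes keep agreeing
        agrees_onward &= centroid_probe(feats[ell], y) == final_pred
        depth = torch.where(agrees_onward, torch.tensor(ell), depth)
    return depth                                        # small depth = "easy" example

model = nn.Sequential(nn.Flatten(), nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 10))
x, y = torch.randn(128, 1, 28, 28), torch.randint(0, 10, (128,))
print(prediction_depth(model, x, y)[:10])
```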

* Main paper: 10 pages and 8 figures. Supplementary: 6 pages and 6 figures. Preprint. Under review 

Context-aware Self-supervised Learning for Medical Images Using Graph Neural Network

Jul 06, 2022
Li Sun, Ke Yu, Kayhan Batmanghelich

Although self-supervised learning enables us to bootstrap training by exploiting unlabeled data, generic self-supervised methods for natural images do not sufficiently incorporate context. For medical images, a desirable method should be sensitive enough to detect deviation from normal-appearing tissue in each anatomical region; here, anatomy is the context. We introduce a novel approach with two levels of self-supervised representation learning objectives: one at the regional anatomical level and another at the patient level. We use graph neural networks to incorporate the relationship between different anatomical regions. The structure of the graph is informed by anatomical correspondences between each patient and an anatomical atlas. In addition, the graph representation has the advantage of handling images of arbitrary size at full resolution. Experiments on large-scale Computed Tomography (CT) datasets of lung images show that our approach compares favorably to baseline methods that do not account for context. We use the learned embedding for staging lung tissue abnormalities related to COVID-19.
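
A bare-bones sketch of the graph component follows: each node holds an anatomical region's embedding, edges come from atlas-defined adjacency, and one round of message passing yields context-aware region features that can be pooled into a patient-level vector. The adjacency matrix below is random, standing in for atlas correspondences.

```python
# Bare-bones region-graph message passing; the adjacency here is a random
# stand-in for atlas-defined anatomical neighborhoods.
import torch
import torch.nn as nn

class RegionGraphLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.update = nn.Linear(2 * dim, dim)

    def forward(self, node_feats, adj):
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        neighbor_mean = (adj @ node_feats) / deg            # aggregate adjacent regions
        return torch.relu(self.update(torch.cat([node_feats, neighbor_mean], dim=1)))

n_regions, dim = 6, 32
region_feats = torch.randn(n_regions, dim)                  # per-region encoder outputs
adj = (torch.rand(n_regions, n_regions) > 0.5).float()
adj = ((adj + adj.t()) > 0).float()                         # symmetric anatomical neighborhood
layer = RegionGraphLayer(dim)
patient_embedding = layer(region_feats, adj).mean(dim=0)    # patient-level representation
```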

* Accepted by NeurIPS workshop 2020. arXiv admin note: substantial text overlap with arXiv:2012.06457 

Adversarial Consistency for Single Domain Generalization in Medical Image Segmentation

Jun 29, 2022
Yanwu Xu, Shaoan Xie, Maxwell Reynolds, Matthew Ragoza, Mingming Gong, Kayhan Batmanghelich

An organ segmentation method that can generalize to unseen contrasts and scanner settings can significantly reduce the need for retraining deep learning models. Domain Generalization (DG) aims to achieve this goal. However, most DG methods for segmentation require training data from multiple domains. We propose a novel adversarial domain generalization method for organ segmentation trained on data from a single domain. We synthesize new domains by learning an adversarial domain synthesizer (ADS) and presume that the synthetic domains cover a large enough area of plausible distributions that unseen domains can be interpolated from them. We propose a mutual information regularizer to enforce semantic consistency between images from the synthetic domains, which can be estimated by patch-level contrastive learning. We evaluate our method on various organ segmentation tasks with unseen modalities, scanning protocols, and scanner sites.
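
A compact, illustrative take on the adversarial-synthesis step is given below (this is not the paper's ADS architecture): perturb the input within an L-infinity budget to maximize the segmentation loss, then encourage prediction consistency between the original and synthesized views.

```python
# Illustrative only: single-step adversarial view synthesis plus a consistency
# surrogate; not the paper's ADS or mutual-information estimator.
import torch
import torch.nn.functional as F

def adversarial_view(segmenter, x, y, epsilon=0.05):
    """One gradient-ascent step on the input, bounded by an L-infinity budget."""
    delta = torch.zeros_like(x, requires_grad=True)
    loss = F.cross_entropy(segmenter(x + delta), y)         # y: per-pixel labels (B, H, W)
    loss.backward()
    with torch.no_grad():
        delta = epsilon * delta.grad.sign()
    return (x + delta).detach()

def consistency_loss(segmenter, x, x_adv):
    """Semantic-consistency surrogate: predictions should match across views."""
    p, p_adv = segmenter(x).log_softmax(1), segmenter(x_adv).softmax(1)
    return F.kl_div(p, p_adv, reduction="batchmean")
```

In training, the supervised segmentation loss on the original view and the consistency term between the two views would be combined.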

* MICCAI 2022, accepted 