Daby Sow

Automated Compliance Blueprint Optimization with Artificial Intelligence

Jun 22, 2022
Abdulhamid Adebayo, Daby Sow, Muhammed Fatih Bulut

For highly regulated industries such as banking and healthcare, one of the major hindrances to the adoption of cloud computing is compliance with regulatory standards. This is a complex problem because of the many regulatory and technical specification (techspec) documents that companies must comply with. The critical problem is to establish the mapping between techspecs and regulation controls so that, from day one, companies can comply with regulations with minimal effort. We demonstrate the practicality of an approach to automatically analyze regulatory standards using Artificial Intelligence (AI) techniques. We present early results on identifying the mapping between techspecs and regulation controls, and discuss challenges that must be overcome for this solution to be fully practical.

* 5 pages 

Vulnerability Prioritization: An Offensive Security Approach

Jun 22, 2022
Muhammed Fatih Bulut, Abdulhamid Adebayo, Daby Sow, Steve Ocepek

Organizations struggle to handle the sheer number of vulnerabilities in their cloud environments. The de facto methodology for prioritizing vulnerabilities is to use the Common Vulnerability Scoring System (CVSS). However, CVSS has inherent limitations that make it less than ideal for prioritization. In this work, we propose a new way of prioritizing vulnerabilities. Our approach is inspired by how offensive security practitioners perform penetration testing. We evaluate our approach with a real-world case study for a large client, and assess the accuracy of machine learning in automating the process end to end.

* 5 pages 

Attack Techniques and Threat Identification for Vulnerabilities

Jun 22, 2022
Constantin Adam, Muhammed Fatih Bulut, Daby Sow, Steven Ocepek, Chris Bedell, Lilian Ngweta

Modern organizations struggle with an insurmountable number of vulnerabilities that are discovered and reported by their network and application vulnerability scanners. Prioritization and focus therefore become critical, so that they can spend their limited time on the highest-risk vulnerabilities. In doing this, it is important for these organizations not only to understand the technical descriptions of the vulnerabilities, but also to gain insights into attackers' perspectives. In this work, we use machine learning and natural language processing techniques, as well as several publicly available data sets, to provide an explainable mapping of vulnerabilities to attack techniques and threat actors. This work provides new security intelligence by predicting which attack techniques are most likely to be used to exploit a given vulnerability and which threat actors are most likely to conduct the exploitation. Lack of labeled data and different vocabularies make mapping vulnerabilities to attack techniques at scale a challenging problem that cannot be addressed easily using supervised or unsupervised (similarity search) learning techniques. To solve this problem, we first map the vulnerabilities to a standard set of common weaknesses, and then map the common weaknesses to the attack techniques. This approach yields a Mean Reciprocal Rank (MRR) of 0.95, an accuracy comparable with those reported for state-of-the-art systems. Our solution has been deployed to IBM Security X-Force Red Vulnerability Management Services and has been in production since 2021. The solution helps security practitioners assist customers in managing and prioritizing their vulnerabilities, providing them with an explainable mapping of vulnerabilities to attack techniques and threat actors.
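
For readers unfamiliar with the metric quoted above, Mean Reciprocal Rank averages, over all queries, the reciprocal of the rank at which the first correct answer appears. The sketch below is a generic illustration, not the authors' evaluation code; the function name, the `technique_*` labels, and the toy rankings are all hypothetical.

```python
def mean_reciprocal_rank(rankings, relevant):
    """Average of 1/rank of the first correct item in each ranking (0 if none is found)."""
    if not rankings:
        raise ValueError("need at least one ranking")
    total = 0.0
    for ranking, truth in zip(rankings, relevant):
        for position, item in enumerate(ranking, start=1):
            if item in truth:
                total += 1.0 / position
                break
    return total / len(rankings)


# Hypothetical predictions for three vulnerabilities (labels are placeholders).
rankings = [
    ["technique_a", "technique_b", "technique_c"],  # correct answer at rank 1
    ["technique_b", "technique_a", "technique_c"],  # correct answer at rank 2
    ["technique_c", "technique_b", "technique_a"],  # correct answer at rank 3
]
relevant = [{"technique_a"}, {"technique_a"}, {"technique_a"}]

mrr = mean_reciprocal_rank(rankings, relevant)  # (1 + 1/2 + 1/3) / 3
```

An MRR close to 1 means the correct item is usually ranked first; in this toy example the value is 11/18, roughly 0.61.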

* 9 pages 

Blending Knowledge in Deep Recurrent Networks for Adverse Event Prediction at Hospital Discharge

Apr 09, 2021
Prithwish Chakraborty, James Codella, Piyush Madan, Ying Li, Hu Huang, Yoonyoung Park, Chao Yan, Ziqi Zhang, Cheng Gao, Steve Nyemba, Xu Min, Sanjib Basak, Mohamed Ghalwash, Zach Shahn, Parthasarathy Suryanarayanan, Italo Buleje, Shannon Harrer, Sarah Miller, Amol Rajmane, Colin Walsh, Jonathan Wanderer, Gigi Yuen Reed, Kenney Ng, Daby Sow, Bradley A. Malin

Deep learning architectures have an extremely high capacity for modeling complex data in a wide variety of domains. However, these architectures have been limited in their ability to support complex prediction problems using insurance claims data, such as readmission at 30 days, mainly due to data sparsity issues. Consequently, classical machine learning methods, especially those that embed domain knowledge in handcrafted features, are often on par with, and sometimes outperform, deep learning approaches. In this paper, we illustrate how the potential of deep learning can be achieved by blending domain knowledge within deep learning architectures to predict adverse events at hospital discharge, including readmissions. More specifically, we introduce a learning architecture that fuses a representation of patient data, computed by a self-attention based recurrent neural network, with clinically relevant features. We conduct extensive experiments on a large claims dataset and show that the blended method outperforms the standard machine learning approaches.
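
One simple way to picture such a blend is to concatenate a learned sequence representation with handcrafted features before a final output layer. The abstract does not say how the fusion is done, so the sketch below is only an assumption: the function name, the vector sizes, the random untrained weights, and the concatenation step itself are all hypothetical.

```python
import numpy as np


def fuse_and_score(sequence_embedding, clinical_features, weights, bias):
    """Concatenate the two views of a patient and apply a logistic output layer."""
    fused = np.concatenate([sequence_embedding, clinical_features])
    logit = float(fused @ weights + bias)
    return 1.0 / (1.0 + np.exp(-logit))


rng = np.random.default_rng(0)
sequence_embedding = rng.normal(size=8)        # stand-in for the recurrent network's output
clinical_features = np.array([1.0, 0.0, 3.0])  # stand-in for handcrafted features
weights = rng.normal(size=11) * 0.1            # untrained, random weights for illustration
risk = fuse_and_score(sequence_embedding, clinical_features, weights, bias=0.0)
```

In a trained model the weights would be learned jointly with the recurrent network; here they are random, so `risk` is only a placeholder probability.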

* Presented at the AMIA 2021 Virtual Informatics Summit 

Question-Driven Design Process for Explainable AI User Experiences

Apr 08, 2021
Q. Vera Liao, Milena Pribić, Jaesik Han, Sarah Miller, Daby Sow

A pervasive design issue of AI systems is their explainability--how to provide appropriate information to help users understand the AI. The technical field of explainable AI (XAI) has produced a rich toolbox of techniques. Designers are now tasked with the challenges of selecting the most suitable XAI techniques and translating them into UX solutions. Informed by our previous work studying design challenges around XAI UX, this work proposes a design process to tackle these challenges. We review our own and related prior work to identify requirements that the process should fulfill and, accordingly, propose a Question-Driven Design Process that grounds the user needs, choices of XAI techniques, design, and evaluation of XAI UX all in the user questions. We provide a mapping guide between prototypical user questions and exemplars of XAI techniques, serving as boundary objects to support collaboration between designers and AI engineers. We demonstrate it with a use case of designing XAI for healthcare adverse events prediction, and discuss lessons learned for tackling design challenges of AI systems.

* working paper 

Phenotypical Ontology Driven Framework for Multi-Task Learning

Sep 04, 2020
Mohamed Ghalwash, Zijun Yao, Prithwish Chakraborty, James Codella, Daby Sow

Despite the large number of patients in Electronic Health Records (EHRs), the subset of usable data for modeling outcomes of specific phenotypes is often imbalanced and of modest size. This can be attributed to the uneven coverage of medical concepts in EHRs. In this paper, we propose OMTL, an Ontology-driven Multi-Task Learning framework designed to overcome such data limitations. The key contribution of our work is the effective use of knowledge from a predefined, well-established medical relationship graph (ontology) to construct a novel deep learning network architecture that mirrors this ontology. This enables common representations to be shared across related phenotypes, and was found to improve the learning performance. The proposed OMTL naturally allows for multi-task learning of different phenotypes on distinct predictive tasks. These phenotypes are tied together by their semantic distance according to the external medical ontology. Using the publicly available MIMIC-III database, we evaluate OMTL and demonstrate its efficacy on several real patient outcome predictions over state-of-the-art multi-task learning schemes.

A Canonical Architecture For Predictive Analytics on Longitudinal Patient Records

Jul 24, 2020
Parthasarathy Suryanarayanan, Bhavani Iyer, Prithwish Chakraborty, Bibo Hao, Italo Buleje, Piyush Madan, James Codella, Antonio Foncubierta, Divya Pathak, Sarah Miller, Amol Rajmane, Shannon Harrer, Gigi Yuan-Reed, Daby Sow

Many institutions within the healthcare ecosystem are making significant investments in AI technologies to optimize their business operations at lower cost with improved patient outcomes. Despite the hype around AI, the full realization of this potential is seriously hindered by several systemic problems, including data privacy, security, bias, fairness, and explainability. In this paper, we propose a novel canonical architecture for the development of AI models in healthcare that addresses these challenges. This system enables the creation and management of AI predictive models throughout all the phases of their life cycle, including data ingestion, model building, and model promotion in production environments. This paper describes this architecture in detail, along with a qualitative evaluation of our experience of using it on real-world problems.

ODVICE: An Ontology-Driven Visual Analytic Tool for Interactive Cohort Extraction

May 13, 2020
Mohamed Ghalwash, Zijun Yao, Prithwish Chakraborty, James Codella, Daby Sow

Increased availability of electronic health records (EHRs) has enabled researchers to study various medical questions. Cohort selection for the hypothesis under investigation is one of the main considerations for EHR analysis. For uncommon diseases, cohorts extracted from EHRs contain a very limited number of records, hampering the robustness of any analysis. Data augmentation methods have been successfully applied in other domains to address this issue, mainly using simulated records. In this paper, we present ODVICE, a data augmentation framework that leverages the medical concept ontology to systematically augment records using a novel ontologically guided Monte-Carlo graph spanning algorithm. The tool allows end users to specify a small set of interactive controls to control the augmentation process. We analyze the importance of ODVICE by conducting studies on the MIMIC-III dataset for two learning tasks. Our results demonstrate the predictive performance of ODVICE-augmented cohorts, showing ~30% improvement in area under the curve (AUC) over the non-augmented dataset and other data augmentation strategies.

Is Deep Reinforcement Learning Ready for Practical Applications in Healthcare? A Sensitivity Analysis of Duel-DDQN for Sepsis Treatment

May 08, 2020
MingYu Lu, Zachary Shahn, Daby Sow, Finale Doshi-Velez, Li-wei H. Lehman

The potential of Reinforcement Learning (RL) has been demonstrated through successful applications to games such as Go and Atari. However, while it is straightforward to evaluate the performance of an RL algorithm in a game setting by simply using it to play the game, evaluation is a major challenge in clinical settings where it could be unsafe to follow RL policies in practice. Thus, understanding the sensitivity of RL policies to the host of decisions made during implementation is an important step toward building the type of trust in RL required for eventual clinical uptake. In this work, we perform a sensitivity analysis on a state-of-the-art RL algorithm (Dueling Double Deep Q-Networks) applied to hemodynamic stabilization treatment strategies for septic patients in the ICU. We consider the sensitivity of learned policies to input features, time discretization, reward function, and random seeds. We find that varying these settings can significantly impact learned policies, which suggests a need for caution when interpreting RL agent output.

* 10 pages, 9 figures 

G-Net: A Deep Learning Approach to G-computation for Counterfactual Outcome Prediction Under Dynamic Treatment Regimes

Mar 23, 2020
Rui Li, Zach Shahn, Jun Li, Mingyu Lu, Prithwish Chakraborty, Daby Sow, Mohamed Ghalwash, Li-wei H. Lehman

Counterfactual prediction is a fundamental task in decision-making. G-computation is a method for estimating expected counterfactual outcomes under dynamic time-varying treatment strategies. Existing G-computation implementations have mostly employed classical regression models with limited capacity to capture complex temporal and nonlinear dependence structures. This paper introduces G-Net, a novel sequential deep learning framework for G-computation that can handle complex time series data while imposing minimal modeling assumptions and provide estimates of individual or population-level time-varying treatment effects. We evaluate alternative G-Net implementations using realistically complex temporal simulated data obtained from CVSim, a mechanistic model of the cardiovascular system.
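
As a rough illustration of the G-computation idea mentioned above, the expected outcome under a treatment strategy can be approximated by Monte Carlo simulation: repeatedly draw the next covariate from a conditional model, pick the treatment the strategy dictates, and average the final outcome. The sketch below is not G-Net; the linear `next_covariate` model, its coefficients, the threshold of 45, and all function names are hypothetical stand-ins for models that would normally be fitted to data.

```python
import numpy as np


def g_computation(x0, policy, next_covariate, outcome, horizon, n_sims, rng):
    """Simulate covariates forward under `policy` and average the resulting outcome."""
    x = np.full(n_sims, float(x0))
    for _ in range(horizon):
        a = policy(x)                  # treatment chosen by the (possibly dynamic) strategy
        x = next_covariate(x, a, rng)  # draw the next covariate from the conditional model
    return float(outcome(x).mean())


def next_covariate(x, a, rng):
    # Hypothetical fitted model: the covariate decays over time, treatment raises it.
    return 0.9 * x + 2.0 * a + rng.normal(0.0, 0.1, size=x.shape)


def never_treat(x):
    return np.zeros_like(x)


def treat_when_low(x):
    # Dynamic strategy: treat only when the covariate falls below a threshold.
    return (x < 45.0).astype(float)


def final_value(x):
    return x


rng = np.random.default_rng(0)
est_never = g_computation(50.0, never_treat, next_covariate, final_value, 5, 2000, rng)
est_dynamic = g_computation(50.0, treat_when_low, next_covariate, final_value, 5, 2000, rng)
```

Comparing `est_dynamic` with `est_never` gives a simulated estimate of the effect of the dynamic strategy relative to no treatment under these assumed models.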
