Jorge Cardoso

Privacy Distillation: Reducing Re-identification Risk of Multimodal Diffusion Models

Jun 02, 2023
Virginia Fernandez, Pedro Sanchez, Walter Hugo Lopez Pinaya, Grzegorz Jacenków, Sotirios A. Tsaftaris, Jorge Cardoso

Knowledge distillation in neural networks refers to compressing a large model or dataset into a smaller version of itself. We introduce Privacy Distillation, a framework that allows a text-to-image generative model to teach another model without exposing it to identifiable data. Here, we are interested in the privacy issue faced by a data provider who wishes to share their data via a multimodal generative model. A question that immediately arises is "How can a data provider ensure that the generative model is not leaking identifiable information about a patient?". Our solution consists of (1) training a first diffusion model on real data, (2) generating a synthetic dataset using this model and filtering it to exclude images with a re-identifiability risk, and (3) training a second diffusion model on the filtered synthetic data only. We showcase that datasets sampled from models trained with privacy distillation can effectively reduce re-identification risk whilst maintaining downstream performance.
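
As a rough illustration of the three-step recipe, the sketch below mocks up step (2), the re-identification filter. `teacher`, `reid_score`, and `real_bank` are hypothetical stand-ins, not the paper's code, and the threshold is arbitrary.

```python
# Hypothetical sketch of the Privacy Distillation filtering step (step 2).
# `teacher` is a diffusion model trained on real data; `reid_score` is an
# assumed re-identification model scoring how identifiable a synthetic
# image is against a bank of real patient images.

def build_filtered_synthetic_dataset(teacher, prompts, reid_score,
                                     real_bank, threshold=0.5):
    """Keep only synthetic samples with a low re-identification risk."""
    kept = []
    for prompt in prompts:
        image = teacher.sample(prompt)                      # synthesize
        risk = max(reid_score(image, real) for real in real_bank)
        if risk < threshold:                                # filter
            kept.append((image, prompt))
    return kept

# Step (3) then trains a second diffusion model on `kept` only, so the
# student never sees the real, identifiable data.
```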


Morphology-preserving Autoregressive 3D Generative Modelling of the Brain

Sep 07, 2022
Petru-Daniel Tudosiu, Walter Hugo Lopez Pinaya, Mark S. Graham, Pedro Borges, Virginia Fernandez, Dai Yang, Jeremy Appleyard, Guido Novati, Disha Mehra, Mike Vella, Parashkev Nachev, Sebastien Ourselin, Jorge Cardoso

Human anatomy, morphology, and associated diseases can be studied using medical imaging data. However, access to medical imaging data is restricted by governance and privacy concerns, data ownership, and the cost of acquisition, thus limiting our ability to understand the human body. A possible solution to this issue is the creation of a model able to learn and then generate synthetic images of the human body conditioned on specific characteristics of relevance (e.g., age, sex, and disease status). Deep generative models, in the form of neural networks, have been recently used to create synthetic 2D images of natural scenes. Still, the ability to produce high-resolution 3D volumetric imaging data with correct anatomical morphology has been hampered by data scarcity and algorithmic and computational limitations. This work proposes a generative model that can be scaled to produce anatomically correct, high-resolution, and realistic images of the human brain, with the necessary quality to allow further downstream analyses. The ability to generate a potentially unlimited amount of data not only enables large-scale studies of human anatomy and pathology without jeopardizing patient privacy, but also significantly advances research in the field of anomaly detection, modality synthesis, learning under limited data, and fair and ethical AI. Code and trained models are available at: https://github.com/AmigoLab/SynthAnatomy.
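
The abstract does not spell out the architecture, but the title points to autoregressive generation over a quantized 3D latent space. The sketch below shows that general recipe (attribute-conditioned autoregressive sampling of latent codes, then decoding to a volume); all names are assumptions, not the released SynthAnatomy API.

```python
import torch

# Minimal sketch: sample a grid of discrete latent codes autoregressively,
# conditioned on attributes such as age, sex, and disease status, then
# decode them into a 3D volume. `prior` and `vqvae_decoder` are placeholders.

@torch.no_grad()
def sample_brain(prior, vqvae_decoder, cond, n_tokens):
    tokens = torch.zeros(1, 1, dtype=torch.long)        # BOS code
    for _ in range(n_tokens):                           # one code at a time
        logits = prior(tokens, cond)[:, -1]             # next-code logits
        nxt = torch.multinomial(torch.softmax(logits, dim=-1), 1)
        tokens = torch.cat([tokens, nxt], dim=1)
    return vqvae_decoder(tokens[:, 1:])                 # drop BOS, decode
```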

* 13 pages, 3 figures, 2 tables, accepted at SASHIMI MICCAI 2022 

Leveraging Log Instructions in Log-based Anomaly Detection

Jul 07, 2022
Jasmin Bogatinovski, Gjorgji Madjarov, Sasho Nedelkoski, Jorge Cardoso, Odej Kao

Artificial Intelligence for IT Operations (AIOps) describes the process of maintaining and operating large IT systems using diverse AI-enabled methods and tools for, e.g., anomaly detection and root cause analysis, to support the remediation, optimization, and automatic initiation of self-stabilizing IT activities. The core step of any AIOps workflow is anomaly detection, typically performed on high-volume heterogeneous data such as log messages (logs), metrics (e.g., CPU utilization), and distributed traces. In this paper, we propose a method for reliable and practical anomaly detection from system logs. It overcomes the common disadvantage of related works, i.e., the need for a large amount of manually labeled training data, by building an anomaly detection model with log instructions from the source code of 1000+ GitHub projects. The instructions from diverse systems contain rich and heterogeneous information about many different normal and abnormal IT events and serve as a foundation for anomaly detection. The proposed method, named ADLILog, combines the log instructions and the data from the system of interest (target system) to learn a deep neural network model through a two-phase learning procedure. The experimental results show that ADLILog outperforms the related approaches by up to 60% on the F1 score while satisfying core non-functional requirements for industrial deployments such as unsupervised design, efficient model updates, and small model sizes.
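
A condensed sketch of the two-phase procedure described above follows; the model, loaders, and weak-labeling scheme are illustrative assumptions, not the authors' released code.

```python
import torch
from torch import nn

def run_phase(model: nn.Module, loader, epochs, lr):
    """One training phase: `loader` yields (token_tensor, binary_label)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for inputs, labels in loader:
            opt.zero_grad()
            loss = loss_fn(model(inputs).squeeze(-1), labels.float())
            loss.backward()
            opt.step()

# Phase 1: pretrain on the static texts of log instructions mined from
# public repositories, with the log level as a weak normal/abnormal label
# (e.g., "info" -> 0, "error"/"fatal" -> 1):
#   run_phase(model, github_instruction_loader, epochs=10, lr=1e-4)
# Phase 2: adapt to data from the target system:
#   run_phase(model, target_system_loader, epochs=3, lr=1e-5)
```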

* This paper has been accepted for publication in IEEE Service Computing Conference, 2022, Barcelona 

Failure Identification from Unstable Log Data using Deep Learning

Apr 06, 2022
Jasmin Bogatinovski, Sasho Nedelkoski, Li Wu, Jorge Cardoso, Odej Kao

The reliability of cloud platforms is of significant relevance because society increasingly relies on complex software systems running on the cloud. To improve it, cloud providers are automating various maintenance tasks, with failure identification frequently being considered. The precondition for automation is the availability of observability tools, with system logs commonly being used. The focus of this paper is log-based failure identification. This problem is challenging because of the instability of the log data and the incompleteness of the explicit logging failure coverage within the code. To address the two challenges, we present CLog as a method for failure identification. The key idea presented herein is based on our observation that by representing the log data as sequences of subprocesses instead of sequences of log events, the effect of the unstable log data is reduced. CLog introduces a novel subprocess extraction method that uses a context-aware neural network and clustering methods to extract meaningful subprocesses. The direct modeling of log event contexts allows the identification of failures with respect to abrupt context changes, addressing the challenge of insufficient logging failure coverage. Our experimental results demonstrate that the learned subprocess representations reduce the instability in the input, allowing CLog to outperform the baselines on the failure identification subproblems: 1) failure detection by 9-24% on the F1 score and 2) failure type identification by 7% on the macro-averaged F1 score. Further analysis shows, in a model-agnostic manner, a negative correlation between the instability in the input event sequences and the detection performance.
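
The core trick, re-expressing raw event sequences as coarser subprocess sequences, can be sketched as below; the encoder producing the event embeddings and the use of k-means here are simplified stand-ins for the paper's context-aware components.

```python
import numpy as np
from sklearn.cluster import KMeans

def to_subprocess_sequences(event_embeddings, n_subprocesses=32):
    """event_embeddings: list of (seq_len_i, dim) arrays, one per session,
    produced by some context-aware encoder (assumed, not shown)."""
    flat = np.concatenate(event_embeddings, axis=0)
    km = KMeans(n_clusters=n_subprocesses, n_init=10).fit(flat)
    # Sequences of cluster (subprocess) IDs are more stable than raw event
    # sequences when individual log statements change between versions.
    return [km.predict(e) for e in event_embeddings]
```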

* This paper is accepted for publication at IEEE CCGrid 2022. For proper citation, please use the original proceedings credentials 

Data-Driven Approach for Log Instruction Quality Assessment

Apr 06, 2022
Jasmin Bogatinovski, Sasho Nedelkoski, Alexander Acker, Jorge Cardoso, Odej Kao

In the current IT world, developers write code while system operators run the code mostly as a black box. The connection between both worlds is typically established with log messages: the developer provides hints to the (unknown) operator about where the cause of an issue lies, and vice versa, the operator can report bugs during operation. To fulfil this purpose, developers write log instructions: structured text commonly composed of a log level (e.g., "info", "error"), static text ("IP {} cannot be reached"), and dynamic variables (e.g., IP {}). However, as opposed to well-adopted coding practices, there are no widely adopted guidelines on how to write log instructions with good quality properties. For example, a developer may assign a high log level (e.g., "error") to a trivial event, which can confuse the operator and increase maintenance costs. Or the static text may be insufficient to hint at a specific issue. In this paper, we address the problem of log quality assessment and provide a first step towards its automation. We start with an in-depth analysis of log instruction quality properties in nine software systems and identify two quality properties: 1) correct log level assignment, assessing the correctness of the log level, and 2) sufficient linguistic structure, assessing the minimal richness of the static text necessary for a verbose event description. Based on these findings, we developed a data-driven approach that adapts deep learning methods for each of the two properties. An extensive evaluation on large-scale open-source systems shows that our approach correctly assesses log level assignments with an accuracy of 0.88 and sufficient linguistic structure with an F1 score of 0.99, outperforming the baselines. Our study shows the potential of data-driven methods for assessing log instruction quality and aiding developers in comprehending and writing better code.
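
As a toy illustration of the first property, log level assignment, one can train a text classifier to predict a level from the static text and flag instructions where the written level disagrees. The linear model below is a hedged stand-in for the paper's deep learning approach, with made-up training data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training set: static texts of log instructions and their levels.
texts = ["IP {} cannot be reached", "starting worker {}", "disk {} full"]
levels = ["error", "info", "error"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, levels)

def flag_suspicious(instruction_text, written_level):
    """True if the predicted level disagrees with the written one."""
    return clf.predict([instruction_text])[0] != written_level
```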

* This paper is accepted for publication at the 30th International Conference on Program Comprehension under doi: 10.1145/3524610.3527906. The copyrights are handled following the corresponding agreement between the author and publisher 

CrossMoDA 2021 challenge: Benchmark of Cross-Modality Domain Adaptation techniques for Vestibular Schwannoma and Cochlea Segmentation

Jan 08, 2022
Reuben Dorent, Aaron Kujawa, Marina Ivory, Spyridon Bakas, Nicola Rieke, Samuel Joutard, Ben Glocker, Jorge Cardoso, Marc Modat, Kayhan Batmanghelich, Arseniy Belkov, Maria Baldeon Calisto, Jae Won Choi, Benoit M. Dawant, Hexin Dong, Sergio Escalera, Yubo Fan, Lasse Hansen, Mattias P. Heinrich, Smriti Joshi, Victoriya Kashtanova, Hyeon Gyu Kim, Satoshi Kondo, Christian N. Kruse, Susana K. Lai-Yuen, Hao Li, Han Liu, Buntheng Ly, Ipek Oguz, Hyungseob Shin, Boris Shirokikh, Zixian Su, Guotai Wang, Jianghao Wu, Yanwu Xu, Kai Yao, Li Zhang, Sebastien Ourselin, Jonathan Shapey, Tom Vercauteren

Domain Adaptation (DA) has recently raised strong interest in the medical imaging community. While a large variety of DA techniques has been proposed for image segmentation, most of these techniques have been validated either on private datasets or on small publicly available datasets. Moreover, these datasets mostly addressed single-class problems. To tackle these limitations, the Cross-Modality Domain Adaptation (crossMoDA) challenge was organised in conjunction with the 24th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2021). CrossMoDA is the first large and multi-class benchmark for unsupervised cross-modality DA. The challenge's goal is to segment two key brain structures involved in the follow-up and treatment planning of vestibular schwannoma (VS): the VS and the cochleas. Currently, diagnosis and surveillance in patients with VS are performed using contrast-enhanced T1 (ceT1) MRI. However, there is growing interest in using non-contrast sequences such as high-resolution T2 (hrT2) MRI. Therefore, we created an unsupervised cross-modality segmentation benchmark. The training set provides annotated ceT1 scans (N=105) and unpaired non-annotated hrT2 scans (N=105). The aim was to automatically perform unilateral VS and bilateral cochlea segmentation on the hrT2 scans provided in the testing set (N=137). A total of 16 teams submitted their algorithms for the evaluation phase. The level of performance reached by the top-performing teams is strikingly high (best median Dice score: VS 88.4%, cochleas 85.7%) and close to full supervision (median Dice score: VS 92.5%, cochleas 87.7%). All top-performing methods made use of an image-to-image translation approach to transform the source-domain images into pseudo-target-domain images. A segmentation network was then trained using these generated images and the manual annotations provided for the source images.
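
The winning recipe summarized at the end of the abstract reduces to a two-stage pipeline, sketched below with placeholder models (`translator` and `seg_net` are generic image-to-image and segmentation components, not any team's actual code).

```python
def train_unsupervised_da(translator, seg_net, cet1_volumes, cet1_labels):
    """crossMoDA-style recipe: translate annotated ceT1 volumes into
    pseudo-hrT2, then train segmentation on them with the source labels."""
    pseudo_hrt2 = [translator(v) for v in cet1_volumes]   # domain transfer
    seg_net.fit(pseudo_hrt2, cet1_labels)                 # supervised training
    return seg_net                  # then applied to real, unlabeled hrT2
```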

* Submitted to Medical Image Analysis 

Equitable modelling of brain imaging by counterfactual augmentation with morphologically constrained 3D deep generative models

Nov 29, 2021
Guilherme Pombo, Robert Gray, Jorge Cardoso, Sebastien Ourselin, Geraint Rees, John Ashburner, Parashkev Nachev

We describe CounterSynth, a conditional generative model of diffeomorphic deformations that induce label-driven, biologically plausible changes in volumetric brain images. The model is intended to synthesise counterfactual training data augmentations for downstream discriminative modelling tasks whose fidelity is limited by data imbalance, distributional instability, confounding, or underspecification, and which exhibit inequitable performance across distinct subpopulations. Focusing on demographic attributes, we evaluate the quality of synthesized counterfactuals with voxel-based morphometry, classification and regression of the conditioning attributes, and the Fréchet inception distance. Examining downstream discriminative performance in the context of engineered demographic imbalance and confounding, we use UK Biobank magnetic resonance imaging data to benchmark CounterSynth augmentation against current solutions to these problems. We achieve state-of-the-art improvements, both in overall fidelity and equity. The source code for CounterSynth is available online.
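
Mechanically, counterfactual augmentation with a deformation model amounts to predicting a conditional displacement field and resampling the volume through it; the sketch below shows that generic operation in PyTorch, with `deform_net` as a hypothetical stand-in rather than CounterSynth's actual interface.

```python
import torch
import torch.nn.functional as F

def counterfactual(volume, attributes, deform_net):
    """volume: (1, 1, D, H, W); attributes: conditioning vector."""
    flow = deform_net(volume, attributes)      # (1, D, H, W, 3) displacement
    identity = F.affine_grid(torch.eye(3, 4).unsqueeze(0), volume.shape,
                             align_corners=False)   # identity sampling grid
    return F.grid_sample(volume, identity + flow, align_corners=False)
```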


IAD: Indirect Anomalous VMMs Detection in the Cloud-based Environment

Nov 22, 2021
Anshul Jindal, Ilya Shakhat, Jorge Cardoso, Michael Gerndt, Vladimir Podolskiy

Server virtualization in the form of virtual machines (VMs) with the use of a hypervisor or a Virtual Machine Monitor (VMM) is an essential part of cloud computing technology for providing infrastructure-as-a-service (IaaS). A fault or an anomaly in the VMM can propagate to the VMs hosted on it and ultimately affect the availability and reliability of the applications running on those VMs. Therefore, identifying and eventually resolving it quickly is highly important. However, anomalous VMM detection is a challenge in the cloud environment, since the user does not have access to the VMM. This paper addresses the challenge of anomalous VMM detection in the cloud-based environment without any knowledge of or data from the VMM, by introducing a novel machine learning-based algorithm called IAD: Indirect Anomalous VMMs Detection. The algorithm relies solely on the resource utilization data of the VMs hosted on the VMMs to detect anomalous VMMs. The algorithm's accuracy was tested on four datasets, comprising both synthetic and real data, and compared against four other popular algorithms that can also be applied to the described problem. It was found that the proposed IAD algorithm achieves an F1-score of 83.7% averaged across the four datasets and outperforms the other algorithms by an average of 11% in F1-score.
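
The indirect signal IAD exploits can be illustrated with a toy rule: if most VMs co-hosted on a VMM degrade at the same time, the VMM itself is suspect. The thresholds and the single CPU feature below are illustrative simplifications, not the published algorithm.

```python
import numpy as np

def vmm_looks_anomalous(vm_cpu_series, window=12, drop=0.5, quorum=0.8):
    """vm_cpu_series: (n_vms, n_timesteps) utilization of co-hosted VMs."""
    recent = vm_cpu_series[:, -window:].mean(axis=1)
    past = vm_cpu_series[:, :-window].mean(axis=1)
    degraded = (recent < drop * past).mean()   # fraction of degraded VMs
    return degraded > quorum                   # most VMs degrade together
```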

* Accepted at AIOps 2021 workshop (ICSOC 2021) 

Online Memory Leak Detection in the Cloud-based Infrastructures

Jan 24, 2021
Anshul Jindal, Paul Staab, Jorge Cardoso, Michael Gerndt, Vladimir Podolskiy

A memory leak in an application deployed on the cloud can affect the availability and reliability of the application. Identifying and ultimately resolving it quickly is therefore highly important. However, in a production environment running on the cloud, detecting memory leaks without knowledge of the application or its internal object allocation details is a challenge. This paper addresses the challenge of online detection of memory leaks in cloud-based infrastructure, without any internal application knowledge, by introducing a novel machine learning-based algorithm, Precog. The algorithm uses a single metric, namely the memory utilization of the system on which the application is deployed, to detect a memory leak. The algorithm's accuracy was tested on manually labeled memory utilization data from 60 virtual machines, provided by our industry partner Huawei Munich Research Center; the proposed algorithm achieves an accuracy score of 85% with a prediction time of less than half a second per virtual machine.
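
Since Precog works from a single metric, the flavor of the approach can be conveyed by a simple trend test on the memory utilization series; the actual algorithm is more involved, and the threshold here is arbitrary.

```python
import numpy as np

def looks_like_leak(mem_utilization, slope_threshold=1e-3):
    """mem_utilization: 1-D array of a machine's memory usage over time."""
    t = np.arange(len(mem_utilization))
    slope = np.polyfit(t, mem_utilization, 1)[0]   # linear growth rate
    return slope > slope_threshold                 # sustained upward drift
```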

* International Workshop on Artificial Intelligence for IT Operations (AIOPS) 2020  
* 12 pages 

Artificial Intelligence for IT Operations (AIOPS) Workshop White Paper

Jan 15, 2021
Jasmin Bogatinovski, Sasho Nedelkoski, Alexander Acker, Florian Schmidt, Thorsten Wittkopp, Soeren Becker, Jorge Cardoso, Odej Kao

Artificial Intelligence for IT Operations (AIOps) is an emerging interdisciplinary field arising at the intersection of the research areas of machine learning, big data, streaming analytics, and the management of IT operations. AIOps, as a field, is a candidate to produce the future standard for IT operation management. To that end, AIOps faces several challenges. First, it needs to combine separate research branches from other research fields like software reliability engineering. Second, novel modelling techniques are needed to understand the dynamics of different systems. Furthermore, it needs to lay out the basis for assessing time horizons and uncertainty for imminent SLA violations, the early detection of emerging problems, autonomous remediation, decision making, and the support of various optimization objectives. Moreover, a good understanding and interpretability of these aiding models are important for building trust between the employed tools and the domain experts. Finally, all this will result in faster adoption of AIOps, further increase interest in this research field, and contribute to bridging the gap towards fully autonomously operating IT systems. The main aim of the AIOPS workshop is to bring together researchers from both academia and industry to present their experiences, results, and work in progress in this field. The workshop aims to strengthen the community and unite it towards the goal of joining efforts to solve the main challenges the field is currently facing. A consensus on and adoption of the principles of openness and reproducibility will significantly boost research in this emerging area.

* 8 pages, white paper for the AIOPS 2020 workshop at ICSOC 2020 