Ricardo Moreira

Lightweight Automated Feature Monitoring for Data Streams

Jul 19, 2022
João Conde, Ricardo Moreira, João Torres, Pedro Cardoso, Hugo R. C. Ferreira, Marco O. P. Sampaio, João Tiago Ascensão, Pedro Bizarro

Monitoring the behavior of automated real-time stream processing systems has become one of the most relevant problems in real-world applications. Such systems have grown in complexity, relying heavily on high-dimensional input data and data-hungry Machine Learning (ML) algorithms. We propose a flexible system, Feature Monitoring (FM), that detects data drifts in such data sets with a small, constant memory footprint and a small computational cost in streaming applications. The method is based on a multivariate statistical test and is data-driven by design (full reference distributions are estimated from the data). It monitors all features that are used by the system, while providing an interpretable feature ranking whenever an alarm occurs (to aid in root cause analysis). The computational and memory lightness of the system results from the use of Exponential Moving Histograms. In our experimental study, we analyze how the system's behavior depends on its parameters and, more importantly, show examples where it detects problems that are not directly related to a single feature. This illustrates how FM eliminates the need to add custom signals to detect specific types of problems and that monitoring the available space of features is often enough.
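
To make the Exponential Moving Histogram idea concrete, here is a minimal sketch of one possible reading of it: a fixed-bin histogram whose counts are decayed on every event, so memory stays constant regardless of stream length. The class name, the `half_life` parameterization, and the use of total variation distance as a drift score are all assumptions made for illustration; the abstract does not specify them, and the paper's multivariate test is not reproduced here.

```python
import numpy as np


class ExponentialMovingHistogram:
    """Fixed-bin histogram whose counts decay exponentially per event.

    Hypothetical illustration; not the authors' implementation.
    """

    def __init__(self, bin_edges, half_life):
        self.bin_edges = np.asarray(bin_edges, dtype=float)
        # Per-event decay chosen so an old count loses half its weight
        # after `half_life` new events (an assumed parameterization).
        self.decay = 0.5 ** (1.0 / half_life)
        self.counts = np.zeros(len(self.bin_edges) - 1)

    def update(self, value):
        # Age all bins, then credit the bin that contains the new value.
        self.counts *= self.decay
        idx = int(np.searchsorted(self.bin_edges, value, side="right")) - 1
        idx = min(max(idx, 0), len(self.counts) - 1)  # clamp outliers to edge bins
        self.counts[idx] += 1.0

    def distribution(self):
        # Normalized bin weights; all zeros if nothing has been seen yet.
        total = self.counts.sum()
        return self.counts / total if total > 0 else self.counts.copy()


def total_variation(p, q):
    """Half the L1 distance between two discrete distributions.

    Used here only as a simple stand-in drift score.
    """
    return 0.5 * float(np.abs(np.asarray(p) - np.asarray(q)).sum())


# Build a reference from values in the first bin, then stream shifted values.
hist = ExponentialMovingHistogram(bin_edges=[0, 1, 2, 3], half_life=10)
for _ in range(50):
    hist.update(0.5)
reference = hist.distribution()
for _ in range(200):
    hist.update(2.5)
drift = total_variation(reference, hist.distribution())
print(f"drift score: {drift:.3f}")
```

Each update touches only the fixed array of bin counts, which is what keeps the footprint independent of how many events have been processed. A shorter half-life makes the histogram react faster to shifts at the cost of noisier estimates.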

* 10 pages, 5 figures. AutoML, KDD22, August 14-17, 2022, Washington, DC, US 

ConceptDistil: Model-Agnostic Distillation of Concept Explanations

May 07, 2022
João Bento Sousa, Ricardo Moreira, Vladimir Balayan, Pedro Saleiro, Pedro Bizarro

Concept-based explanations aim to fill the model interpretability gap for non-technical humans-in-the-loop. Previous work has focused on providing concepts for specific models (e.g., neural networks) or data types (e.g., images), either by trying to extract concepts from an already trained network or by training self-explainable models through multi-task learning. In this work, we propose ConceptDistil, a method to bring concept explanations to any black-box classifier using knowledge distillation. ConceptDistil is decomposed into two components: (1) a concept model that predicts which domain concepts are present in a given instance, and (2) a distillation model that tries to mimic the predictions of a black-box model using the concept model predictions. We validate ConceptDistil in a real-world use case, showing that it is able to optimize both tasks, bringing concept-explainability to any black-box model.
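
The two-component structure described above can be sketched as a forward pass plus a joint loss. This is only an illustrative reading of the abstract: the linear-plus-sigmoid layers, the binary cross-entropy losses, the weighting parameter `alpha`, and all function names are assumptions, not details taken from the paper.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def binary_cross_entropy(target, pred, eps=1e-12):
    # Mean BCE; also valid for soft targets in [0, 1].
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred)))


def forward(x, w_concept, w_distil):
    # (1) Concept model: per-instance probability of each domain concept.
    concepts = sigmoid(x @ w_concept)      # shape (n_samples, n_concepts)
    # (2) Distillation model: mimic the black-box score from concepts only.
    score = sigmoid(concepts @ w_distil)   # shape (n_samples,)
    return concepts, score


def joint_loss(x, concept_labels, black_box_scores, w_concept, w_distil, alpha=0.5):
    # `alpha` is a hypothetical trade-off between the two tasks.
    concepts, score = forward(x, w_concept, w_distil)
    concept_term = binary_cross_entropy(concept_labels, concepts)
    distil_term = binary_cross_entropy(black_box_scores, score)
    return alpha * concept_term + (1.0 - alpha) * distil_term


# Toy data: 4 instances, 3 features, 2 concepts, untrained (zero) weights.
x = np.ones((4, 3))
concept_labels = np.array([[1, 0], [0, 1], [1, 1], [0, 0]], dtype=float)
black_box_scores = np.array([0.9, 0.1, 0.8, 0.2])
w_concept = np.zeros((3, 2))
w_distil = np.zeros(2)
loss = joint_loss(x, concept_labels, black_box_scores, w_concept, w_distil)
print(f"joint loss: {loss:.4f}")
```

Because the distillation model sees only the concept predictions, its weights indicate which concepts drive the mimicked score. Note that the black-box model is queried only for its outputs, which is what makes such a setup model-agnostic.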

* ICLR 2022 PAIR2Struct Workshop 

Data+Shift: Supporting visual investigation of data distribution shifts by data scientists

Apr 29, 2022
João Palmeiro, Beatriz Malveiro, Rita Costa, David Polido, Ricardo Moreira, Pedro Bizarro

Machine learning on data streams is increasingly present in multiple domains. However, data distribution shifts often occur and can lead machine learning models to make incorrect decisions. While there are automatic methods to detect when drift is happening, human analysis, often by data scientists, is essential to diagnose the causes of the problem and adjust the system. We propose Data+Shift, a visual analytics tool to support data scientists in the task of investigating the underlying factors of shift in data features in the context of fraud detection. Design requirements were derived from interviews with data scientists. Data+Shift is integrated with JupyterLab and can be used alongside other data science tools. We validated our approach with a think-aloud experiment where a data scientist used the tool for a fraud detection use case.

* 5 pages, 3 figures, short paper accepted at EuroVis 2022 