
Albert Bifet

albert.bifet@telecom-paristech.fr

Online Isolation Forest

May 14, 2025

Filippo Leveni, Guilherme Weigert Cassales, Bernhard Pfahringer, Albert Bifet, Giacomo Boracchi

Abstract:The anomaly detection literature is abundant with offline methods, which require repeated access to data in memory and impose impractical assumptions when applied to a streaming context. Existing online anomaly detection methods also generally fail to address these constraints, resorting to periodic retraining to adapt to the online context. We propose Online-iForest, a novel method explicitly designed for streaming conditions that seamlessly tracks the data-generating process as it evolves over time. Experimental validation on real-world datasets demonstrated that Online-iForest is on par with online alternatives and closely rivals state-of-the-art offline anomaly detection techniques that undergo periodic retraining. Notably, Online-iForest consistently outperforms all competitors in terms of efficiency, making it a promising solution in applications where fast identification of anomalies is of primary importance, such as cybersecurity, fraud detection and fault detection.

* Accepted at International Conference on Machine Learning (ICML 2024)
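
The abstract does not describe how Online-iForest scores points, so the sketch below shows only the scoring rule of the standard (offline) Isolation Forest that the method's name refers to; the online variant may compute scores differently. It assumes the per-tree path lengths h(x) have already been measured, and the function names are hypothetical. The score is s(x, n) = 2^(-E[h(x)] / c(n)), where c(n) is the average path length of an unsuccessful binary-search-tree lookup over n points.

```python
import math

EULER_GAMMA = 0.5772156649015329

def average_path_length(n: int) -> float:
    """c(n): normalising constant from the standard Isolation Forest."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def anomaly_score(path_lengths: list[float], n: int) -> float:
    """s(x, n) = 2 ** (-E[h(x)] / c(n)).

    path_lengths: depth at which x was isolated in each tree (assumed given).
    Scores close to 1 suggest an anomaly; a mean depth equal to c(n) gives 0.5.
    """
    mean_h = sum(path_lengths) / len(path_lengths)
    return 2.0 ** (-mean_h / average_path_length(n))
```

For a subsample of n = 256 points, c(256) is roughly 10.24, so a point isolated after about 2 splits on average scores roughly 0.87, while points isolated deeper than c(n) score below 0.5.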


CapyMOA: Efficient Machine Learning for Data Streams in Python

Feb 11, 2025

Heitor Murilo Gomes, Anton Lee, Nuwan Gunasekara, Yibin Sun, Guilherme Weigert Cassales, Justin Liu, Marco Heyden, Vitor Cerqueira, Maroua Bahri, Yun Sing Koh(+2 more)

Abstract:CapyMOA is an open-source library designed for efficient machine learning on streaming data. It provides a structured framework for real-time learning and evaluation, featuring a flexible data representation. CapyMOA includes an extensible architecture that allows integration with external frameworks such as MOA and PyTorch, facilitating hybrid learning approaches that combine traditional online algorithms with deep learning techniques. By emphasizing adaptability, scalability, and usability, CapyMOA allows researchers and practitioners to tackle dynamic learning challenges across various domains.


Evaluation for Regression Analyses on Evolving Data Streams

Feb 11, 2025

Yibin Sun, Heitor Murilo Gomes, Bernhard Pfahringer, Albert Bifet


Abstract:The paper explores the challenges of regression analysis in evolving data streams, an area that remains relatively underexplored compared to classification. We propose a standardized evaluation process for regression and prediction interval tasks in streaming contexts. Additionally, we introduce an innovative drift simulation strategy capable of synthesizing various drift types, including the less-studied incremental drift. Comprehensive experiments with state-of-the-art methods, conducted under the proposed process, validate the effectiveness and robustness of our approach.

* 11 pages, 9 figures
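
The abstract does not spell out the proposed evaluation process or drift simulator. As a point of reference only, the sketch below shows the common prequential (test-then-train) protocol for streaming regression, run on a hypothetical stream whose concept drifts incrementally: the slope of y = w·x + noise moves linearly from 1 to -1 over a window of instances. The generator, the one-feature SGD regressor and all parameter names are illustrative assumptions, not the paper's method.

```python
import math
import random

def incremental_drift_stream(n, drift_start, drift_width, seed=0):
    """Hypothetical stream: y = w * x + noise, with w sliding linearly
    from 1.0 to -1.0 between drift_start and drift_start + drift_width."""
    if drift_width <= 0:
        raise ValueError("drift_width must be positive")
    rng = random.Random(seed)
    for t in range(n):
        # alpha: 0 = old concept, 1 = new concept, linear in between
        alpha = min(max((t - drift_start) / drift_width, 0.0), 1.0)
        w = (1.0 - alpha) * 1.0 + alpha * (-1.0)
        x = rng.uniform(-1.0, 1.0)
        yield x, w * x + rng.gauss(0.0, 0.1)

class OnlineLinearRegressor:
    """Hypothetical one-feature linear model updated by SGD per instance."""
    def __init__(self, lr=0.05):
        self.w, self.b, self.lr = 0.0, 0.0, lr

    def predict(self, x):
        return self.w * x + self.b

    def learn(self, x, y):
        err = self.predict(x) - y
        self.w -= self.lr * err * x
        self.b -= self.lr * err

def prequential_rmse(model, stream, window=500):
    """Test-then-train: each instance is predicted before the model learns
    from it; returns the RMSE of each consecutive window of instances."""
    sq_errors, rmses = [], []
    for x, y in stream:
        sq_errors.append((model.predict(x) - y) ** 2)  # test first
        model.learn(x, y)                              # then train
        if len(sq_errors) == window:
            rmses.append(math.sqrt(sum(sq_errors) / window))
            sq_errors = []
    return rmses

rmses = prequential_rmse(OnlineLinearRegressor(), incremental_drift_stream(5000, 2000, 1000))
```

Reporting the error per window rather than as a single average keeps the effect of the drift visible along the stream.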


Optimizing Hyperparameters for Quantum Data Re-Uploaders in Calorimetric Particle Identification

Dec 16, 2024

Léa Cassé, Bernhard Pfahringer, Albert Bifet, Frédéric Magniette


Abstract:We present an application of a single-qubit Data Re-Uploading (QRU) quantum model for particle classification in calorimetric experiments. Optimized for Noisy Intermediate-Scale Quantum (NISQ) devices, this model requires a minimal number of qubits while delivering strong classification performance. Evaluated on a novel simulated dataset specific to particle physics, the QRU model achieves high accuracy in classifying particle types. Through a systematic exploration of model hyperparameters (such as circuit depth, rotation gates, input normalization and the number of trainable parameters per input) and training parameters (such as batch size, optimizer, loss function and learning rate), we assess their individual impacts on model accuracy and efficiency. Additionally, we apply global optimization methods, uncovering hyperparameter correlations that further enhance performance. Our results indicate that the QRU model attains high accuracy at low computational cost, underscoring its potential for practical quantum machine learning applications.

* 17 pages, 22 figures
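
To make the idea of data re-uploading concrete, the sketch below simulates a single-qubit circuit in plain Python (a state-vector simulation, not a NISQ run). The input x is encoded again in every layer through an RY rotation with trainable weight and bias, followed by a trainable RZ rotation, and the class is read from the probability of measuring |0>. The gate choice, the three parameters per layer and the function names are assumptions for illustration; the paper treats rotation gates, depth and parameters per input as hyperparameters to explore.

```python
import cmath
import math

def ry(theta, state):
    """Apply RY(theta) to a single-qubit state (amplitudes of |0>, |1>)."""
    a, b = state
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return (c * a - s * b, s * a + c * b)

def rz(phi, state):
    """Apply RZ(phi): relative phase between |0> and |1>."""
    a, b = state
    return (cmath.exp(-1j * phi / 2) * a, cmath.exp(1j * phi / 2) * b)

def qru_prob_zero(x, layers):
    """Probability of measuring |0> after a hypothetical re-uploading circuit.

    layers: list of (w, b, phi) trainable triples; x is re-encoded in each layer.
    """
    state = (1 + 0j, 0j)  # start in |0>
    for w, b, phi in layers:
        state = ry(w * x + b, state)  # data re-uploading: x enters every layer
        state = rz(phi, state)
    return abs(state[0]) ** 2

def qru_predict(x, layers):
    """Binary decision: class 0 if |0> is the more likely outcome."""
    return 0 if qru_prob_zero(x, layers) >= 0.5 else 1
```

Training would adjust the (w, b, phi) triples with a classical optimizer; adding layers increases expressiveness without adding qubits, which is what makes the single-qubit design attractive on NISQ hardware.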


Real-Time Energy Pricing in New Zealand: An Evolving Stream Analysis

Aug 29, 2024

Yibin Sun, Heitor Murilo Gomes, Bernhard Pfahringer, Albert Bifet


Abstract:This paper introduces a group of novel datasets representing real-time time-series and streaming data of energy prices in New Zealand, sourced from the Electricity Market Information (EMI) website maintained by the New Zealand government. The datasets are intended to address the scarcity of suitable datasets for streaming regression learning tasks. We conduct extensive analyses and experiments on these datasets, covering preprocessing techniques, regression tasks, prediction intervals, concept drift detection, and anomaly detection. Our experiments demonstrate the datasets' utility and highlight the challenges and opportunities for future research in energy price forecasting.

* 12 pages, 8 figures, short version accepted at PRICAI


A Probabilistic Framework for Adapting to Changing and Recurring Concepts in Data Streams

Aug 18, 2024

Ben Halstead, Yun Sing Koh, Patricia Riddle, Mykola Pechenizkiy, Albert Bifet


Abstract:The distribution of streaming data often changes over time as conditions change, a phenomenon known as concept drift. Only a subset of previous experience, collected in similar conditions, is relevant to learning an accurate classifier for current data. Learning from irrelevant experience describing a different concept can degrade performance. A system learning from streaming data must identify which recent experience is irrelevant when conditions change and which past experience is relevant when concepts reoccur, e.g., when weather events or financial patterns repeat. Existing streaming approaches either do not consider experience to change in relevance over time and thus cannot handle concept drift, or only consider the recency of experience and thus cannot handle recurring concepts, or only sparsely evaluate relevance and thus fail when concept drift is missed. To enable learning in changing conditions, we propose SELeCT, a probabilistic method for continuously evaluating the relevance of past experience. SELeCT maintains a distinct internal state for each concept, representing relevant experience with a unique classifier. We propose a Bayesian algorithm for estimating state relevance, combining the likelihood of drawing recent observations from a given state with a transition pattern prior based on the system's current state.
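
The last sentence describes a Bayes-rule combination: relevance of a state s is proportional to P(recent observations | s) × P(s | current state). The sketch below shows that combination under assumptions the abstract does not state: each state's log-likelihood of the recent window is supplied by the caller, and the transition prior is estimated from counts of past state-to-state transitions with add-one smoothing. The function and argument names are hypothetical; SELeCT's actual likelihood model and prior may differ.

```python
import math

def state_relevance(log_likelihoods, transition_counts, current_state):
    """Posterior relevance of each concept state (hypothetical sketch).

    log_likelihoods:   dict state -> log P(recent observations | state)
    transition_counts: dict (from_state, to_state) -> observed transitions
    current_state:     the state the system is currently in
    Returns dict state -> P(state | recent observations, current_state).
    """
    states = list(log_likelihoods)
    # Transition prior from counts, add-one smoothed so unseen moves keep mass
    counts = {s: transition_counts.get((current_state, s), 0) + 1 for s in states}
    total = sum(counts.values())
    log_post = {s: log_likelihoods[s] + math.log(counts[s] / total) for s in states}
    # Normalise in log space (log-sum-exp) to avoid underflow
    m = max(log_post.values())
    z = sum(math.exp(v - m) for v in log_post.values())
    return {s: math.exp(v - m) / z for s, v in log_post.items()}
```

When the recent data fit two states equally well, the prior decides (a system that usually stays in state A keeps favouring A); when the data strongly favour another state, the likelihood overrides the prior, which is how a recurring concept can be recognised.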


Branches: A Fast Dynamic Programming and Branch & Bound Algorithm for Optimal Decision Trees

Jun 04, 2024

Ayman Chaouki, Jesse Read, Albert Bifet


Abstract:Decision Tree Learning is a fundamental problem for Interpretable Machine Learning, yet it poses a formidable optimization challenge. Despite numerous efforts dating back to the early 1990s, practical algorithms have only recently emerged, primarily leveraging Dynamic Programming (DP) and Branch & Bound (B&B) techniques. These breakthroughs led to the development of two distinct approaches. Algorithms like DL8.5 and MurTree operate on the space of nodes (or branches); they are very fast but do not penalise complex Decision Trees, i.e. they do not solve for sparsity. On the other hand, algorithms like OSDT and GOSDT operate on the space of Decision Trees; they solve for sparsity but at the expense of speed. In this work, we introduce Branches, a novel algorithm that integrates the strengths of both paradigms. Leveraging DP and B&B, Branches achieves exceptional speed while also solving for sparsity. Central to its efficiency is a novel analytical bound enabling substantial pruning of the search space. Theoretical analysis demonstrates that Branches has lower complexity than state-of-the-art methods, a claim validated through extensive empirical evaluation. Our results show that Branches not only greatly outperforms existing approaches in terms of speed and number of iterations but also consistently yields optimal Decision Trees.

* This preprint is currently under review
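
"Solving for sparsity" usually means minimising misclassification rate plus a penalty λ per leaf, as in OSDT/GOSDT; that Branches uses exactly this objective is an assumption here. The sketch below optimises that objective with dynamic programming memoised on branches (sets of feature tests, so the same branch reached in a different order is solved once) and one elementary bound: any split creates at least two leaves and so costs at least 2λ, so a leaf costing no more than 2λ is never split. This bound is far weaker than the paper's analytical bound, binary features and labels are assumed, and all names are hypothetical.

```python
from functools import lru_cache

def optimal_sparse_tree(X, y, lam, max_depth):
    """Minimise error_rate + lam * n_leaves over trees of depth <= max_depth.

    X: list of tuples of 0/1 features; y: list of 0/1 labels.
    Returns (cost, tree) with tree = ("leaf", label) or ("split", f, left, right).
    """
    n, n_features = len(y), len(X[0])

    @lru_cache(maxsize=None)
    def best(branch):
        # branch: frozenset of (feature, value) tests leading to this node
        idx = [i for i in range(n) if all(X[i][f] == v for f, v in branch)]
        ones = sum(y[i] for i in idx)
        errors = min(ones, len(idx) - ones)  # leaf predicts the majority label
        leaf_cost = errors / n + lam
        best_cost, best_tree = leaf_cost, ("leaf", int(2 * ones > len(idx)))
        # Bound: a split has >= 2 leaves, so it costs >= 2 * lam
        if len(branch) == max_depth or leaf_cost <= 2 * lam:
            return best_cost, best_tree
        used = {f for f, _ in branch}
        for f in range(n_features):
            if f in used:
                continue
            left_cost, left_tree = best(branch | {(f, 0)})
            right_cost, right_tree = best(branch | {(f, 1)})
            if left_cost + right_cost < best_cost:
                best_cost = left_cost + right_cost
                best_tree = ("split", f, left_tree, right_tree)
        return best_cost, best_tree

    return best(frozenset())
```

On the XOR labels y = [0, 1, 1, 0] over two binary features, a small λ (0.01) returns the full depth-2 tree with cost 0.04, whereas a large λ (0.3) makes the single leaf (cost 0.8) optimal, illustrating how the penalty trades accuracy for sparsity.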


A Retrospective of the Tutorial on Opportunities and Challenges of Online Deep Learning

May 28, 2024

Cedric Kulbach, Lucas Cazzonelli, Hoang-Anh Ngo, Minh-Huong Le-Nguyen, Albert Bifet


Abstract:Machine learning algorithms have become indispensable in today's world. They support and accelerate the way we make decisions based on the data at hand. This acceleration means that data structures that were valid at one moment may no longer be valid in the future. As these data structures change, machine learning (ML) systems must be adapted incrementally to the new data, which is done with online learning or continuous ML technologies. While deep learning technologies have shown exceptional performance on predefined datasets, they have not been widely applied to online, streaming, and continuous learning. In this retrospective of our tutorial titled Opportunities and Challenges of Online Deep Learning, held at ECML PKDD 2023, we provide a brief overview of the opportunities but also the potential pitfalls of applying neural networks in online learning environments using the frameworks River and Deep-River.

* Accepted for publication in the ECML-PKDD 2023 joint post-workshop proceedings


Online Learning of Decision Trees with Thompson Sampling

Apr 09, 2024

Ayman Chaouki, Jesse Read, Albert Bifet


Abstract:Decision Trees are prominent prediction models for interpretable Machine Learning. They have been thoroughly researched, mostly in the batch setting with a fixed labelled dataset, leading to popular algorithms such as C4.5, ID3 and CART. Unfortunately, these methods are heuristic in nature: they rely on greedy splits that offer no guarantee of global optimality and often lead to unnecessarily complex and hard-to-interpret Decision Trees. Recent breakthroughs addressed this suboptimality issue in the batch setting, but no such work has considered the online setting, where data arrive in a stream. To this end, we devise a new Monte Carlo Tree Search algorithm, Thompson Sampling Decision Trees (TSDT), able to produce optimal Decision Trees in an online setting. We analyse our algorithm and prove its almost sure convergence to the optimal tree. Furthermore, we conduct extensive experiments to validate our findings empirically. The proposed TSDT outperforms existing algorithms on several benchmarks, while offering the practical advantage of being tailored to the online setting.

* To be published in the Proceedings of the 27th International Conference on Artificial Intelligence and Statistics (AISTATS) 2024, Valencia, Spain. PMLR: Volume 238
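
TSDT uses Thompson sampling inside a Monte Carlo Tree Search; the abstract does not give its posterior model, so the sketch below shows only the Thompson sampling step itself, on a Beta-Bernoulli bandit. Each arm is a hypothetical stand-in for a candidate split, and the reward is assumed to be 1 when the resulting prediction on an arriving instance is correct. The class, the simulator and the success probabilities are illustrative assumptions, not TSDT.

```python
import random

class BetaBernoulliThompson:
    """Thompson sampling with a Beta(1, 1) prior on each arm's success rate."""

    def __init__(self, n_arms, seed=0):
        self.alpha = [1.0] * n_arms  # 1 + observed successes
        self.beta = [1.0] * n_arms   # 1 + observed failures
        self.rng = random.Random(seed)

    def select(self):
        # Draw one plausible success rate per arm from its posterior,
        # then act greedily with respect to the draws
        samples = [self.rng.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        if reward:
            self.alpha[arm] += 1
        else:
            self.beta[arm] += 1

def run_bandit(probs, rounds, seed=0):
    """Simulate a stream of Bernoulli rewards; return pull counts per arm."""
    env = random.Random(seed + 1)
    ts = BetaBernoulliThompson(len(probs), seed=seed)
    pulls = [0] * len(probs)
    for _ in range(rounds):
        arm = ts.select()
        pulls[arm] += 1
        ts.update(arm, env.random() < probs[arm])
    return pulls

pulls = run_bandit([0.2, 0.5, 0.8], rounds=2000)
```

Sampling from the posterior, rather than always taking the current best estimate, keeps exploring options whose value is still uncertain, while concentrating most pulls on the best arm as evidence accumulates.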


Look At Me, No Replay! SurpriseNet: Anomaly Detection Inspired Class Incremental Learning

Oct 30, 2023

Anton Lee, Yaqian Zhang, Heitor Murilo Gomes, Albert Bifet, Bernhard Pfahringer


Abstract:Continual learning aims to create artificial neural networks capable of accumulating knowledge and skills through incremental training on a sequence of tasks. The main challenge of continual learning is catastrophic interference, wherein new knowledge overrides or interferes with past knowledge, leading to forgetting. An associated issue is the problem of learning "cross-task knowledge," where models fail to acquire and retain knowledge that helps differentiate classes across task boundaries. A common solution to both problems is "replay," where a limited buffer of past instances is utilized to learn cross-task knowledge and mitigate catastrophic interference. However, a notable drawback of these methods is their tendency to overfit the limited replay buffer. In contrast, our proposed solution, SurpriseNet, addresses catastrophic interference by employing a parameter isolation method and learning cross-task knowledge using an auto-encoder inspired by anomaly detection. SurpriseNet is applicable to both structured and unstructured data, as it does not rely on image-specific inductive biases. We have conducted empirical experiments demonstrating the strengths of SurpriseNet on various traditional vision continual-learning benchmarks, as well as on structured datasets. Source code made available at https://doi.org/10.5281/zenodo.8247906 and https://github.com/tachyonicClock/SurpriseNet-CIKM-23

* Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, CIKM 2023, Birmingham, United Kingdom, October 21-25, 2023
