Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Vincent Lemaire

LPSM

Deep Reinforcement Learning based Triggering Function for Early Classifiers of Time Series

Feb 10, 2025

Aurélien Renault, Alexis Bondu, Antoine Cornuéjols, Vincent Lemaire

Figure 1 for Deep Reinforcement Learning based Triggering Function for Early Classifiers of Time Series

Figure 2 for Deep Reinforcement Learning based Triggering Function for Early Classifiers of Time Series

Figure 3 for Deep Reinforcement Learning based Triggering Function for Early Classifiers of Time Series

Figure 4 for Deep Reinforcement Learning based Triggering Function for Early Classifiers of Time Series

Abstract:Early Classification of Time Series (ECTS) has been recognized as an important problem in many areas where decisions have to be taken as soon as possible, before the full data availability, while time pressure increases. Numerous ECTS approaches have been proposed, based on different triggering functions, each taking into account various pieces of information related to the incoming time series and/or the output of a classifier. Although their performances have been empirically compared in the literature, no studies have been carried out on the optimality of these triggering functions that involve ``man-tailored'' decision rules. Based on the same information, could there be better triggering functions? This paper presents one way to investigate this question by showing first how to translate ECTS problems into Reinforcement Learning (RL) ones, where the very same information is used in the state space. A thorough comparison of the performance obtained by ``handmade'' approaches and their ``RL-based'' counterparts has been carried out. A second question investigated in this paper is whether a different combination of information, defining the state space in RL systems, can achieve even better performance. Experiments show that the system we describe, called \textsc{Alert}, significantly outperforms its state-of-the-art competitors on a large number of datasets.

Via

Access Paper or Ask Questions

Unsupervised Feature Construction for Anomaly Detection in Time Series -- An Evaluation

Jan 14, 2025

Marine Hamon, Vincent Lemaire, Nour Eddine Yassine Nair-Benrekia, Samuel Berlemont, Julien Cumin

Abstract:To detect anomalies with precision and without prior knowledge in time series, is it better to build a detector from the initial temporal representation, or to compute a new (tabular) representation using an existing automatic variable construction library? In this article, we address this question by conducting an in-depth experimental study for two popular detectors (Isolation Forest and Local Outlier Factor). The obtained results, for 5 different datasets, show that the new representation, computed using the tsfresh library, allows Isolation Forest to significantly improve its performance.

* 7

Via

Access Paper or Ask Questions

A new Input Convex Neural Network with application to options pricing

Nov 19, 2024

Vincent Lemaire, Gilles Pagès, Christian Yeo

Abstract:We introduce a new class of neural networks designed to be convex functions of their inputs, leveraging the principle that any convex function can be represented as the supremum of the affine functions it dominates. These neural networks, inherently convex with respect to their inputs, are particularly well-suited for approximating the prices of options with convex payoffs. We detail the architecture of this, and establish theoretical convergence bounds that validate its approximation capabilities. We also introduce a \emph{scrambling} phase to improve the training of these networks. Finally, we demonstrate numerically the effectiveness of these networks in estimating prices for three types of options with convex payoffs: Basket, Bermudan, and Swing options.

* 29 pages

Via

Access Paper or Ask Questions

Mislabeled examples detection viewed as probing machine learning models: concepts, survey and extensive benchmark

Oct 21, 2024

Thomas George, Pierre Nodet, Alexis Bondu, Vincent Lemaire

Figure 1 for Mislabeled examples detection viewed as probing machine learning models: concepts, survey and extensive benchmark

Figure 2 for Mislabeled examples detection viewed as probing machine learning models: concepts, survey and extensive benchmark

Figure 3 for Mislabeled examples detection viewed as probing machine learning models: concepts, survey and extensive benchmark

Figure 4 for Mislabeled examples detection viewed as probing machine learning models: concepts, survey and extensive benchmark

Abstract:Mislabeled examples are ubiquitous in real-world machine learning datasets, advocating the development of techniques for automatic detection. We show that most mislabeled detection methods can be viewed as probing trained machine learning models using a few core principles. We formalize a modular framework that encompasses these methods, parameterized by only 4 building blocks, as well as a Python library that demonstrates that these principles can actually be implemented. The focus is on classifier-agnostic concepts, with an emphasis on adapting methods developed for deep learning models to non-deep classifiers for tabular data. We benchmark existing methods on (artificial) Completely At Random (NCAR) as well as (realistic) Not At Random (NNAR) labeling noise from a variety of tasks with imperfect labeling rules. This benchmark provides new insights as well as limitations of existing methods in this setup.

* Transactions on Machine Learning Research 2024

Via

Access Paper or Ask Questions

DEMAU: Decompose, Explore, Model and Analyse Uncertainties

Sep 12, 2024

Arthur Hoarau, Vincent Lemaire

Figure 1 for DEMAU: Decompose, Explore, Model and Analyse Uncertainties

Figure 2 for DEMAU: Decompose, Explore, Model and Analyse Uncertainties

Figure 3 for DEMAU: Decompose, Explore, Model and Analyse Uncertainties

Abstract:Recent research in machine learning has given rise to a flourishing literature on the quantification and decomposition of model uncertainty. This information can be very useful during interactions with the learner, such as in active learning or adaptive learning, and especially in uncertainty sampling. To allow a simple representation of these total, epistemic (reducible) and aleatoric (irreducible) uncertainties, we offer DEMAU, an open-source educational, exploratory and analytical tool allowing to visualize and explore several types of uncertainty for classification models in machine learning.

Via

Access Paper or Ask Questions

ml_edm package: a Python toolkit for Machine Learning based Early Decision Making

Aug 23, 2024

Aurélien Renault, Youssef Achenchabe, Édouard Bertrand, Alexis Bondu, Antoine Cornuéjols, Vincent Lemaire, Asma Dachraoui

Abstract:\texttt{ml\_edm} is a Python 3 library, designed for early decision making of any learning tasks involving temporal/sequential data. The package is also modular, providing researchers an easy way to implement their own triggering strategy for classification, regression or any machine learning task. As of now, many Early Classification of Time Series (ECTS) state-of-the-art algorithms, are efficiently implemented in the library leveraging parallel computation. The syntax follows the one introduce in \texttt{scikit-learn}, making estimators and pipelines compatible with \texttt{ml\_edm}. This software is distributed over the BSD-3-Clause license, source code can be found at \url{https://github.com/ML-EDM/ml_edm}.

Via

Access Paper or Ask Questions

Early Classification of Time Series: Taxonomy and Benchmark

Jun 26, 2024

Aurélien Renault, Alexis Bondu, Antoine Cornuéjols, Vincent Lemaire

Figure 1 for Early Classification of Time Series: Taxonomy and Benchmark

Figure 2 for Early Classification of Time Series: Taxonomy and Benchmark

Figure 3 for Early Classification of Time Series: Taxonomy and Benchmark

Figure 4 for Early Classification of Time Series: Taxonomy and Benchmark

Abstract:In many situations, the measurements of a studied phenomenon are provided sequentially, and the prediction of its class needs to be made as early as possible so as not to incur too high a time penalty, but not too early and risk paying the cost of misclassification. This problem has been particularly studied in the case of time series, and is known as Early Classification of Time Series (ECTS). Although it has been the subject of a growing body of literature, there is still a lack of a systematic, shared evaluation protocol to compare the relative merits of the various existing methods. This document begins by situating these methods within a principle-based taxonomy. It defines dimensions for organizing their evaluation, and then reports the results of a very extensive set of experiments along these dimensions involving nine state-of-the art ECTS algorithms. In addition, these and other experiments can be carried out using an open-source library in which most of the existing ECTS algorithms have been implemented (see \url{https://github.com/ML-EDM/ml_edm}).

Via

Access Paper or Ask Questions

Constructing Variables Using Classifiers as an Aid to Regression: An Empirical Assessment

Mar 13, 2024

Colin Troisemaine, Vincent Lemaire

Abstract:This paper proposes a method for the automatic creation of variables (in the case of regression) that complement the information contained in the initial input vector. The method works as a pre-processing step in which the continuous values of the variable to be regressed are discretized into a set of intervals which are then used to define value thresholds. Then classifiers are trained to predict whether the value to be regressed is less than or equal to each of these thresholds. The different outputs of the classifiers are then concatenated in the form of an additional vector of variables that enriches the initial vector of the regression problem. The implemented system can thus be considered as a generic pre-processing tool. We tested the proposed enrichment method with 5 types of regressors and evaluated it in 33 regression datasets. Our experimental results confirm the interest of the approach.

Via

Access Paper or Ask Questions

An analysis of the noise schedule for score-based generative models

Feb 07, 2024

Stanislas Strasman, Antonio Ocello, Claire Boyer, Sylvain Le Corff, Vincent Lemaire

Abstract:Score-based generative models (SGMs) aim at estimating a target data distribution by learning score functions using only noise-perturbed samples from the target. Recent literature has focused extensively on assessing the error between the target and estimated distributions, gauging the generative quality through the Kullback-Leibler (KL) divergence and Wasserstein distances. All existing results have been obtained so far for time-homogeneous speed of the noise schedule. Under mild assumptions on the data distribution, we establish an upper bound for the KL divergence between the target and the estimated distributions, explicitly depending on any time-dependent noise schedule. Assuming that the score is Lipschitz continuous, we provide an improved error bound in Wasserstein distance, taking advantage of favourable underlying contraction mechanisms. We also propose an algorithm to automatically tune the noise schedule using the proposed upper bound. We illustrate empirically the performance of the noise schedule optimization in comparison to standard choices in the literature.

Via

Access Paper or Ask Questions

A Practical Approach to Novel Class Discovery in Tabular Data

Nov 09, 2023

Colin Troisemaine, Alexandre Reiffers-Masson, Stéphane Gosselin, Vincent Lemaire, Sandrine Vaton

Abstract:The problem of Novel Class Discovery (NCD) consists in extracting knowledge from a labeled set of known classes to accurately partition an unlabeled set of novel classes. While NCD has recently received a lot of attention from the community, it is often solved on computer vision problems and under unrealistic conditions. In particular, the number of novel classes is usually assumed to be known in advance, and their labels are sometimes used to tune hyperparameters. Methods that rely on these assumptions are not applicable in real-world scenarios. In this work, we focus on solving NCD in tabular data when no prior knowledge of the novel classes is available. To this end, we propose to tune the hyperparameters of NCD methods by adapting the $k$-fold cross-validation process and hiding some of the known classes in each fold. Since we have found that methods with too many hyperparameters are likely to overfit these hidden classes, we define a simple deep NCD model. This method is composed of only the essential elements necessary for the NCD problem and performs impressively well under realistic conditions. Furthermore, we find that the latent space of this method can be used to reliably estimate the number of novel classes. Additionally, we adapt two unsupervised clustering algorithms ($k$-means and Spectral Clustering) to leverage the knowledge of the known classes. Extensive experiments are conducted on 7 tabular datasets and demonstrate the effectiveness of the proposed method and hyperparameter tuning process, and show that the NCD problem can be solved without relying on knowledge from the novel classes.

* 25 pages, including 3 pages of annexes

Via

Access Paper or Ask Questions