Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

William Kengne

Deep regression learning from dependent observations with minimum error entropy principle

Mar 11, 2026

William Kengne, Modou Wade

Abstract:This paper considers nonparametric regression from strongly mixing observations. The proposed approach is based on deep neural networks with minimum error entropy (MEE) principle. We study two estimators: the non-penalized deep neural network (NPDNN) and the sparse-penalized deep neural network (SPDNN) predictors. Upper bounds of the expected excess risk are established for both estimators over the classes of Hölder and composition Hölder functions. For the models with Gaussian error, the rates of the upper bound obtained match (up to a logarithmic factor) with the lower bounds established in \cite{schmidt2020nonparametric}, showing that both the MEE-based NPDNN and SPDNN estimators from strongly mixing data can achieve the minimax optimal convergence rate.

Via

Access Paper or Ask Questions

A general framework for deep learning

Dec 29, 2025

William Kengne, Modou Wade

Abstract:This paper develops a general approach for deep learning for a setting that includes nonparametric regression and classification. We perform a framework from data that fulfills a generalized Bernstein-type inequality, including independent, $φ$-mixing, strongly mixing and $\mathcal{C}$-mixing observations. Two estimators are proposed: a non-penalized deep neural network estimator (NPDNN) and a sparse-penalized deep neural network estimator (SPDNN). For each of these estimators, bounds of the expected excess risk on the class of Hölder smooth functions and composition Hölder functions are established. Applications to independent data, as well as to $φ$-mixing, strongly mixing, $\mathcal{C}$-mixing processes are considered. For each of these examples, the upper bounds of the expected excess risk of the proposed NPDNN and SPDNN predictors are derived. It is shown that both the NPDNN and SPDNN estimators are minimax optimal (up to a logarithmic factor) in many classical settings.

Via

Access Paper or Ask Questions

Minimax optimality of deep neural networks on dependent data via PAC-Bayes bounds

Oct 29, 2024

Pierre Alquier, William Kengne

Abstract:In a groundbreaking work, Schmidt-Hieber (2020) proved the minimax optimality of deep neural networks with ReLu activation for least-square regression estimation over a large class of functions defined by composition. In this paper, we extend these results in many directions. First, we remove the i.i.d. assumption on the observations, to allow some time dependence. The observations are assumed to be a Markov chain with a non-null pseudo-spectral gap. Then, we study a more general class of machine learning problems, which includes least-square and logistic regression as special cases. Leveraging on PAC-Bayes oracle inequalities and a version of Bernstein inequality due to Paulin (2015), we derive upper bounds on the estimation risk for a generalized Bayesian estimator. In the case of least-square regression, this bound matches (up to a logarithmic factor) the lower bound of Schmidt-Hieber (2020). We establish a similar lower bound for classification with the logistic loss, and prove that the proposed DNN estimator is optimal in the minimax sense.

Via

Access Paper or Ask Questions

Deep learning from strongly mixing observations: Sparse-penalized regularization and minimax optimality

Jun 12, 2024

William Kengne, Modou Wade

Abstract:The explicit regularization and optimality of deep neural networks estimators from independent data have made considerable progress recently. The study of such properties on dependent data is still a challenge. In this paper, we carry out deep learning from strongly mixing observations, and deal with the squared and a broad class of loss functions. We consider sparse-penalized regularization for deep neural network predictor. For a general framework that includes, regression estimation, classification, time series prediction,$\cdots$, oracle inequality for the expected excess risk is established and a bound on the class of H\"older smooth functions is provided. For nonparametric regression from strong mixing data and sub-exponentially error, we provide an oracle inequality for the $L_2$ error and investigate an upper bound of this error on a class of H\"older composition functions. For the specific case of nonparametric autoregression with Gaussian and Laplace errors, a lower bound of the $L_2$ error on this H\"older composition class is established. Up to logarithmic factor, this bound matches its upper bound; so, the deep neural network estimator attains the minimax optimal rate.

Via

Access Paper or Ask Questions

Robust deep learning from weakly dependent data

May 08, 2024

William Kengne, Modou Wade

Figure 1 for Robust deep learning from weakly dependent data

Figure 2 for Robust deep learning from weakly dependent data

Figure 3 for Robust deep learning from weakly dependent data

Figure 4 for Robust deep learning from weakly dependent data

Abstract:Recent developments on deep learning established some theoretical properties of deep neural networks estimators. However, most of the existing works on this topic are restricted to bounded loss functions or (sub)-Gaussian or bounded input. This paper considers robust deep learning from weakly dependent observations, with unbounded loss function and unbounded input/output. It is only assumed that the output variable has a finite $r$ order moment, with $r >1$. Non asymptotic bounds for the expected excess risk of the deep neural network estimator are established under strong mixing, and $\psi$-weak dependence assumptions on the observations. We derive a relationship between these bounds and $r$, and when the data have moments of any order (that is $r=\infty$), the convergence rate is close to some well-known results. When the target predictor belongs to the class of H\"older smooth functions with sufficiently large smoothness index, the rate of the expected excess risk for exponentially strongly mixing data is close to or as same as those for obtained with i.i.d. samples. Application to robust nonparametric regression and robust nonparametric autoregression are considered. The simulation study for models with heavy-tailed errors shows that, robust estimators with absolute loss and Huber loss function outperform the least squares method.

Via

Access Paper or Ask Questions

Penalized deep neural networks estimator with general loss functions under weak dependence

May 10, 2023

William Kengne, Modou Wade

Abstract:This paper carries out sparse-penalized deep neural networks predictors for learning weakly dependent processes, with a broad class of loss functions. We deal with a general framework that includes, regression estimation, classification, times series prediction, $\cdots$ The $\psi$-weak dependence structure is considered, and for the specific case of bounded observations, $\theta_\infty$-coefficients are also used. In this case of $\theta_\infty$-weakly dependent, a non asymptotic generalization bound within the class of deep neural networks predictors is provided. For learning both $\psi$ and $\theta_\infty$-weakly dependent processes, oracle inequalities for the excess risk of the sparse-penalized deep neural networks estimators are established. When the target function is sufficiently smooth, the convergence rate of these excess risk is close to $\mathcal{O}(n^{-1/3})$. Some simulation results are provided, and application to the forecast of the particulate matter in the Vit\'{o}ria metropolitan area is also considered.

Via

Access Paper or Ask Questions

Sparse-penalized deep neural networks estimator under weak dependence

Mar 02, 2023

William Kengne, Modou Wade

Abstract:We consider the nonparametric regression and the classification problems for $\psi$-weakly dependent processes. This weak dependence structure is more general than conditions such as, mixing, association, $\ldots$. A penalized estimation method for sparse deep neural networks is performed. In both nonparametric regression and binary classification problems, we establish oracle inequalities for the excess risk of the sparse-penalized deep neural networks estimators. Convergence rates of the excess risk of these estimators are also derived. The simulation results displayed show that, the proposed estimators overall work well than the non penalized estimators.

Via

Access Paper or Ask Questions

Excess risk bound for deep learning under weak dependence

Feb 15, 2023

William Kengne

Abstract:This paper considers deep neural networks for learning weakly dependent processes in a general framework that includes, for instance, regression estimation, time series prediction, time series classification. The $\psi$-weak dependence structure considered is quite large and covers other conditions such as mixing, association,$\ldots$ Firstly, the approximation of smooth functions by deep neural networks with a broad class of activation functions is considered. We derive the required depth, width and sparsity of a deep neural network to approximate any H\"{o}lder smooth function, defined on any compact set $\mx$. Secondly, we establish a bound of the excess risk for the learning of weakly dependent observations by deep neural networks. When the target function is sufficiently smooth, this bound is close to the usual $\mathcal{O}(n^{-1/2})$.

Via

Access Paper or Ask Questions

Deep learning for $ψ$-weakly dependent processes

Feb 01, 2023

William Kengne, Wade Modou

Abstract:In this paper, we perform deep neural networks for learning $\psi$-weakly dependent processes. Such weak-dependence property includes a class of weak dependence conditions such as mixing, association,$\cdots$ and the setting considered here covers many commonly used situations such as: regression estimation, time series prediction, time series classification,$\cdots$ The consistency of the empirical risk minimization algorithm in the class of deep neural networks predictors is established. We achieve the generalization bound and obtain a learning rate, which is less than $\mathcal{O}(n^{-1/\alpha})$, for all $\alpha > 2 $. Applications to binary time series classification and prediction in affine causal models with exogenous covariates are carried out. Some simulation results are provided, as well as an application to the US recession data.

Via

Access Paper or Ask Questions