Andrii Babii

Functional Partial Least-Squares: Optimal Rates and Adaptation

Feb 16, 2024

Andrii Babii, Marine Carrasco, Idriss Tsafack

Abstract: We consider the functional linear regression model with a scalar response and a Hilbert space-valued predictor, a well-known ill-posed inverse problem. We propose a new formulation of the functional partial least-squares (PLS) estimator related to the conjugate gradient method. We show that the estimator achieves the (nearly) optimal convergence rate on a class of ellipsoids, and we introduce an early stopping rule that adapts to the unknown degree of ill-posedness. Theoretical and simulation comparisons between the estimator and the principal component regression estimator are provided.
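
To illustrate the link between conjugate gradient iterations and regularization by early stopping, here is a minimal finite-dimensional sketch. It is not the paper's functional estimator: the function name `conjugate_gradient`, the small matrix `A` (standing in for a sample covariance operator), and the fixed iteration count `n_iter` are all assumptions made for the example, and the paper's adaptive stopping rule is not reproduced.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    """Product of a matrix (list of rows) with a vector."""
    return [dot(row, v) for row in A]


def conjugate_gradient(A, b, n_iter, tol=1e-14):
    """Run at most n_iter conjugate gradient steps on A x = b, starting from zero.

    A must be symmetric positive definite. Stopping after few steps acts as
    regularization: n_iter plays the role of the tuning parameter.
    """
    x = [0.0] * len(b)
    r = list(b)   # residual b - A x for x = 0
    p = list(r)   # first search direction
    rs = dot(r, r)
    for _ in range(n_iter):
        if rs <= tol:  # residual is already negligible
            break
        Ap = matvec(A, p)
        step = rs / dot(p, Ap)
        x = [xi + step * pi for xi, pi in zip(x, p)]
        r = [ri - step * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


A = [[4.0, 1.0], [1.0, 3.0]]  # hypothetical covariance matrix
b = [1.0, 2.0]                # hypothetical cross-covariance vector
early = conjugate_gradient(A, b, n_iter=1)  # heavily regularized solution
full = conjugate_gradient(A, b, n_iter=2)   # exact solution for a 2x2 system
```

With exact arithmetic the iteration reaches the solution of a system of dimension d in at most d steps, so any smaller `n_iter` yields a regularized approximation.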

Econometrics of Machine Learning Methods in Economic Forecasting

Aug 21, 2023

Andrii Babii, Eric Ghysels, Jonas Striaukas

Abstract: This paper surveys recent advances in machine learning methods for economic forecasting. The survey covers the following topics: nowcasting, textual data, panel and tensor data, high-dimensional Granger causality tests, time series cross-validation, and classification with economic losses.

Panel Data Nowcasting: The Case of Price-Earnings Ratios

Jul 05, 2023

Andrii Babii, Ryan T. Ball, Eric Ghysels, Jonas Striaukas

Abstract: The paper uses structured machine learning regressions for nowcasting with panel data consisting of series sampled at different frequencies. Motivated by the problem of predicting corporate earnings for a large cross-section of firms with macroeconomic, financial, and news time series sampled at different frequencies, we focus on the sparse-group LASSO regularization, which can take advantage of the mixed-frequency time series panel data structures. Our empirical results show the superior performance of our machine learning panel data regression models over analysts' predictions, forecast combinations, firm-specific time series regression models, and standard machine learning methods.

* arXiv admin note: substantial text overlap with arXiv:2008.03600

Tensor Principal Component Analysis

Dec 26, 2022

Andrii Babii, Eric Ghysels, Junsu Pan

Abstract: In this paper, we develop new methods for analyzing high-dimensional tensor datasets. A tensor factor model describes a high-dimensional dataset as a sum of a low-rank component and an idiosyncratic noise, generalizing traditional factor models for panel data. We propose an estimation algorithm, called tensor principal component analysis (PCA), which generalizes the traditional PCA applicable to panel data. The algorithm involves unfolding the tensor into a sequence of matrices along different dimensions and applying PCA to the unfolded matrices. We provide theoretical results on the consistency and asymptotic distribution of the tensor PCA estimators of loadings and factors. The algorithm demonstrates good performance in Monte Carlo experiments and is applied to sorted portfolios.
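
The unfolding step mentioned in the abstract can be sketched for a 3-way tensor as follows. This is only an illustration under stated assumptions: the helper `unfold`, the nested-list storage, and the column ordering are choices made here, not taken from the paper, and the PCA step applied to each unfolded matrix is omitted.

```python
def unfold(tensor, mode):
    """Mode-`mode` unfolding of a 3-way tensor stored as nested lists.

    Rows are indexed by the chosen mode; columns run over the two remaining
    modes in their original order. Column ordering conventions differ across
    references, but PCA on the rows does not depend on it.
    """
    dims = (len(tensor), len(tensor[0]), len(tensor[0][0]))
    others = [axis for axis in range(3) if axis != mode]
    rows = []
    for r in range(dims[mode]):
        row = []
        for a in range(dims[others[0]]):
            for b in range(dims[others[1]]):
                index = [0, 0, 0]
                index[mode] = r
                index[others[0]] = a
                index[others[1]] = b
                i, j, k = index
                row.append(tensor[i][j][k])
        rows.append(row)
    return rows


# Hypothetical 2 x 2 x 2 tensor with entry 4*i + 2*j + k at position (i, j, k).
tensor = [[[0, 1], [2, 3]], [[4, 5], [6, 7]]]
unfolded = [unfold(tensor, mode) for mode in range(3)]
```

Each element of `unfolded` is a 2 x 4 matrix whose rows correspond to one dimension of the tensor, which is the shape on which an ordinary PCA routine could then be run.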

Binary Choice with Asymmetric Loss in a Data-Rich Environment: Theory and an Application to Racial Justice

Oct 25, 2020

Andrii Babii, Xi Chen, Eric Ghysels, Rohit Kumar

Abstract: The importance of asymmetries in prediction problems arising in economics has been recognized for a long time. In this paper, we focus on binary choice problems in a data-rich environment with general loss functions. In contrast to asymmetric regression problems, binary choice with general loss functions and high-dimensional datasets is challenging and not well understood. Econometricians have studied binary choice problems for a long time, but the literature does not offer computationally attractive solutions in data-rich environments. In contrast, the machine learning literature has many computationally attractive algorithms that form the basis for many of the automated procedures implemented in practice, but it is focused on symmetric loss functions that are independent of individual characteristics. One of the main contributions of our paper is to show that theoretically valid predictions of binary outcomes with arbitrary loss functions can be achieved via a very simple reweighting of the logistic regression, or other state-of-the-art machine learning techniques, such as boosting or (deep) neural networks. We apply our analysis to racial justice in pretrial detention.
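
As a rough illustration of what reweighting a logistic objective means, the sketch below computes a per-observation weighted log-loss. The function name `weighted_log_loss` and the example weights are hypothetical; how the paper derives its weights from the loss function and individual characteristics is not reproduced here.

```python
import math


def weighted_log_loss(y, p, weights):
    """Average logistic (cross-entropy) loss with one weight per observation.

    y: binary outcomes (0 or 1); p: predicted probabilities strictly in (0, 1);
    weights: nonnegative weights, here assumed to encode asymmetric costs.
    """
    total = 0.0
    for yi, pi, wi in zip(y, p, weights):
        if yi == 1:
            total += -wi * math.log(pi)
        else:
            total += -wi * math.log(1.0 - pi)
    return total / len(y)


y = [1, 0]
p = [0.5, 0.5]
symmetric = weighted_log_loss(y, p, [1.0, 1.0])   # ordinary log-loss
asymmetric = weighted_log_loss(y, p, [2.0, 1.0])  # errors on y = 1 cost twice as much
```

Setting all weights to one recovers the usual logistic loss, so the reweighting changes only how much each observation's error counts in the fit.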

Machine Learning Panel Data Regressions with an Application to Nowcasting Price Earnings Ratios

Aug 08, 2020

Andrii Babii, Ryan T. Ball, Eric Ghysels, Jonas Striaukas

Abstract: This paper introduces structured machine learning regressions for prediction and nowcasting with panel data consisting of series sampled at different frequencies. Motivated by the empirical problem of predicting corporate earnings for a large cross-section of firms with macroeconomic, financial, and news time series sampled at different frequencies, we focus on the sparse-group LASSO regularization. This type of regularization can take advantage of the mixed-frequency time series panel data structures, and we find that it empirically outperforms unstructured machine learning methods. We obtain oracle inequalities for the pooled and fixed effects sparse-group LASSO panel data estimators, recognizing that financial and economic data exhibit heavier-than-Gaussian tails. To that end, we leverage a novel Fuk-Nagaev concentration inequality for panel data consisting of heavy-tailed $\tau$-mixing processes, which may be of independent interest in other high-dimensional panel data settings.
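
For readers unfamiliar with the sparse-group LASSO, the sketch below evaluates its penalty in the commonly used form $\lambda(\alpha\|\beta\|_1 + (1-\alpha)\sum_g \|\beta_g\|_2)$. The function name, the grouping, and the parameter values are assumptions chosen for illustration; in a mixed-frequency setting a group would typically collect the lag coefficients of one covariate, but the abstract itself does not specify the grouping.

```python
import math


def sparse_group_lasso_penalty(beta, groups, lam, alpha):
    """Sparse-group LASSO penalty for coefficient list `beta`.

    groups: list of index lists partitioning the coefficients.
    alpha = 1 gives the LASSO penalty, alpha = 0 the group LASSO penalty.
    """
    l1_part = sum(abs(b) for b in beta)
    group_part = sum(
        math.sqrt(sum(beta[i] ** 2 for i in group)) for group in groups
    )
    return lam * (alpha * l1_part + (1.0 - alpha) * group_part)


beta = [3.0, 4.0, 0.0]   # hypothetical coefficients
groups = [[0, 1], [2]]   # hypothetical grouping: two covariates
penalty = sparse_group_lasso_penalty(beta, groups, lam=1.0, alpha=0.5)
```

The mixing weight `alpha` trades off sparsity within groups against sparsity across groups, which is how the penalty can reflect structure that the plain LASSO ignores.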

Machine learning time series regressions with an application to nowcasting

May 29, 2020

Andrii Babii, Eric Ghysels, Jonas Striaukas

Abstract: This paper introduces structured machine learning regressions for high-dimensional time series data potentially sampled at different frequencies. The sparse-group LASSO estimator can take advantage of such time series data structures and outperforms the unstructured LASSO. We establish oracle inequalities for the sparse-group LASSO estimator within a framework that allows for mixing processes and recognizes that financial and macroeconomic data may have heavier-than-exponential tails. An empirical application to nowcasting US GDP growth indicates that the estimator performs favorably compared to other alternatives and that text data can be a useful addition to more traditional numerical data.

* 25 pages, plus appendix. Portions of this work previously appeared as arXiv:1912.06307v1 which has been split into two articles

High-dimensional mixed-frequency IV regression

Mar 30, 2020

Andrii Babii

Abstract: This paper introduces a high-dimensional linear IV regression for data sampled at mixed frequencies. We show that the high-dimensional slope parameter of a high-frequency covariate can be identified and accurately estimated by leveraging a low-frequency instrumental variable. The distinguishing feature of the model is that it allows handling high-dimensional datasets without imposing approximate sparsity restrictions. We propose a Tikhonov-regularized estimator and derive the convergence rate of its mean-integrated squared error for time series data. The estimator has a closed-form expression that is easy to compute and demonstrates excellent performance in our Monte Carlo experiments. We estimate the real-time price elasticity of supply on the Australian electricity spot market. Our estimates suggest that the supply is relatively inelastic and that its elasticity is heterogeneous throughout the day.

Estimation and HAC-based Inference for Machine Learning Time Series Regressions

Dec 13, 2019

Andrii Babii, Eric Ghysels, Jonas Striaukas

Abstract: Time series regression analysis in econometrics typically relies on a set of mixing conditions to establish consistency and asymptotic normality of parameter estimates, and on HAC-type estimators of the residual long-run variances to conduct proper inference. This article introduces structured machine learning regressions for high-dimensional time series data within this commonly used setting. To recognize the time series data structures, we rely on the sparse-group LASSO estimator. We derive a new Fuk-Nagaev inequality for a class of $\tau$-dependent processes with heavier-than-Gaussian tails, nesting $\alpha$-mixing processes as a special case, and establish estimation, prediction, and inferential properties, including convergence rates of the HAC estimator for the long-run variance based on LASSO residuals. An empirical application to nowcasting US GDP growth indicates that the estimator performs favorably compared to other alternatives and that text data can be a useful addition to more traditional numerical data.
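
A HAC estimator of a long-run variance can be sketched as below. The Bartlett (Newey-West) kernel, the function name `long_run_variance`, and the toy series are assumptions for illustration only: the abstract does not state which kernel or bandwidth the paper uses, and in the paper the input would be built from LASSO residuals rather than an arbitrary series.

```python
def long_run_variance(u, n_lags):
    """Bartlett-kernel HAC estimate of the long-run variance of series `u`.

    Computes gamma_0 + 2 * sum_{j=1}^{n_lags} (1 - j / (n_lags + 1)) * gamma_j,
    where gamma_j is the lag-j sample autocovariance of the demeaned series.
    """
    n = len(u)
    mean = sum(u) / n
    d = [x - mean for x in u]

    def autocov(j):
        # Lag-j sample autocovariance, normalized by n.
        return sum(d[t] * d[t - j] for t in range(j, n)) / n

    lrv = autocov(0)
    for j in range(1, n_lags + 1):
        weight = 1.0 - j / (n_lags + 1)
        lrv += 2.0 * weight * autocov(j)
    return lrv


u = [1.0, -1.0, 1.0, -1.0]                 # hypothetical alternating series
no_correction = long_run_variance(u, 0)   # plain sample variance
with_one_lag = long_run_variance(u, 1)    # accounts for negative autocorrelation
```

For this negatively autocorrelated series the lag correction lowers the estimate below the sample variance; for positively autocorrelated data it would raise it.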
