Rajat Sen

Linear Regression using Heterogeneous Data Batches

Sep 05, 2023
Ayush Jain, Rajat Sen, Weihao Kong, Abhimanyu Das, Alon Orlitsky

In many learning applications, data are collected from multiple sources, each providing a \emph{batch} of samples that by itself is insufficient to learn its input-output relationship. A common approach assumes that the sources fall in one of several unknown subgroups, each with an unknown input distribution and input-output relationship. We consider one of this setup's most fundamental and important manifestations, where the output is a noisy linear combination of the inputs and there are $k$ subgroups, each with its own regression vector. Prior work~\cite{kong2020meta} showed that, given abundant small batches, the regression vectors can be learned using only a few, $\tilde\Omega(k^{3/2})$, medium-size batches with $\tilde\Omega(\sqrt k)$ samples each. However, that work requires the input distribution of all $k$ subgroups to be isotropic Gaussian, and states that removing this assumption is an ``interesting and challenging problem''. We propose a novel gradient-based algorithm that improves on these results in several ways: (1) it allows the subgroups' underlying input distributions to be different, unknown, and heavy-tailed; (2) it recovers all subgroups that are followed by a significant proportion of the batches, even when $k$ is infinite; (3) it removes the separation requirement between the regression vectors; (4) it reduces the number of batches required and allows smaller batch sizes.
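
As a rough illustration of the batch setting only (not the paper's gradient-based algorithm), the sketch below generates batches from $k=2$ hidden subgroups and fits them with a naive alternating scheme that assigns whole batches to candidate regression vectors and refits; all constants are made up for the toy.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n_batches, batch_size = 5, 2, 200, 10
true_w = rng.normal(size=(k, d))

# Each batch is drawn from one hidden subgroup's regression vector.
batches = []
for _ in range(n_batches):
    j = rng.integers(k)
    X = rng.normal(size=(batch_size, d))
    y = X @ true_w[j] + 0.1 * rng.normal(size=batch_size)
    batches.append((X, y))

w = rng.normal(size=(k, d))  # random initialization
for _ in range(50):
    # Assign each whole batch to the candidate with the smallest batch loss.
    assign = [int(np.argmin([np.mean((y - X @ wj) ** 2) for wj in w]))
              for X, y in batches]
    # Refit each candidate by least squares on its assigned batches.
    for j in range(k):
        idx = [i for i, a in enumerate(assign) if a == j]
        if not idx:
            continue
        Xs = np.vstack([batches[i][0] for i in idx])
        ys = np.concatenate([batches[i][1] for i in idx])
        w[j] = np.linalg.lstsq(Xs, ys, rcond=None)[0]

# Distance from each true vector to its nearest estimate.
print(np.linalg.norm(true_w[:, None] - w[None], axis=2).min(axis=1))
```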

Long-term Forecasting with TiDE: Time-series Dense Encoder

Apr 27, 2023
Abhimanyu Das, Weihao Kong, Andrew Leach, Shaan Mathur, Rajat Sen, Rose Yu

Recent work has shown that simple linear models can outperform several Transformer-based approaches in long-term time-series forecasting. Motivated by this, we propose a Multi-layer Perceptron (MLP) based encoder-decoder model, Time-series Dense Encoder (TiDE), for long-term time-series forecasting that enjoys the simplicity and speed of linear models while also being able to handle covariates and non-linear dependencies. Theoretically, we prove that the simplest linear analogue of our model can achieve a near-optimal error rate for linear dynamical systems (LDS) under some assumptions. Empirically, we show that our method can match or outperform prior approaches on popular long-term time-series forecasting benchmarks while being 5-10x faster than the best Transformer-based model.
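
A minimal sketch of the general TiDE shape, assuming PyTorch: a flattened lookback window passed through residual MLP blocks and decoded to the full horizon in one shot. It deliberately omits the paper's covariate projections and temporal decoder; `TinyDenseEncoder` and all sizes are hypothetical, not the released implementation.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Dense layer pair with a skip connection and layer norm."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, dim), nn.Dropout(0.1),
        )
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        return self.norm(x + self.net(x))

class TinyDenseEncoder(nn.Module):
    """Flatten the lookback, encode with residual MLP blocks,
    and decode the horizon in one shot (no attention, no recurrence)."""
    def __init__(self, lookback: int, horizon: int, width: int = 256, depth: int = 2):
        super().__init__()
        self.proj_in = nn.Linear(lookback, width)
        self.blocks = nn.Sequential(*[ResidualBlock(width, width) for _ in range(depth)])
        self.proj_out = nn.Linear(width, horizon)

    def forward(self, past):               # past: (batch, lookback)
        return self.proj_out(self.blocks(self.proj_in(past)))

model = TinyDenseEncoder(lookback=96, horizon=24)
print(model(torch.randn(8, 96)).shape)     # torch.Size([8, 24])
```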

Efficient List-Decodable Regression using Batches

Nov 23, 2022
Abhimanyu Das, Ayush Jain, Weihao Kong, Rajat Sen

We begin the study of list-decodable linear regression using batches. In this setting, only an $\alpha \in (0,1]$ fraction of the batches are genuine. Each genuine batch contains $\ge n$ i.i.d. samples from a common unknown distribution, while the remaining batches may contain arbitrary or even adversarial samples. We derive a polynomial-time algorithm that, for any $n\ge \tilde \Omega(1/\alpha)$, returns a list of size $\mathcal O(1/\alpha^2)$ such that one of the items in the list is close to the true regression parameter. The algorithm requires only $\tilde{\mathcal{O}}(d/\alpha^2)$ genuine batches and works under fairly general assumptions on the distribution. The results demonstrate the utility of batch structure: it enables the first polynomial-time algorithm for list-decodable linear regression, a task that may be impossible in the non-batch setting, as suggested by a recent SQ lower bound \cite{diakonikolas2021statistical}.

* First draft 
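
The sketch below is not the paper's polynomial-time algorithm; it is only a hedged toy showing why batch structure helps: per-batch OLS estimates of genuine batches concentrate around the true parameter, so a greedy ball cover over those estimates yields a list containing a good candidate. The radius and sizes are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, alpha, n_batches = 4, 50, 0.2, 100
w_true = rng.normal(size=d)

# An alpha fraction of batches is genuine; the rest follow arbitrary junk relations.
ests = []
for b in range(n_batches):
    X = rng.normal(size=(n, d))
    if b < int(alpha * n_batches):
        y = X @ w_true + 0.1 * rng.normal(size=n)
    else:
        y = X @ (5 * rng.normal(size=d))
    ests.append(np.linalg.lstsq(X, y, rcond=None)[0])
ests = np.array(ests)

# Greedy ball cover over the per-batch OLS estimates: genuine batches
# concentrate near w_true, so some returned center lands close to it.
radius, remaining, candidates = 1.0, list(range(n_batches)), []
while remaining:
    center = ests[remaining[0]]
    near = [i for i in remaining if np.linalg.norm(ests[i] - center) <= radius]
    candidates.append(ests[near].mean(axis=0))
    remaining = [i for i in remaining if i not in near]

print(len(candidates), min(np.linalg.norm(c - w_true) for c in candidates))
```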

Trimmed Maximum Likelihood Estimation for Robust Learning in Generalized Linear Models

Jun 09, 2022
Weihao Kong, Rajat Sen, Pranjal Awasthi, Abhimanyu Das

We study the problem of learning generalized linear models under adversarial corruptions. We analyze a classical heuristic called the iterative trimmed maximum likelihood estimator, which is known to be effective against label corruptions in practice. Under label corruptions, we prove that this simple estimator achieves minimax near-optimal risk on a wide range of generalized linear models, including Gaussian, Poisson, and Binomial regression. Finally, we extend the estimator to the more challenging setting of combined label and covariate corruptions and demonstrate its robustness and optimality in that setting as well.
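
A minimal sketch of the iterative trimmed MLE in its Gaussian-regression instance, where the negative log-likelihood reduces (up to constants) to the squared residual: alternately fit on the kept samples, then trim the eps fraction with the worst likelihood. The constants and corruption model are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, eps = 500, 5, 0.1
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)
bad = rng.choice(n, int(eps * n), replace=False)
y[bad] += 10 * rng.normal(size=bad.size)          # adversarial label corruptions

keep = np.arange(n)                                # start from all samples
for _ in range(20):
    # MLE on the kept set; for Gaussian regression this is least squares.
    w = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    # Per-sample negative log-likelihood is proportional to the squared residual.
    nll = (y - X @ w) ** 2
    keep = np.argsort(nll)[: int((1 - eps) * n)]   # trim the worst eps fraction

print(np.linalg.norm(w - w_true))                  # small despite the corruptions
```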

On Learning Mixture of Linear Regressions in the Non-Realizable Setting

May 26, 2022
Avishek Ghosh, Arya Mazumdar, Soumyabrata Pal, Rajat Sen

While mixture of linear regressions (MLR) is a well-studied topic, prior works usually do not analyze such models for prediction error. In fact, {\em prediction} and {\em loss} are not well-defined in the context of mixtures. In this paper, we first show that MLR can be used for prediction where, instead of predicting a single label, the model predicts a list of values (also known as {\em list-decoding}). The list size equals the number of components in the mixture, and the loss is defined as the minimum among the losses incurred by the component models. We show that with this definition, a solution to empirical risk minimization (ERM) achieves a small probability of prediction error. This calls for an algorithm that minimizes the empirical risk for MLR, which is known to be computationally hard. Prior algorithmic works on MLR focus on the {\em realizable} setting, i.e., recovering the parameters when the data are probabilistically generated by a mixed linear (noisy) model. In this paper, we show that a version of the popular alternating minimization (AM) algorithm finds the best-fit lines in a dataset even when a realizable model is not assumed, under some regularity conditions on the dataset and the initial points, and thereby provides a solution for the ERM. We further provide an algorithm that runs in time polynomial in the number of datapoints and recovers a good approximation of the best-fit lines. The two algorithms are compared experimentally.

* To appear in ICML 2022 
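
A minimal sketch of the alternating minimization scheme and the list-decoding loss described above, assuming a random initialization (the paper's guarantees require regularity conditions on the data and the initial points):

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, k = 600, 2, 2
X = rng.normal(size=(n, d))
lines = 3 * rng.normal(size=(k, d))
z = rng.integers(k, size=n)                       # hidden component of each point
y = np.einsum('nd,nd->n', X, lines[z]) + 0.1 * rng.normal(size=n)

w = rng.normal(size=(k, d))
for _ in range(50):
    # Assignment step: each point goes to its best-fitting line.
    a = ((y[:, None] - X @ w.T) ** 2).argmin(axis=1)
    # Refit step: ordinary least squares within each group.
    for j in range(k):
        if (a == j).any():
            w[j] = np.linalg.lstsq(X[a == j], y[a == j], rcond=None)[0]

# List-decoding loss: the minimum loss over the k component predictions.
print(((y[:, None] - X @ w.T) ** 2).min(axis=1).mean())
```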

A Top-Down Approach to Hierarchically Coherent Probabilistic Forecasting

Apr 21, 2022
Abhimanyu Das, Weihao Kong, Biswajit Paria, Rajat Sen

Hierarchical forecasting is a key problem in many practical multivariate forecasting applications - the goal is to obtain coherent predictions for a large number of correlated time series arranged in a pre-specified tree hierarchy. In this paper, we present a probabilistic top-down approach to hierarchical forecasting that uses a novel attention-based RNN model to learn the distribution of the proportions according to which each parent prediction is split among its child nodes at any point in time. These probabilistic proportions are then coupled with an independent univariate probabilistic forecasting model (such as Prophet or STS) for the root time series. The resulting forecasts are computed in a top-down fashion, are coherent by construction, and support probabilistic predictions over all time series in the hierarchy. We provide theoretical justification for the superiority of our top-down approach over traditional bottom-up hierarchical modeling. Finally, we experiment on three public datasets and demonstrate significantly improved probabilistic forecasts compared to state-of-the-art probabilistic hierarchical models.
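
A hedged sketch of the top-down mechanism, with the learned attention-based RNN replaced by fixed Dirichlet draws for the proportions: every sample of the root forecast is split among the children, so coherence holds by construction. All shapes and parameters are made up.

```python
import numpy as np

rng = np.random.default_rng(4)
n_samples, horizon = 1000, 12

# Probabilistic root forecast, represented by Monte Carlo samples (the paper
# obtains this from a univariate model such as Prophet or STS).
root = 100 + 5 * rng.normal(size=(n_samples, horizon))

# Per-step proportions over three children; the paper learns their distribution
# with an attention-based RNN, here they are simply Dirichlet draws.
props = rng.dirichlet([5.0, 3.0, 2.0], size=(n_samples, horizon))  # (1000, 12, 3)

children = root[:, :, None] * props               # split each parent sample
assert np.allclose(children.sum(axis=2), root)    # coherent by construction
print(children.mean(axis=0).round(1))             # per-child mean forecasts
```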

Cluster-and-Conquer: A Framework For Time-Series Forecasting

Oct 26, 2021
Reese Pathak, Rajat Sen, Nikhil Rao, N. Benjamin Erichson, Michael I. Jordan, Inderjit S. Dhillon

We propose a three-stage framework for forecasting high-dimensional time-series data. Our method first estimates parameters for each univariate time series. Next, we use these parameters to cluster the time series. These clusters can be viewed as multivariate time series, for which we then compute parameters. The forecasted values of a single time series can depend on the history of other time series in the same cluster, accounting for intra-cluster similarity while minimizing potential noise in predictions by ignoring inter-cluster effects. Our framework -- which we refer to as "cluster-and-conquer" -- is highly general, allowing for any time-series forecasting and clustering method to be used in each step. It is computationally efficient and embarrassingly parallel. We motivate our framework with a theoretical analysis in an idealized mixed linear regression setting, where we provide guarantees on the quality of the estimates. We accompany these guarantees with experimental results that demonstrate the advantages of our framework: when instantiated with simple linear autoregressive models, we are able to achieve state-of-the-art results on several benchmark datasets, sometimes outperforming deep-learning-based approaches.

* 25 pages, 3 figures 
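
A minimal sketch of the three stages, instantiated with deliberately simple choices: per-series AR(1) estimates, 1-D k-means on those estimates, and one pooled AR(p) model per cluster in place of a full multivariate model. Everything here is an illustrative assumption, not the paper's exact instantiation.

```python
import numpy as np

rng = np.random.default_rng(5)
n_series, T, k, p = 40, 200, 2, 3

# Two latent regimes with different AR(1) dynamics.
phi_true = np.where(rng.integers(2, size=n_series) == 0, 0.9, -0.5)
Y = np.zeros((n_series, T))
for t in range(1, T):
    Y[:, t] = phi_true * Y[:, t - 1] + rng.normal(size=n_series)

# Stage 1: per-series AR(1) coefficient estimates.
phi_hat = np.array([np.dot(y[1:], y[:-1]) / np.dot(y[:-1], y[:-1]) for y in Y])

# Stage 2: 1-D k-means on the estimated parameters.
centers = np.array([phi_hat.min(), phi_hat.max()])
for _ in range(20):
    labels = np.abs(phi_hat[:, None] - centers[None]).argmin(axis=1)
    centers = np.array([phi_hat[labels == j].mean() for j in range(k)])

# Stage 3: fit one pooled AR(p) model per cluster and forecast one step ahead,
# so each forecast borrows strength from the history of its whole cluster.
forecasts = np.zeros(n_series)
for j in range(k):
    members = np.where(labels == j)[0]
    Xs = np.vstack([np.stack([Y[i, t - p:t] for t in range(p, T)]) for i in members])
    ys = np.concatenate([Y[i, p:] for i in members])
    w = np.linalg.lstsq(Xs, ys, rcond=None)[0]
    forecasts[members] = Y[members, -p:] @ w

print(forecasts[:5].round(2))
```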

On the benefits of maximum likelihood estimation for Regression and Forecasting

Jun 18, 2021
Pranjal Awasthi, Abhimanyu Das, Rajat Sen, Ananda Theertha Suresh

We advocate for a practical Maximum Likelihood Estimation (MLE) approach for regression and forecasting, as an alternative to the typical approach of Empirical Risk Minimization (ERM) for a specific target metric. This approach is better suited to capturing inductive biases such as prior domain knowledge in datasets, and can output post-hoc estimators at inference time that optimize different types of target metrics. We present theoretical results demonstrating that our approach is always competitive with any estimator for the target metric under some general conditions, and in many practical settings (such as Poisson regression) can be substantially better than ERM. We demonstrate empirically that our method, instantiated with a well-designed general-purpose mixture likelihood family, can obtain superior performance over ERM for a variety of tasks across time-series forecasting and regression datasets with different data distributions.
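
A minimal sketch of the idea for Poisson regression, assuming NumPy/SciPy: fit a single likelihood by MLE, then read off different post-hoc point estimators from the same fitted distribution (the mean for squared error, the median for absolute error). The data and step sizes are made up.

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(6)
n, d = 2000, 3
X = 0.5 * rng.normal(size=(n, d))
w_true = np.array([0.8, -0.4, 0.3])
y = rng.poisson(np.exp(X @ w_true))

# Fit Poisson regression by MLE (gradient ascent on the log-likelihood).
w = np.zeros(d)
for _ in range(500):
    lam = np.exp(X @ w)
    w += 0.1 / n * X.T @ (y - lam)        # score of the Poisson log-likelihood

# Post-hoc estimators from the same fitted likelihood:
lam_new = np.exp(X[:5] @ w)
pred_mse = lam_new                         # mean minimizes squared error
pred_mae = poisson.ppf(0.5, lam_new)       # median minimizes absolute error
print(np.column_stack([pred_mse.round(2), pred_mae]))
```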

Hierarchically Regularized Deep Forecasting

Jun 14, 2021
Biswajit Paria, Rajat Sen, Amr Ahmed, Abhimanyu Das

Hierarchical forecasting is a key problem in many practical multivariate forecasting applications - the goal is to simultaneously predict a large number of correlated time series arranged in a pre-specified aggregation hierarchy. The challenge is to exploit the hierarchical correlations to simultaneously obtain good prediction accuracy for time series at different levels of the hierarchy. In this paper, we propose a new approach to hierarchical forecasting based on decomposing the time series along a global set of basis time series and modeling the hierarchical constraints through the basis-decomposition coefficients of each time series. Unlike past methods, our approach is scalable at inference time (forecasting a specific time series only requires access to its own data) while (approximately) preserving coherence among the time series forecasts. We experiment on several publicly available datasets and demonstrate significantly improved overall performance on forecasts at different levels of the hierarchy, compared to existing state-of-the-art hierarchical reconciliation methods.
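
A hedged sketch of the basis-decomposition idea on a three-node hierarchy, with a fixed Fourier basis standing in for the learned one and a soft penalty pulling the parent's coefficients toward the sum of its children's; all constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
T, n_basis, lam = 100, 4, 1.0
t = np.arange(T)
# Global basis time series (a fixed Fourier basis here; the paper learns it).
B = np.stack([np.sin(2 * np.pi * (f + 1) * t / T) for f in range(n_basis)], axis=1)

# Tiny hierarchy: series 0 is the parent of series 1 and 2.
leaf1 = B @ np.array([1.0, 0.5, 0.0, 0.0]) + 0.1 * rng.normal(size=T)
leaf2 = B @ np.array([0.0, 0.0, 1.0, -0.5]) + 0.1 * rng.normal(size=T)
Y = np.stack([leaf1 + leaf2, leaf1, leaf2])       # (3, T)

theta = 0.1 * rng.normal(size=(3, n_basis))       # coefficients per series
for _ in range(2000):
    grad = (theta @ B.T - Y) @ B / T              # squared-error fit term
    gap = theta[0] - theta[1] - theta[2]          # parent minus sum of children
    grad[0] += lam * gap                          # hierarchical regularizer
    grad[1] -= lam * gap
    grad[2] -= lam * gap
    theta -= 0.1 * grad

print(np.abs(theta[0] - theta[1] - theta[2]).max())  # near zero: ~coherent
```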
