We propose a first data-driven tuning procedure for divide-and-conquer kernel ridge regression (Zhang et al., 2015). While the proposed criterion is computationally scalable for massive data sets, it is also shown to be asymptotically optimal under mild conditions. The effectiveness of our method is illustrated by extensive simulations and an application to Million Song Dataset.
We consider the estimation and inference of sparse graphical models that characterize the dependency structure of high-dimensional tensor-valued data. To facilitate the estimation of the precision matrix corresponding to each way of the tensor, we assume the data follow a tensor normal distribution whose covariance has a Kronecker product structure. A critical challenge in the estimation and inference of this model is the fact that its penalized maximum likelihood estimation involves minimizing a non-convex objective function. To address it, this paper makes two contributions: (i) In spite of the non-convexity of this estimation problem, we prove that an alternating minimization algorithm, which iteratively estimates each sparse precision matrix while fixing the others, attains an estimator with the optimal statistical rate of convergence. Notably, such an estimator achieves estimation consistency with only one tensor sample, which was not observed in the previous work. (ii) We propose a de-biased statistical inference procedure for testing hypotheses on the true support of the sparse precision matrices, and employ it for testing a growing number of hypothesis with false discovery rate (FDR) control. The asymptotic normality of our test statistic and the consistency of FDR control procedure are established. Our theoretical results are further backed up by thorough numerical studies. We implement the methods into a publicly available R package Tlasso.
We propose a novel sparse tensor decomposition method, namely Tensor Truncated Power (TTP) method, that incorporates variable selection into the estimation of decomposition components. The sparsity is achieved via an efficient truncation step embedded in the tensor power iteration. Our method applies to a broad family of high dimensional latent variable models, including high dimensional Gaussian mixture and mixtures of sparse regressions. A thorough theoretical investigation is further conducted. In particular, we show that the final decomposition estimator is guaranteed to achieve a local statistical rate, and further strengthen it to the global statistical rate by introducing a proper initialization procedure. In high dimensional regimes, the obtained statistical rate significantly improves those shown in the existing non-sparse decomposition methods. The empirical advantages of TTP are confirmed in extensive simulated results and two real applications of click-through rate prediction and high-dimensional gene clustering.
The stability of statistical analysis is an important indicator for reproducibility, which is one main principle of scientific method. It entails that similar statistical conclusions can be reached based on independent samples from the same underlying population. In this paper, we introduce a general measure of classification instability (CIS) to quantify the sampling variability of the prediction made by a classification method. Interestingly, the asymptotic CIS of any weighted nearest neighbor classifier turns out to be proportional to the Euclidean norm of its weight vector. Based on this concise form, we propose a stabilized nearest neighbor (SNN) classifier, which distinguishes itself from other nearest neighbor classifiers, by taking the stability into consideration. In theory, we prove that SNN attains the minimax optimal convergence rate in risk, and a sharp convergence rate in CIS. The latter rate result is established for general plug-in classifiers under a low-noise condition. Extensive simulated and real examples demonstrate that SNN achieves a considerable improvement in CIS over existing nearest neighbor classifiers, with comparable classification accuracy. We implement the algorithm in a publicly available R package snn.
This article studies local and global inference for smoothing spline estimation in a unified asymptotic framework. We first introduce a new technical tool called functional Bahadur representation, which significantly generalizes the traditional Bahadur representation in parametric models, that is, Bahadur [Ann. Inst. Statist. Math. 37 (1966) 577-580]. Equipped with this tool, we develop four interconnected procedures for inference: (i) pointwise confidence interval; (ii) local likelihood ratio testing; (iii) simultaneous confidence band; (iv) global likelihood ratio testing. In particular, our confidence intervals are proved to be asymptotically valid at any point in the support, and they are shorter on average than the Bayesian confidence intervals proposed by Wahba [J. R. Stat. Soc. Ser. B Stat. Methodol. 45 (1983) 133-150] and Nychka [J. Amer. Statist. Assoc. 83 (1988) 1134-1143]. We also discuss a version of the Wilks phenomenon arising from local/global likelihood ratio testing. It is also worth noting that our simultaneous confidence bands are the first ones applicable to general quasi-likelihood models. Furthermore, issues relating to optimality and efficiency are carefully addressed. As a by-product, we discover a surprising relationship between periodic and nonperiodic smoothing splines in terms of inference.