Abstract:The Lasso is a prominent algorithm for variable selection. However, its instability in the presence of correlated variables in the high-dimensional setting is well-documented. Although previous research has attempted to address this issue by modifying the Lasso loss function, this paper introduces an approach that simplifies the data processed by Lasso. We propose that decorrelating variables before applying the Lasso improves the stability of variable selection regardless of the direction of correlation among predictors. Furthermore, we highlight that the irrepresentable condition, which ensures consistency for the Lasso, is satisfied after variable decorrelation under two assumptions. In addition, by noting that the instability of the Lasso is not limited to high-dimensional settings, we demonstrate the effectiveness of the proposed approach for low-dimensional data. Finally, we present empirical results that indicate the efficacy of the proposed method across different variable selection techniques, highlighting its potential for broader application. The DVS R package is developed to facilitate the implementation of the methodology proposed in this paper.
Abstract:Portfolio optimization involves selecting asset weights to minimize a risk-reward objective, such as the portfolio variance in the classical minimum-variance framework. Sparse portfolio selection extends this by imposing a cardinality constraint: only $k$ assets from a universe of $p$ may be included. The standard approach models this problem as a mixed-integer quadratic program and relies on commercial solvers to find the optimal solution. However, the computational costs of such methods increase exponentially with $k$ and $p$, making them too slow for problems of even moderate size. We propose a fast and scalable gradient-based approach that transforms the combinatorial sparse selection problem into a constrained continuous optimization task via Boolean relaxation, while preserving equivalence with the original problem on the set of binary points. Our algorithm employs a tunable parameter that transmutes the auxiliary objective from a convex to a concave function. This allows a stable convex starting point, followed by a controlled path toward a sparse binary solution as the tuning parameter increases and the objective moves toward concavity. In practice, our method matches commercial solvers in asset selection for most instances and, in rare instances, the solution differs by a few assets whilst showing a negligible error in portfolio variance.
Abstract:Stability selection is a widely adopted resampling-based framework for high-dimensional structure estimation and variable selection. However, the concept of 'stability' is often narrowly addressed, primarily through examining selection frequencies, or 'stability paths'. This paper seeks to broaden the use of an established stability estimator to evaluate the overall stability of the stability selection framework, moving beyond single-variable analysis. We suggest that the stability estimator offers two advantages: it can serve as a reference to reflect the robustness of the outcomes obtained and help identify an optimal regularization value to improve stability. By determining this value, we aim to calibrate key stability selection parameters, namely, the decision threshold and the expected number of falsely selected variables, within established theoretical bounds. Furthermore, we explore a novel selection criterion based on this regularization value. With the asymptotic distribution of the stability estimator previously established, convergence to true stability is ensured, allowing us to observe stability trends over successive sub-samples. This approach sheds light on the required number of sub-samples addressing a notable gap in prior studies. The 'stabplot' package is developed to facilitate the use of the plots featured in this manuscript, supporting their integration into further statistical analysis and research workflows.