Algorithmic machine teaching studies the interaction between a teacher and a learner where the teacher selects labeled examples aiming at teaching a target hypothesis. In a quest to lower teaching complexity, several teaching models and complexity measures have been proposed for both the batch settings (e.g., worst-case, recursive, preference-based, and non-clashing models) and the sequential settings (e.g., local preference-based model). To better understand the connections between these models, we develop a novel framework that captures the teaching process via preference functions $\Sigma$. In our framework, each function $\sigma \in \Sigma$ induces a teacher-learner pair with teaching complexity as $TD(\sigma)$. We show that the above-mentioned teaching models are equivalent to specific types/families of preference functions. We analyze several properties of the teaching complexity parameter $TD(\sigma)$ associated with different families of the preference functions, e.g., comparison to the VC dimension of the hypothesis class and additivity/sub-additivity of $TD(\sigma)$ over disjoint domains. Finally, we identify preference functions inducing a novel family of sequential models with teaching complexity linear in the VC dimension: this is in contrast to the best-known complexity result for the batch models, which is quadratic in the VC dimension.
The fundamental principle in Modern Portfolio Theory (MPT) is based on the quantification of the portfolio's risk related to performance. Although MPT has made huge impacts on the investment world and prompted the success and prevalence of passive investing, it still has shortcomings in real-world applications. One of the main challenges is that the level of risk an investor can endure, known as \emph{risk-preference}, is a subjective choice that is tightly related to psychology and behavioral science in decision making. This paper presents a novel approach of measuring risk preference from existing portfolios using inverse optimization on the mean-variance portfolio allocation framework. Our approach allows the learner to continuously estimate real-time risk preferences using concurrent observed portfolios and market price data. We demonstrate our methods on real market data that consists of 20 years of asset pricing and 10 years of mutual fund portfolio holdings. Moreover, the quantified risk preference parameters are validated with two well-known risk measurements currently applied in the field. The proposed methods could lead to practical and fruitful innovations in automated/personalized portfolio management, such as Robo-advising, to augment financial advisors' decision intelligence in a long-term investment horizon.
We study the problem of learning mixtures of low-rank models, i.e. reconstructing multiple low-rank matrices from unlabelled linear measurements of each. This problem enriches two widely studied settings -- low-rank matrix sensing and mixed linear regression -- by bringing latent variables (i.e. unknown labels) and structural priors (i.e. low-rank structures) into consideration. To cope with the non-convexity issues arising from unlabelled heterogeneous data and low-complexity structure, we develop a three-stage meta-algorithm that is guaranteed to recover the unknown matrices with near-optimal sample and computational complexities under Gaussian designs. In addition, the proposed algorithm is provably stable against random noise. We complement the theoretical studies with empirical evidence that confirms the efficacy of our algorithm.
Incipient anomalies present milder symptoms compared to severe ones, and are more difficult to detect and diagnose due to their close resemblance to normal operating conditions. The lack of incipient anomaly examples in the training data can pose severe risks to anomaly detection methods that are built upon Machine Learning (ML) techniques, because these anomalies can be easily mistaken as normal operating conditions. To address this challenge, we propose to utilize the uncertainty information available from ensemble learning to identify potential misclassified incipient anomalies. We show in this paper that ensemble learning methods can give improved performance on incipient anomalies and identify common pitfalls in these models through extensive experiments on two real-world datasets. Then, we discuss how to design more effective ensemble models for detecting incipient anomalies.
Natural policy gradient (NPG) methods are among the most widely used policy optimization algorithms in contemporary reinforcement learning. This class of methods is often applied in conjunction with entropy regularization -- an algorithmic scheme that helps encourage exploration -- and is closely related to soft policy iteration and trust region policy optimization. Despite the empirical success, the theoretical underpinnings for NPG methods remain severely limited even for the tabular setting. This paper develops $\textit{non-asymptotic}$ convergence guarantees for entropy-regularized NPG methods under softmax parameterization, focusing on discounted Markov decision processes (MDPs). Assuming access to exact policy evaluation, we demonstrate that the algorithm converges linearly -- or even quadratically once it enters a local region around the optimal policy -- when computing optimal value functions of the regularized MDP. Moreover, the algorithm is provably stable vis-\`a-vis inexactness of policy evaluation, and is able to find an $\epsilon$-optimal policy for the original MDP when applied to a slightly perturbed MDP. Our convergence results outperform the ones established for unregularized NPG methods (arXiv:1908.00261), and shed light upon the role of entropy regularization in accelerating convergence.
We investigate the effectiveness of convex relaxation and nonconvex optimization in solving bilinear systems of equations (a.k.a. blind deconvolution under a subspace model). Despite the wide applicability, the theoretical understanding about these two paradigms remains largely inadequate in the presence of noise. The current paper makes two contributions by demonstrating that: (1) convex relaxation achieves minimax-optimal statistical accuracy vis-\`a-vis random noise, and (2) a two-stage nonconvex algorithm attains minimax-optimal accuracy within a logarithmic number of iterations. Both results improve upon the state-of-the-art results by some factors that scale polynomially in the problem dimension.
Ensemble learning is widely applied in Machine Learning (ML) to improve model performance and to mitigate decision risks. In this approach, predictions from a diverse set of learners are combined to obtain a joint decision. Recently, various methods have been explored in literature for estimating decision uncertainties using ensemble learning; however, determining which metrics are a better fit for certain decision-making applications remains a challenging task. In this paper, we study the following key research question in the selection of uncertainty metrics: when does an uncertainty metric outperforms another? We answer this question via a rigorous analysis of two commonly used uncertainty metrics in ensemble learning, namely ensemble mean and ensemble variance. We show that, under mild assumptions on the ensemble learners, ensemble mean is preferable with respect to ensemble variance as an uncertainty metric for decision making. We empirically validate our assumptions and theoretical results via an extensive case study: the diagnosis of referable diabetic retinopathy.
Intermediate-Severity (IS) faults present milder symptoms compared to severe faults, and are more difficult to detect and diagnose due to their close resemblance to normal operating conditions. The lack of IS fault examples in the training data can pose severe risks to Fault Detection and Diagnosis (FDD) methods that are built upon Machine Learning (ML) techniques, because these faults can be easily mistaken as normal operating conditions. Ensemble models are widely applied in ML and are considered promising methods for detecting out-of-distribution (OOD) data. We identify common pitfalls in these models through extensive experiments with several popular ensemble models on two real-world datasets. Then, we discuss how to design more effective ensemble models for detecting and diagnosing IS faults.