Machine learning offers novel ways and means to design personalized learning systems wherein each student's educational experience is customized in real time depending on their background, learning goals, and performance to date. SPARse Factor Analysis (SPARFA) is a novel framework for machine learning-based learning analytics, which estimates a learner's knowledge of the concepts underlying a domain, and content analytics, which estimates the relationships among a collection of questions and those concepts. SPARFA jointly learns the associations among the questions and the concepts, learner concept knowledge profiles, and the underlying question difficulties, solely based on the correct/incorrect graded responses of a population of learners to a collection of questions. In this paper, we extend the SPARFA framework significantly to enable: (i) the analysis of graded responses on an ordinal scale (partial credit) rather than a binary scale (correct/incorrect); (ii) the exploitation of tags/labels for questions that partially describe the question{concept associations. The resulting Ordinal SPARFA-Tag framework greatly enhances the interpretability of the estimated concepts. We demonstrate using real educational data that Ordinal SPARFA-Tag outperforms both SPARFA and existing collaborative filtering techniques in predicting missing learner responses.
In this paper, we introduce the concept of sparse bilinear logistic regression for decision problems involving explanatory variables that are two-dimensional matrices. Such problems are common in computer vision, brain-computer interfaces, style/content factorization, and parallel factor analysis. The underlying optimization problem is bi-convex; we study its solution and develop an efficient algorithm based on block coordinate descent. We provide a theoretical guarantee for global convergence and estimate the asymptotical convergence rate using the Kurdyka-{\L}ojasiewicz inequality. A range of experiments with simulated and real data demonstrate that sparse bilinear logistic regression outperforms current techniques in several important applications.
This paper studies graphical model selection, i.e., the problem of estimating a graph of statistical relationships among a collection of random variables. Conventional graphical model selection algorithms are passive, i.e., they require all the measurements to have been collected before processing begins. We propose an active learning algorithm that uses junction tree representations to adapt future measurements based on the information gathered from prior measurements. We prove that, under certain conditions, our active learning algorithm requires fewer scalar measurements than any passive algorithm to reliably estimate a graph. A range of numerical results validate our theory and demonstrates the benefits of active learning.
In this paper, we address the challenging problem of selecting tuning parameters for high-dimensional sparse regression. We propose a simple and computationally efficient method, called path thresholding (PaTh), that transforms any tuning parameter-dependent sparse regression algorithm into an asymptotically tuning-free sparse regression algorithm. More specifically, we prove that, as the problem size becomes large (in the number of variables and in the number of observations), PaTh performs accurate sparse regression, under appropriate conditions, without specifying a tuning parameter. In finite-dimensional settings, we demonstrate that PaTh can alleviate the computational burden of model selection algorithms by significantly reducing the search space of tuning parameters.
We consider the high-dimensional sparse linear regression problem of accurately estimating a sparse vector using a small number of linear measurements that are contaminated by noise. It is well known that the standard cadre of computationally tractable sparse regression algorithms---such as the Lasso, Orthogonal Matching Pursuit (OMP), and their extensions---perform poorly when the measurement matrix contains highly correlated columns. To address this shortcoming, we develop a simple greedy algorithm, called SWAP, that iteratively swaps variables until convergence. SWAP is surprisingly effective in handling measurement matrices with high correlations. In fact, we prove that SWAP outputs the true support, the locations of the non-zero entries in the sparse vector, under a relatively mild condition on the measurement matrix. Furthermore, we show that SWAP can be used to boost the performance of any sparse regression algorithm. We empirically demonstrate the advantages of SWAP by comparing it with several state-of-the-art sparse regression algorithms.
We present a video compressive sensing framework, termed kt-CSLDS, to accelerate the image acquisition process of dynamic magnetic resonance imaging (MRI). We are inspired by a state-of-the-art model for video compressive sensing that utilizes a linear dynamical system (LDS) to model the motion manifold. Given compressive measurements, the state sequence of an LDS can be first estimated using system identification techniques. We then reconstruct the observation matrix using a joint structured sparsity assumption. In particular, we minimize an objective function with a mixture of wavelet sparsity and joint sparsity within the observation matrix. We derive an efficient convex optimization algorithm through alternating direction method of multipliers (ADMM), and provide a theoretical guarantee for global convergence. We demonstrate the performance of our approach for video compressive sensing, in terms of reconstruction accuracy. We also investigate the impact of various sampling strategies. We apply this framework to accelerate the acquisition process of dynamic MRI and show it achieves the best reconstruction accuracy with the least computational time compared with existing algorithms in the literature.
We propose SPARFA-Trace, a new machine learning-based framework for time-varying learning and content analytics for education applications. We develop a novel message passing-based, blind, approximate Kalman filter for sparse factor analysis (SPARFA), that jointly (i) traces learner concept knowledge over time, (ii) analyzes learner concept knowledge state transitions (induced by interacting with learning resources, such as textbook sections, lecture videos, etc, or the forgetting effect), and (iii) estimates the content organization and intrinsic difficulty of the assessment questions. These quantities are estimated solely from binary-valued (correct/incorrect) graded learner response data and a summary of the specific actions each learner performs (e.g., answering a question or studying a learning resource) at each time instance. Experimental results on two online course datasets demonstrate that SPARFA-Trace is capable of tracing each learner's concept knowledge evolution over time, as well as analyzing the quality and content organization of learning resources, the question-concept associations, and the question intrinsic difficulties. Moreover, we show that SPARFA-Trace achieves comparable or better performance in predicting unobserved learner responses than existing collaborative filtering and knowledge tracing approaches for personalized education.
Iterative thresholding algorithms are well-suited for high-dimensional problems in sparse recovery and compressive sensing. The performance of this class of algorithms depends heavily on the tuning of certain threshold parameters. In particular, both the final reconstruction error and the convergence rate of the algorithm crucially rely on how the threshold parameter is set at each step of the algorithm. In this paper, we propose a parameter-free approximate message passing (AMP) algorithm that sets the threshold parameter at each iteration in a fully automatic way without either having an information about the signal to be reconstructed or needing any tuning from the user. We show that the proposed method attains both the minimum reconstruction error and the highest convergence rate. Our method is based on applying the Stein unbiased risk estimate (SURE) along with a modified gradient descent to find the optimal threshold in each iteration. Motivated by the connections between AMP and LASSO, it could be employed to find the solution of the LASSO for the optimal regularization parameter. To the best of our knowledge, this is the first work concerning parameter tuning that obtains the fastest convergence rate with theoretical guarantees.
This paper concerns the performance of the LASSO (also knows as basis pursuit denoising) for recovering sparse signals from undersampled, randomized, noisy measurements. We consider the recovery of the signal $x_o \in \mathbb{R}^N$ from $n$ random and noisy linear observations $y= Ax_o + w$, where $A$ is the measurement matrix and $w$ is the noise. The LASSO estimate is given by the solution to the optimization problem $x_o$ with $\hat{x}_{\lambda} = \arg \min_x \frac{1}{2} \|y-Ax\|_2^2 + \lambda \|x\|_1$. Despite major progress in the theoretical analysis of the LASSO solution, little is known about its behavior as a function of the regularization parameter $\lambda$. In this paper we study two questions in the asymptotic setting (i.e., where $N \rightarrow \infty$, $n \rightarrow \infty$ while the ratio $n/N$ converges to a fixed number in $(0,1)$): (i) How does the size of the active set $\|\hat{x}_\lambda\|_0/N$ behave as a function of $\lambda$, and (ii) How does the mean square error $\|\hat{x}_{\lambda} - x_o\|_2^2/N$ behave as a function of $\lambda$? We then employ these results in a new, reliable algorithm for solving LASSO based on approximate message passing (AMP).
We develop a new model and algorithms for machine learning-based learning analytics, which estimate a learner's knowledge of the concepts underlying a domain, and content analytics, which estimate the relationships among a collection of questions and those concepts. Our model represents the probability that a learner provides the correct response to a question in terms of three factors: their understanding of a set of underlying concepts, the concepts involved in each question, and each question's intrinsic difficulty. We estimate these factors given the graded responses to a collection of questions. The underlying estimation problem is ill-posed in general, especially when only a subset of the questions are answered. The key observation that enables a well-posed solution is the fact that typical educational domains of interest involve only a small number of key concepts. Leveraging this observation, we develop both a bi-convex maximum-likelihood and a Bayesian solution to the resulting SPARse Factor Analysis (SPARFA) problem. We also incorporate user-defined tags on questions to facilitate the interpretability of the estimated factors. Experiments with synthetic and real-world data demonstrate the efficacy of our approach. Finally, we make a connection between SPARFA and noisy, binary-valued (1-bit) dictionary learning that is of independent interest.