Abstract: We introduce an algorithm that learns correlations between two datasets, in a way that can be used to infer one type of data given the other. The approach allows for the computation of expectation values over the inferred conditional distributions, such as Bayesian estimators and their standard deviations. This is done by learning feature maps which span hyperplanes in the spaces of probabilities for both types of data, optimized to best represent the correlations. When applied to supervised learning, this yields a new objective function which automatically provides regularization and results in faster convergence. We propose that, in addition to the many applications where two correlated variables appear naturally, this approach could also be used to identify the dominant independent features of a single dataset in an unsupervised fashion: in that scenario, the second variable should be produced from the original data by adding noise in a manner which defines an appropriate information metric.
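The sketch below is not the paper's algorithm; it is a minimal linear analogue of the idea of learning paired feature maps that best represent the correlations between two datasets (here via an SVD of a cross-covariance, as in linear CCA), followed by a simple conditional estimate of one variable's features given the other's. The toy data, whitening step, and number of retained feature pairs are all illustrative assumptions.

```python
# Linear stand-in for learning correlated feature maps between two paired datasets,
# then using them for a conditional (regression-style) estimate. Not the paper's method.
import numpy as np

rng = np.random.default_rng(0)

# Toy paired data: y is a noisy linear function of x.
n, dx, dy = 2000, 10, 5
x = rng.normal(size=(n, dx))
W_true = rng.normal(size=(dx, dy))
y = x @ W_true + 0.1 * rng.normal(size=(n, dy))

def whiten(z):
    # Center and whiten (assumed preprocessing so correlations are the only structure left).
    z = z - z.mean(axis=0)
    cov = np.cov(z, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    return z @ evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T

xw, yw = whiten(x), whiten(y)

# SVD of the cross-covariance yields paired feature directions,
# ranked by how strongly they are correlated across the two datasets.
cross = xw.T @ yw / n
u, s, vt = np.linalg.svd(cross, full_matrices=False)

k = 3                        # keep the k most correlated feature pairs
fx = xw @ u[:, :k]           # features of x
fy = yw @ vt[:k, :].T        # features of y

# In this whitened, SVD-aligned basis each feature pair is coupled only
# through its singular value, so the conditional estimate is a simple rescaling.
fy_hat = fx * s[:k]
print("feature-space correlation strengths:", np.round(s[:k], 3))
print("mean squared error on y-features:", np.round(np.mean((fy_hat - fy) ** 2), 4))
```

In this linear toy, the singular values play the role of the correlation strengths that the learned feature maps are meant to capture, and the conditional estimate of the y-features is the analogue of a Bayesian estimator computed over the inferred conditional distribution.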
Abstract: In many-body physics, renormalization techniques are used to extract the aspects of a statistical or quantum state that are relevant at large scales, or for low-energy experiments. Recent works have proposed that these features can be formally identified as those perturbations of the state whose distinguishability most resists coarse-graining. Here, we examine whether this same strategy can be used to identify important features of an unlabeled dataset. This approach indeed results in a technique very similar to kernel PCA (principal component analysis), but with a kernel function that is automatically adapted to the data, or "learned". We test this approach on handwritten digits and find that the most relevant features are significantly better for classification than those obtained from a simple Gaussian kernel.
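For reference, a minimal sketch of the comparison baseline mentioned in the abstract: kernel PCA with a fixed Gaussian (RBF) kernel on handwritten digits, with a simple classifier trained on the extracted features to gauge their quality. This shows only the fixed-kernel baseline, not the learned kernel; the dataset, gamma, number of components, and choice of classifier are assumptions for illustration.

```python
# Baseline only: Gaussian-kernel PCA features on digits, scored with a linear classifier.
from sklearn.datasets import load_digits
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fixed Gaussian-kernel PCA followed by logistic regression on the components.
model = make_pipeline(
    StandardScaler(),
    KernelPCA(n_components=30, kernel="rbf", gamma=0.01),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)
print("test accuracy with fixed Gaussian-kernel features:",
      round(model.score(X_test, y_test), 3))
```

Swapping the fixed RBF kernel for a kernel adapted to the data, as the abstract proposes, would amount to replacing the `KernelPCA` step while keeping the downstream classification protocol unchanged.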