Abstract:We introduce Metric-Aware Principal Component Analysis (MAPCA), a unified framework for scale-invariant representation learning based on the generalised eigenproblem max Tr(W^T Sigma W) subject to W^T M W = I, where M is a symmetric positive definite metric matrix. The choice of M determines the representation geometry. The canonical beta-family M(beta) = Sigma^beta, beta in [0,1], provides continuous spectral bias control between standard PCA (beta=0) and output whitening (beta=1), with condition number kappa(beta) = (lambda_1/lambda_p)^(1-beta) decreasing monotonically to isotropy. The diagonal metric M = D = diag(Sigma) recovers Invariant PCA (IPCA), a method rooted in Frisch (1928) diagonal regression, as a distinct member of the broader framework. We prove that scale invariance holds if and only if the metric transforms as M_tilde = CMC under rescaling C, a condition satisfied exactly by IPCA but not by the general beta-family at intermediate values. Beyond its classical interpretation, MAPCA provides a geometric language that unifies several self-supervised learning objectives. Barlow Twins and ZCA whitening correspond to beta=1 (output whitening); VICReg's variance term corresponds to the diagonal metric. A key finding is that W-MSE, despite being described as a whitening-based method, corresponds to M = Sigma^{-1} (beta = -1), outside the spectral compression range entirely and in the opposite spectral direction to Barlow Twins. This distinction between input and output whitening is invisible at the level of loss functions and becomes precise only within the MAPCA framework.
Abstract:The Expected Calibration Error (ece), the dominant calibration metric in machine learning, compares predicted probabilities against empirical frequencies of binary outcomes. This is appropriate when labels are binary events. However, many modern settings produce labels that are themselves probabilities rather than binary outcomes: a radiologist's stated confidence, a teacher model's soft output in knowledge distillation, a class posterior derived from a generative model, or an annotator agreement fraction. In these settings, ece commits a category error - it discards the probabilistic information in the label by forcing it into a binary comparison. The result is not a noisy approximation that more data will correct. It is a structural misalignment that persists and converges to the wrong answer with increasing precision as sample size grows. We introduce the Soft Mean Expected Calibration Error (smece), a calibration metric for settings where labels are of probabilistic nature. The modification to the ece formula is one line: replace the empirical hard-label fraction in each prediction bin with the mean probability label of the samples in that bin. smece reduces exactly to ece when labels are binary, making it a strict generalisation.
Abstract:The Markov Transition Field (MTF), introduced by Wang and Oates (2015), encodes a time series as a two-dimensional image by mapping each pair of time steps to the transition probability between their quantile states, estimated from a single global transition matrix. This construction is efficient when the transition dynamics are stationary, but produces a misleading representation when the process changes regime over time: the global matrix averages across regimes and the resulting image loses all information about \emph{when} each dynamical regime was active. In this paper we introduce the \emph{Temporal Markov Transition Field} (TMTF), an extension that partitions the series into $K$ contiguous temporal chunks, estimates a separate local transition matrix for each chunk, and assembles the image so that each row reflects the dynamics local to its chunk rather than the global average. The resulting $T \times T$ image has $K$ horizontal bands of distinct texture, each encoding the transition dynamics of one temporal segment. We develop the formal definition, establish the key structural properties of the representation, work through a complete numerical example that makes the distinction from the global MTF concrete, analyse the bias--variance trade-off introduced by temporal chunking, and discuss the geometric interpretation of the local transition matrices in terms of process properties such as persistence, mean reversion, and trending behaviour. The TMTF is amplitude-agnostic and order-preserving, making it suitable as an input channel for convolutional neural networks applied to time series characterisation tasks.