Large pre-trained sequence models, such as transformers, excel as few-shot learners capable of in-context learning (ICL). In ICL, a model is trained to adapt its operation to a new task based on limited contextual information, typically in the form of a few training examples for the given task. Previous work has explored the use of ICL for channel equalization in single-user multi-input and multiple-output (MIMO) systems. In this work, we demonstrate that ICL can be also used to tackle the problem of multi-user equalization in cell-free MIMO systems with limited fronthaul capacity. In this scenario, a task is defined by channel statistics, signal-to-noise ratio, and modulation schemes. The context encompasses the users' pilot sequences, the corresponding quantized received signals, and the current received data signal. Different prompt design strategies are proposed and evaluated that encompass also large-scale fading and modulation information. Experiments demonstrate that ICL-based equalization provides estimates with lower mean squared error as compared to the linear minimum mean squared error equalizer, especially in the presence of limited fronthaul capacity and pilot contamination.
The safe integration of machine learning modules in decision-making processes hinges on their ability to quantify uncertainty. A popular technique to achieve this goal is conformal prediction (CP), which transforms an arbitrary base predictor into a set predictor with coverage guarantees. While CP certifies the predicted set to contain the target quantity with a user-defined tolerance, it does not provide control over the average size of the predicted sets, i.e., over the informativeness of the prediction. In this work, a theoretical connection is established between the generalization properties of the base predictor and the informativeness of the resulting CP prediction sets. To this end, an upper bound is derived on the expected size of the CP set predictor that builds on generalization error bounds for the base predictor. The derived upper bound provides insights into the dependence of the average size of the CP set predictor on the amount of calibration data, the target reliability, and the generalization performance of the base predictor. The theoretical insights are validated using simple numerical regression and classification tasks.
Large pre-trained sequence models, such as transformer-based architectures, have been recently shown to have the capacity to carry out in-context learning (ICL). In ICL, a decision on a new input is made via a direct mapping of the input and of a few examples from the given task, serving as the task's context, to the output variable. No explicit updates of model parameters are needed to tailor the decision to a new task. Pre-training, which amounts to a form of meta-learning, is based on the observation of examples from several related tasks. Prior work has shown ICL capabilities for linear regression. In this study, we leverage ICL to address the inverse problem of multiple-input and multiple-output (MIMO) equalization based on a context given by pilot symbols. A task is defined by the unknown fading channel and by the signal-to-noise ratio (SNR) level, which may be known. To highlight the practical potential of the approach, we allow for the presence of quantization of the received signals. We demonstrate via numerical results that transformer-based ICL has a threshold behavior, whereby, as the number of pre-training tasks grows, the performance switches from that of a minimum mean squared error (MMSE) equalizer with a prior determined by the pre-trained tasks to that of an MMSE equalizer with the true data-generating prior.
In many real-world problems, predictions are leveraged to monitor and control cyber-physical systems, demanding guarantees on the satisfaction of reliability and safety requirements. However, predictions are inherently uncertain, and managing prediction uncertainty presents significant challenges in environments characterized by complex dynamics and forking trajectories. In this work, we assume access to a pre-designed probabilistic implicit or explicit sequence model, which may have been obtained using model-based or model-free methods. We introduce probabilistic time series-conformal risk prediction (PTS-CRC), a novel post-hoc calibration procedure that operates on the predictions produced by any pre-designed probabilistic forecaster to yield reliable error bars. In contrast to existing art, PTS-CRC produces predictive sets based on an ensemble of multiple prototype trajectories sampled from the sequence model, supporting the efficient representation of forking uncertainties. Furthermore, unlike the state of the art, PTS-CRC can satisfy reliability definitions beyond coverage. This property is leveraged to devise a novel model predictive control (MPC) framework that addresses open-loop and closed-loop control problems under general average constraints on the quality or safety of the control policy. We experimentally validate the performance of PTS-CRC prediction and control by studying a number of use cases in the context of wireless networking. Across all the considered tasks, PTS-CRC predictors are shown to provide more informative predictive sets, as well as safe control policies with larger returns.
Consider a setting in which devices and a server share a pre-trained model. The server wishes to make an inference on a new input given the model. Devices have access to data, previously not used for training, and can communicate to the server over a common wireless channel. If the devices have no access to the new input, can communication from devices to the server enhance the quality of the inference decision at the server? Recent work has introduced federated conformal prediction (CP), which leverages devices-to-server communication to improve the reliability of the server's decision. With federated CP, devices communicate to the server information about the loss accrued by the shared pre-trained model on the local data, and the server leverages this information to calibrate a decision interval, or set, so that it is guaranteed to contain the correct answer with a pre-defined target reliability level. Previous work assumed noise-free communication, whereby devices can communicate a single real number to the server. In this paper, we study for the first time federated CP in a wireless setting. We introduce a novel protocol, termed wireless federated conformal prediction (WFCP), which builds on type-based multiple access (TBMA) and on a novel quantile correction strategy. WFCP is proved to provide formal reliability guarantees in terms of coverage of the predicted set produced by the server. Using numerical results, we demonstrate the significant advantages of WFCP against digital implementations of existing federated CP schemes, especially in regimes with limited communication resources and/or large number of devices.
Statistical heterogeneity across clients in a Federated Learning (FL) system increases the algorithm convergence time and reduces the generalization performance, resulting in a large communication overhead in return for a poor model. To tackle the above problems without violating the privacy constraints that FL imposes, personalized FL methods have to couple statistically similar clients without directly accessing their data in order to guarantee a privacy-preserving transfer. In this work, we design user-centric aggregation rules at the parameter server (PS) that are based on readily available gradient information and are capable of producing personalized models for each FL client. The proposed aggregation rules are inspired by an upper bound of the weighted aggregate empirical risk minimizer. Secondly, we derive a communication-efficient variant based on user clustering which greatly enhances its applicability to communication-constrained systems. Our algorithm outperforms popular personalized FL baselines in terms of average accuracy, worst node performance, and training communication overhead.
This paper investigates when the importance weighting (IW) correction is needed to address covariate shift, a common situation in supervised learning where the input distributions of training and test data differ. Classic results show that the IW correction is needed when the model is parametric and misspecified. In contrast, recent results indicate that the IW correction may not be necessary when the model is nonparametric and well-specified. We examine the missing case in the literature where the model is nonparametric and misspecified, and show that the IW correction is needed for obtaining the best approximation of the true unknown function for the test distribution. We do this by analyzing IW-corrected kernel ridge regression, covering a variety of settings, including parametric and nonparametric models, well-specified and misspecified settings, and arbitrary weighting functions.
This work takes a critical look at the application of conventional machine learning methods to wireless communication problems through the lens of reliability and robustness. Deep learning techniques adopt a frequentist framework, and are known to provide poorly calibrated decisions that do not reproduce the true uncertainty caused by limitations in the size of the training data. Bayesian learning, while in principle capable of addressing this shortcoming, is in practice impaired by model misspecification and by the presence of outliers. Both problems are pervasive in wireless communication settings, in which the capacity of machine learning models is subject to resource constraints and training data is affected by noise and interference. In this context, we explore the application of the framework of robust Bayesian learning. After a tutorial-style introduction to robust Bayesian learning, we showcase the merits of robust Bayesian learning on several important wireless communication problems in terms of accuracy, calibration, and robustness to outliers and misspecification.
Decentralized learning algorithms empower interconnected edge devices to share data and computational resources to collaboratively train a machine learning model without the aid of a central coordinator (e.g. an orchestrating basestation). In the case of heterogeneous data distributions at the network devices, collaboration can yield predictors with unsatisfactory performance for a subset of the devices. For this reason, in this work we consider the formulation of a distributionally robust decentralized learning task and we propose a decentralized single loop gradient descent/ascent algorithm (AD-GDA) to solve the underlying minimax optimization problem. We render our algorithm communication efficient by employing a compressed consensus scheme and we provide convergence guarantees for smooth convex and non-convex loss functions. Finally, we corroborate the theoretical findings with empirical evidence of the ability of the proposed algorithm in providing unbiased predictors over a network of collaborating devices with highly heterogeneous data distributions.
Standard Bayesian learning is known to have suboptimal generalization capabilities under model misspecification and in the presence of outliers. PAC-Bayes theory demonstrates that the free energy criterion minimized by Bayesian learning is a bound on the generalization error for Gibbs predictors (i.e., for single models drawn at random from the posterior) under the assumption of sampling distributions uncontaminated by outliers. This viewpoint provides a justification for the limitations of Bayesian learning when the model is misspecified, requiring ensembling, and when data is affected by outliers. In recent work, PAC-Bayes bounds - referred to as PAC$^m$ - were derived to introduce free energy metrics that account for the performance of ensemble predictors, obtaining enhanced performance under misspecification. This work presents a novel robust free energy criterion that combines the generalized logarithm score function with PAC$^m$ ensemble bounds. The proposed free energy training criterion produces predictive distributions that are able to concurrently counteract the detrimental effects of model misspecification and outliers.