Wireless federated learning (FL) is an emerging machine learning paradigm that trains a global parametric model from distributed datasets via wireless communications. This paper proposes a unit-modulus wireless FL (UMWFL) framework, which simultaneously uploads local model parameters and computes global model parameters via optimized phase shifting. The proposed framework avoids sophisticated baseband signal processing, leading to both low communication delay and low implementation cost. A training loss bound is derived, and a penalty alternating minimization (PAM) algorithm is proposed to minimize the nonconvex nonsmooth loss bound. Experimental results on the Car Learning to Act (CARLA) platform show that the proposed UMWFL framework with the PAM algorithm achieves smaller training losses and testing errors than the benchmark scheme.
Domain generalization aims to learn, from multiple source domains, knowledge that is invariant across different distributions while remaining semantically meaningful for downstream tasks, so as to improve the model's generalization ability on unseen target domains. The fundamental objective is to understand the underlying "invariance" behind these observational distributions, and such invariance has been shown to have a close connection to causality. While many existing approaches exploit the property that causal features are invariant across domains, we consider the invariance of the average causal effect of the features on the labels. This invariance regularizes our training approach, in which interventions are performed on features to enforce the stability of the classifier's causal prediction across domains. Our work thus sheds light on the domain generalization problem by introducing invariance of the mechanisms into the learning process. Experiments on several benchmark datasets demonstrate the performance of the proposed method against state-of-the-art approaches.
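The average-causal-effect invariance described above can be illustrated with a toy example: intervene on a feature, measure the average change in a classifier's output, and compare that effect across domains. Everything below (the linear model, the synthetic domains, the intervention size) is a made-up placeholder, not the paper's actual training procedure.

```python
import numpy as np

rng = np.random.default_rng(2)
w = np.array([1.5, -0.7])                  # a shared linear classifier (illustrative)

def avg_causal_effect(X, f, delta=1.0):
    """Average effect on the score of the intervention do(x_f = x_f + delta)."""
    X_do = X.copy()
    X_do[:, f] += delta                    # intervene on feature f
    return np.mean(X_do @ w - X @ w)

X_dom1 = rng.normal(0.0, 1.0, size=(500, 2))
X_dom2 = rng.normal(3.0, 2.0, size=(500, 2))   # a shifted domain

effects = [avg_causal_effect(X, f=0) for X in (X_dom1, X_dom2)]
# For a linear model the effect is w[0] * delta in every domain, so an
# invariance penalty such as (effects[0] - effects[1])**2 vanishes here.
print(effects)
```

A training-time regularizer would penalize the cross-domain discrepancy of such effects; for nonlinear classifiers the effects generally differ and the penalty becomes informative.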
Edge federated learning (FL) is an emerging machine learning paradigm that trains a global parametric model from distributed datasets via wireless communications. This paper proposes a unit-modulus over-the-air computation (UM-AirComp) framework to facilitate efficient edge federated learning, which simultaneously uploads local model parameters and updates global model parameters via analog beamforming. The proposed framework avoids sophisticated baseband signal processing, leading to low communication delays and implementation costs. A training loss bound of UM-AirComp is derived, and two low-complexity algorithms, termed penalty alternating minimization (PAM) and accelerated gradient projection (AGP), are proposed to minimize the nonconvex nonsmooth loss bound. Simulation results show that the proposed UM-AirComp framework with the PAM algorithm not only achieves a smaller mean square error of the model parameter estimates, a smaller training loss, and a smaller testing error, but also requires a significantly shorter runtime than other benchmark schemes. Moreover, the proposed UM-AirComp framework with the AGP algorithm achieves satisfactory performance while reducing the computational complexity by orders of magnitude compared with existing optimization algorithms. Finally, we demonstrate the implementation of UM-AirComp on a vehicle-to-everything autonomous driving simulation platform. It is found that autonomous driving tasks are more sensitive to model parameter errors than other tasks, since their neural networks are more sophisticated and contain sparser model parameters.
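The core over-the-air idea above can be sketched in a few lines: each device pre-rotates its (real-valued) parameters by the conjugate channel phase, a unit-modulus operation, so that simultaneous transmissions superpose into a channel-gain-weighted aggregate at the server. This is a minimal illustration under an idealized flat-fading model, not the paper's optimized UM-AirComp design.

```python
import numpy as np

rng = np.random.default_rng(0)

K, d = 8, 16                                       # devices, model dimension
theta = rng.normal(size=(K, d))                    # local model parameters
h = rng.normal(size=K) + 1j * rng.normal(size=K)   # flat-fading channels

# Unit-modulus pre-rotation: multiply by exp(-j*angle(h)) so that the
# channel gains add constructively at the receiver.
tx = theta * np.exp(-1j * np.angle(h))[:, None]

noise = 0.01 * (rng.normal(size=d) + 1j * rng.normal(size=d))
rx = (h[:, None] * tx).sum(axis=0) + noise         # superposition over the air

# Server estimate of the channel-gain-weighted aggregate of the models
est = rx.real / np.abs(h).sum()
ideal = (np.abs(h)[:, None] * theta).sum(axis=0) / np.abs(h).sum()
print(np.max(np.abs(est - ideal)))                 # small residual from noise
```

The estimate is a gain-weighted rather than uniform average; the paper's framework goes further and optimizes the phase shifts against a training loss bound.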
Edge learning (EL), which uses edge computing as a platform to execute machine learning algorithms, is able to fully exploit the massive sensing data generated by the Internet of Things (IoT). However, due to the limited transmit power of IoT devices, collecting the sensing data in EL systems is a challenging task. To address this challenge, this paper proposes to integrate an unmanned ground vehicle (UGV) with EL. With such a scheme, the UGV can improve the communication quality by approaching various IoT devices. However, different devices may transmit different data for different machine learning jobs, and a fundamental question is how to jointly plan the UGV path, the devices' energy consumption, and the number of samples for different jobs. This paper further proposes a graph-based path planning model, a network energy consumption model, and a sample size planning model that characterizes the F-measure as a function of the minority class sample size. With these models, the joint path, energy, and sample size planning (JPESP) problem is formulated as a large-scale mixed integer nonlinear programming (MINLP) problem, which is nontrivial to solve due to the high-dimensional discontinuous variables related to UGV movement. To this end, it is proved that each IoT device should be served only once along the path, so the problem dimension is significantly reduced. Furthermore, to handle the discontinuous variables, a tabu search (TS) based algorithm is derived, which converges in expectation to the optimal solution of the JPESP problem. Simulation results under different task scenarios show that our optimization schemes outperform the fixed EL and the full-path EL schemes.
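As a rough illustration of the tabu search component, the sketch below applies 2-swap tabu search to a toy visiting-order problem, in the spirit of serving each device exactly once along the path. The locations, tabu tenure, and iteration budget are made up, and the energy and sample-size couplings of the actual JPESP problem are omitted.

```python
import numpy as np

rng = np.random.default_rng(3)
pts = rng.uniform(size=(7, 2))              # made-up device locations

def tour_len(order):
    # closed-tour length through the devices in the given order
    return sum(np.linalg.norm(pts[order[i]] - pts[order[i - 1]])
               for i in range(len(order)))

def swapped(order, i, j):
    new = order[:]
    new[i], new[j] = new[j], new[i]
    return new

n = len(pts)
order = list(range(n))
best, best_len = order[:], tour_len(order)
tabu = []                                    # short-term memory of recent swaps
for _ in range(100):
    moves = [(i, j) for i in range(n) for j in range(i + 1, n)
             if (i, j) not in tabu]
    # take the best non-tabu 2-swap neighbor, even if it worsens the tour
    i, j = min(moves, key=lambda m: tour_len(swapped(order, *m)))
    order = swapped(order, i, j)
    tabu = (tabu + [(i, j)])[-5:]            # forbid recent swaps (tenure 5)
    if tour_len(order) < best_len:
        best, best_len = order[:], tour_len(order)
print(round(best_len, 3))
```

Accepting the best non-tabu neighbor, rather than only improving moves, is what lets tabu search escape local minima of the discontinuous objective.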
Tensor rank learning for canonical polyadic decomposition (CPD) has long been deemed an essential yet challenging problem. In particular, since the tensor rank controls the complexity of the CPD model, inaccurate rank learning causes overfitting to noise or underfitting to the signal sources, and can even destroy the interpretability of the model parameters. However, optimal determination of the tensor rank is known to be a non-deterministic polynomial-time hard (NP-hard) task. Rather than exhaustively searching for the best tensor rank via trial-and-error experiments, Bayesian inference under the Gaussian-gamma prior was introduced in the context of probabilistic CPD modeling and was shown to be an effective strategy for automatic tensor rank determination. This triggered flourishing research on other structured tensor CPDs with automatic tensor rank learning. As the other side of the coin, these works also reveal that the Gaussian-gamma model does not perform well for high-rank tensors and/or low signal-to-noise ratios (SNRs). To overcome these drawbacks, this paper introduces a more advanced generalized hyperbolic (GH) prior into the probabilistic CPD model, which not only includes the Gaussian-gamma model as a special case but also provides more flexibility to adapt to different levels of sparsity. Based on this novel probabilistic model, an algorithm is developed under the framework of variational inference, where each update is obtained in closed form. Extensive numerical results on synthetic data and real-world datasets demonstrate the excellent performance of the proposed method in learning both low and high tensor ranks, even in low SNR cases.
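For readers unfamiliar with CPD, the model itself is compact: a three-way tensor is written as a sum of R rank-1 terms built from factor matrices, and rank learning amounts to deciding how many such terms the data supports. A minimal numpy sketch, with dimensions and rank chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(1)
I, J, K, R = 5, 6, 7, 3           # tensor dimensions and true rank

# Factor matrices A, B, C; the CPD model is
#   X[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]
A, B, C = (rng.normal(size=(n, R)) for n in (I, J, K))
X = np.einsum('ir,jr,kr->ijk', A, B, C)

# Automatic rank learning effectively prunes redundant components: with an
# overestimated rank, superfluous rank-1 terms shrink toward zero. Here we
# simply report the energy of each true component.
energy = [np.linalg.norm(np.einsum('i,j,k->ijk', A[:, r], B[:, r], C[:, r]))
          for r in range(R)]
print(X.shape, [round(e, 2) for e in energy])
```

A sparsity-promoting prior (Gaussian-gamma, or the GH prior proposed here) drives the energies of unnecessary components toward zero during inference, which is how the rank is determined automatically.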
While machine-type communication (MTC) devices generate massive data, they often cannot process this data due to limited energy and computation power. To this end, edge intelligence has been proposed, which collects distributed data and performs machine learning at the edge. However, this paradigm needs to maximize the learning performance instead of the communication throughput, for which the celebrated water-filling and max-min fairness algorithms become inefficient, since they allocate resources merely according to the quality of wireless channels. This paper proposes a learning centric power allocation (LCPA) method, which allocates radio resources based on an empirical classification error model. To gain insights into LCPA, an asymptotically optimal solution is derived. The solution shows that the transmit powers are inversely proportional to the channel gains and scale exponentially with the learning parameters. Experimental results show that the proposed LCPA algorithm significantly outperforms other power allocation algorithms.
While machine-type communication (MTC) devices generate considerable amounts of data, they often cannot process the data due to limited energy and computation power. To empower MTC with intelligence, edge machine learning has been proposed. However, power allocation in this paradigm requires maximizing the learning performance instead of the communication throughput, for which the celebrated water-filling and max-min fairness algorithms become inefficient. To this end, this paper proposes learning centric power allocation (LCPA), which provides a new perspective on radio resource allocation in learning-driven scenarios. By employing an empirical classification error model that is supported by learning theory, the LCPA is formulated as a nonconvex nonsmooth optimization problem and is solved via the majorization-minimization (MM) framework. To gain deeper insights into LCPA, an asymptotic analysis is provided, which shows that the transmit powers are inversely proportional to the channel gains and scale exponentially with the learning parameters. This is in contrast to traditional power allocation schemes, where the quality of the wireless channels is the only consideration. Last but not least, to enable LCPA in large-scale settings, two optimization algorithms, termed mirror-prox LCPA and accelerated LCPA, are further proposed. Extensive numerical results demonstrate that the proposed LCPA algorithms outperform traditional power allocation algorithms, and that the large-scale algorithms reduce the computation time by orders of magnitude compared with MM-based LCPA while still achieving competitive learning performance.
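A toy version of the learning-centric idea: under an assumed empirical error model err_k(n) = a_k * n**(-b_k), with sample size growing with the achievable rate, the power split that minimizes the worst-case classification error can differ sharply from a channel-quality-only allocation. All constants below are illustrative, not taken from the paper.

```python
import numpy as np

g = np.array([1.0, 0.1])          # channel gains (user 2 is weaker)
a = np.array([0.8, 0.9])          # assumed error-model coefficients
b = np.array([0.3, 0.4])          # assumed error-model exponents
P, sigma2, T = 2.0, 0.1, 1e4      # power budget, noise power, symbol budget

def worst_error(p):
    # collected sample size grows with the achievable rate of each user
    n = T * np.log2(1.0 + p * g / sigma2)
    return np.max(a * n ** (-b))  # worst-case error across the two tasks

# one-dimensional grid search over the power split between the two users
splits = np.linspace(0.01, 0.99, 99)
errs = [worst_error(np.array([s * P, (1 - s) * P])) for s in splits]
best = splits[int(np.argmin(errs))]
print(f"fraction of power to user 1: {best:.2f}")
```

Note the allocation balances the two error curves rather than the two rates; with these (made-up) learning parameters, the strong-channel user can still receive most of the power because its error model decays more slowly.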
This paper considers inference over distributed linear Gaussian models using factor graphs and Gaussian belief propagation (BP). The distributed inference algorithm involves only local computation of the information matrix and of the mean vector, and message passing between neighbors. Under broad conditions, it is shown that the message information matrix converges to a unique positive definite limit matrix for arbitrary positive semidefinite initialization, and it approaches an arbitrarily small neighborhood of this limit matrix at a doubly exponential rate. A necessary and sufficient convergence condition for the belief mean vector to converge to the optimal centralized estimator is provided under the assumption that the message information matrix is initialized as a positive semidefinite matrix. Further, it is shown that Gaussian BP always converges when the underlying factor graph is given by the union of a forest and a single loop. The proposed convergence condition in the setup of distributed linear Gaussian models is shown to be strictly weaker than other existing convergence conditions and requirements, including the Gaussian Markov random field based walk-summability condition, and applicable to a large class of scenarios.
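The message-passing quantities analyzed above follow the standard Gaussian BP recursions; a textbook sketch on a four-node chain (where BP is exact, so the beliefs can be checked against the centralized estimate) is given below. This is the generic pairwise formulation for p(x) ∝ exp(-x'Jx/2 + h'x), not the paper's distributed linear-Gaussian-model setup.

```python
import numpy as np

J = np.array([[2.0, 0.5, 0.0, 0.0],
              [0.5, 2.0, 0.5, 0.0],
              [0.0, 0.5, 2.0, 0.5],
              [0.0, 0.0, 0.5, 2.0]])
h = np.array([1.0, 0.0, -1.0, 0.5])
n = len(h)
edges = [(i, j) for i in range(n) for j in range(n)
         if i != j and J[i, j] != 0.0]

alpha = {e: 0.0 for e in edges}   # message precision contributions
beta = {e: 0.0 for e in edges}    # message linear contributions
for _ in range(50):
    for (i, j) in edges:
        # aggregate incoming messages at node i, excluding the one from j
        a = J[i, i] + sum(alpha[(k, i)] for (k, t) in edges if t == i and k != j)
        b = h[i] + sum(beta[(k, i)] for (k, t) in edges if t == i and k != j)
        alpha[(i, j)] = -J[i, j] ** 2 / a
        beta[(i, j)] = -J[i, j] * b / a

# beliefs: per-node precision and mean from all incoming messages
prec = np.array([J[i, i] + sum(alpha[(k, t)] for (k, t) in edges if t == i)
                 for i in range(n)])
mean = (h + np.array([sum(beta[(k, t)] for (k, t) in edges if t == i)
                      for i in range(n)])) / prec
print(np.allclose(mean, np.linalg.solve(J, h)))  # exact on a tree
```

The quantities alpha correspond to the message information (precision) terms whose convergence the paper characterizes; on loopy graphs they still converge under the stated conditions, while the mean convergence requires the separate necessary and sufficient condition.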
Gaussian belief propagation (BP) has been widely used for distributed inference in large-scale networks such as the smart grid, sensor networks, and social networks, where local measurements/observations are scattered over a wide geographical area. One particular case is when two neighboring agents share a common observation. For example, when estimating voltages in the direct current (DC) power flow model, the current measurement over a power line is proportional to the voltage difference between two neighboring buses. When applying the Gaussian BP algorithm to this type of problem, the convergence condition remains an open issue. In this paper, we analyze the convergence properties of Gaussian BP for this pairwise linear Gaussian model. We show analytically that the message information matrix converges at a geometric rate to a unique positive definite matrix for any positive semidefinite initial value, and we further provide the necessary and sufficient condition for the belief mean vector to converge to the optimal estimate.