In a modern power system, real-time data on power generation/consumption and its relevant features are stored in various distributed parties, including household meters, transformer stations and external organizations. To fully exploit the underlying patterns of these distributed data for accurate power prediction, federated learning is needed as a collaborative but privacy-preserving training scheme. However, current federated learning frameworks are polarized towards addressing either the horizontal or vertical separation of data, and tend to overlook the case where both are present. Furthermore, in mainstream horizontal federated learning frameworks, only artificial neural networks are employed to learn the data patterns, which are considered less accurate and interpretable compared to tree-based models on tabular datasets. To this end, we propose a hybrid federated learning framework based on XGBoost, for distributed power prediction from real-time external features. In addition to introducing boosted trees to improve accuracy and interpretability, we combine horizontal and vertical federated learning, to address the scenario where features are scattered in local heterogeneous parties and samples are scattered in various local districts. Moreover, we design a dynamic task allocation scheme such that each party gets a fair share of information, and the computing power of each party can be fully leveraged to boost training efficiency. A follow-up case study is presented to justify the necessity of adopting the proposed framework. The advantages of the proposed framework in fairness, efficiency and accuracy performance are also confirmed.
Rendering articulated objects while controlling their poses is critical to applications such as virtual reality or animation for movies. Manipulating the pose of an object, however, requires the understanding of its underlying structure, that is, its joints and how they interact with each other. Unfortunately, assuming the structure to be known, as existing methods do, precludes the ability to work on new object categories. We propose to learn both the appearance and the structure of previously unseen articulated objects by observing them move from multiple views, with no additional supervision, such as joints annotations, or information about the structure. Our insight is that adjacent parts that move relative to each other must be connected by a joint. To leverage this observation, we model the object parts in 3D as ellipsoids, which allows us to identify joints. We combine this explicit representation with an implicit one that compensates for the approximation introduced. We show that our method works for different structures, from quadrupeds, to single-arm robots, to humans.
Motion tracking systems based on optical sensors typically often suffer from issues, such as poor lighting conditions, occlusion, limited coverage, and may raise privacy concerns. More recently, radio frequency (RF)-based approaches using commercial WiFi devices have emerged which offer low-cost ubiquitous sensing whilst preserving privacy. However, the output of an RF sensing system, such as Range-Doppler spectrograms, cannot represent human motion intuitively and usually requires further processing. In this study, MDPose, a novel framework for human skeletal motion reconstruction based on WiFi micro-Doppler signatures, is proposed. It provides an effective solution to track human activities by reconstructing a skeleton model with 17 key points, which can assist with the interpretation of conventional RF sensing outputs in a more understandable way. Specifically, MDPose has various incremental stages to gradually address a series of challenges: First, a denoising algorithm is implemented to remove any unwanted noise that may affect the feature extraction and enhance weak Doppler signatures. Secondly, the convolutional neural network (CNN)-recurrent neural network (RNN) architecture is applied to learn temporal-spatial dependency from clean micro-Doppler signatures and restore key points' velocity information. Finally, a pose optimising mechanism is employed to estimate the initial state of the skeleton and to limit the increase of error. We have conducted comprehensive tests in a variety of environments using numerous subjects with a single receiver radar system to demonstrate the performance of MDPose, and report 29.4mm mean absolute error over all key points positions, which outperforms state-of-the-art RF-based pose estimation systems.
The brain attenuates its responses to self-produced exteroceptions (e.g., we cannot tickle ourselves). Is this phenomenon, called sensory attenuation, enabled innately, or is it acquired through learning? To explore the latter possibility, we created a neural network model consisting of sensory (proprioceptive and exteroceptive), association, and executive areas. A simulated robot controlled by the network learned to acquire motor patterns with self-produced or externally produced exteroceptive feedback. We found that the robot first increased responses in sensory and association areas for both self-produced and externally produced conditions in the early stage of learning, but then, gradually it attenuated responses in sensory areas only for self-produced conditions. The robot spontaneously acquired a capacity to switch (attenuate or amplify) responses in sensory areas depending on the conditions by switching the neural state of the executive area. This suggests that proactive control of sensory-information flow inside the network was self-organized through learning. We also found that innate alterations in the modulation of sensory-information flow induced some characteristics analogous to schizophrenia and autism spectrum disorder. This study provides a novel perspective on neural mechanisms underlying perceptual phenomena and psychiatric disorders.
We study the problem of non-constrained, discrete-time, online distributed optimization in a multi-agent system where some of the agents do not follow the prescribed update rule either due to failures or malicious intentions. None of the agents have prior information about the identities of the faulty agents and any agent can communicate only with its immediate neighbours. At each time step, a Lipschitz strongly convex cost function is revealed locally to all the agents and the non-faulty agents update their states using their local information and the information obtained from their neighbours. We measure the performance of the online algorithm by comparing it to its offline version when the cost functions are known apriori. The difference between the same is termed as regret. Under sufficient conditions on the graph topology, the number and location of the adversaries, the defined regret grows sublinearly. We further conduct numerical experiments to validate our theoretical results.
Classical image denoising methods utilize the non-local self-similarity principle to effectively recover image content from noisy images. Current state-of-the-art methods use deep convolutional neural networks (CNNs) to effectively learn the mapping from noisy to clean images. Deep denoising CNNs manifest a high learning capacity and integrate non-local information owing to the large receptive field yielded by numerous cascade of hidden layers. However, deep networks are also computationally complex and require large data for training. To address these issues, this study draws the focus on the Self-organized Operational Neural Networks (Self-ONNs) empowered by a novel neuron model that can achieve a similar or better denoising performance with a compact and shallow model. Recently, the concept of super-neurons has been introduced which augment the non-linear transformations of generative neurons by utilizing non-localized kernel locations for an enhanced receptive field size. This is the key accomplishment which renders the need for a deep network configuration. As the integration of non-local information is known to benefit denoising, in this work we investigate the use of super neurons for both synthetic and real-world image denoising. We also discuss the practical issues in implementing the super neuron model on GPUs and propose a trade-off between the heterogeneity of non-localized operations and computational complexity. Our results demonstrate that with the same width and depth, Self-ONNs with super neurons provide a significant boost of denoising performance over the networks with generative and convolutional neurons for both denoising tasks. Moreover, results demonstrate that Self-ONNs with super neurons can achieve a competitive and superior synthetic denoising performances than well-known deep CNN denoisers for synthetic and real-world denoising, respectively.
The primary focus of recent work with largescale transformers has been on optimizing the amount of information packed into the model's parameters. In this work, we ask a different question: Can multimodal transformers leverage explicit knowledge in their reasoning? Existing, primarily unimodal, methods have explored approaches under the paradigm of knowledge retrieval followed by answer prediction, but leave open questions about the quality and relevance of the retrieved knowledge used, and how the reasoning processes over implicit and explicit knowledge should be integrated. To address these challenges, we propose a novel model - Knowledge Augmented Transformer (KAT) - which achieves a strong state-of-the-art result (+6 points absolute) on the open-domain multimodal task of OK-VQA. Our approach integrates implicit and explicit knowledge in an end to end encoder-decoder architecture, while still jointly reasoning over both knowledge sources during answer generation. An additional benefit of explicit knowledge integration is seen in improved interpretability of model predictions in our analysis.
The ability to change a surface's profile allows biological systems to effectively manipulate and blend into their surroundings. Current surface morphing techniques rely either on having a small number of fixed states or on directly driving the entire system. We discovered a subset of scale-independent auxetic metamaterials have a state trajectory with a star-graph structure. At the central node, small nudges can move the material between trajectories, allowing us to locally shift Poisson's ratio, causing the material to take on different shapes under loading. While the number of possible shapes grows exponentially with the size of the material, the probability of finding one at random is vanishingly small. By actively guiding the material through the node points, we produce a reprogrammable surface that does not require inputs to maintain shape and can display arbitrary 2D information and take on complex 3D shapes. Our work opens new opportunities in micro devices, tactile displays, manufacturing, and robotic systems.
In an aerial hybrid massive multiple-input multiple-output (MIMO) and orthogonal frequency division multiplexing (OFDM) system, how to design a spectral-efficient broadband multi-user hybrid beamforming with a limited pilot and feedback overhead is challenging. To this end, by modeling the key transmission modules as an end-to-end (E2E) neural network, this paper proposes a data-driven deep learning (DL)-based unified hybrid beamforming framework for both the time division duplex (TDD) and frequency division duplex (FDD) systems with implicit channel state information (CSI). For TDD systems, the proposed DL-based approach jointly models the uplink pilot combining and downlink hybrid beamforming modules as an E2E neural network. While for FDD systems, we jointly model the downlink pilot transmission, uplink CSI feedback, and downlink hybrid beamforming modules as an E2E neural network. Different from conventional approaches separately processing different modules, the proposed solution simultaneously optimizes all modules with the sum rate as the optimization object. Therefore, by perceiving the inherent property of air-to-ground massive MIMO-OFDM channel samples, the DL-based E2E neural network can establish the mapping function from the channel to the beamformer, so that the explicit channel reconstruction can be avoided with reduced pilot and feedback overhead. Besides, practical low-resolution phase shifters (PSs) introduce the quantization constraint, leading to the intractable gradient backpropagation when training the neural network. To mitigate the performance loss caused by the phase quantization error, we adopt the transfer learning strategy to further fine-tune the E2E neural network based on a pre-trained network that assumes the ideal infinite-resolution PSs. Numerical results show that our DL-based schemes have considerable advantages over state-of-the-art schemes.
This paper investigates a multi-terminal source coding problem under a logarithmic loss fidelity which does not necessarily lead to an additive distortion measure. The problem is motivated by an extension of the Information Bottleneck method to a multi-source scenario where several encoders have to build cooperatively rate-limited descriptions of their sources in order to maximize information with respect to other unobserved (hidden) sources. More precisely, we study fundamental information-theoretic limits of the so-called: (i) Two-way Collaborative Information Bottleneck (TW-CIB) and (ii) the Collaborative Distributed Information Bottleneck (CDIB) problems. The TW-CIB problem consists of two distant encoders that separately observe marginal (dependent) components $X_1$ and $X_2$ and can cooperate through multiple exchanges of limited information with the aim of extracting information about hidden variables $(Y_1,Y_2)$, which can be arbitrarily dependent on $(X_1,X_2)$. On the other hand, in CDIB there are two cooperating encoders which separately observe $X_1$ and $X_2$ and a third node which can listen to the exchanges between the two encoders in order to obtain information about a hidden variable $Y$. The relevance (figure-of-merit) is measured in terms of a normalized (per-sample) multi-letter mutual information metric (log-loss fidelity) and an interesting tradeoff arises by constraining the complexity of descriptions, measured in terms of the rates needed for the exchanges between the encoders and decoders involved. Inner and outer bounds to the complexity-relevance region of these problems are derived from which optimality is characterized for several cases of interest. Our resulting theoretical complexity-relevance regions are finally evaluated for binary symmetric and Gaussian statistical models.