Deep neural networks based purely on attention have been successful across several domains, relying on minimal architectural priors from the designer. In Human Action Recognition (HAR), attention mechanisms have been primarily adopted on top of standard convolutional or recurrent layers, improving the overall generalization capability. In this work, we introduce Action Transformer (AcT), a simple, fully self-attentional architecture that consistently outperforms more elaborated networks that mix convolutional, recurrent, and attentive layers. In order to limit computational and energy requests, building on previous human action recognition research, the proposed approach exploits 2D pose representations over small temporal windows, providing a low latency solution for accurate and effective real-time performance. Moreover, we open-source MPOSE2021, a new large-scale dataset, as an attempt to build a formal training and evaluation benchmark for real-time short-time human action recognition. Extensive experimentation on MPOSE2021 with our proposed methodology and several previous architectural solutions proves the effectiveness of the AcT model and poses the base for future work on HAR.
Electrochemical batteries are ubiquitous devices in our society. When they are employed in mission-critical applications, the ability to precisely predict the end of discharge under highly variable environmental and operating conditions is of paramount importance in order to support operational decision-making. While there are accurate predictive models of the processes underlying the charge and discharge phases of batteries, the modelling of ageing and its effect on performance remains poorly understood. Such a lack of understanding often leads to inaccurate models or the need for time-consuming calibration procedures whenever the battery ages or its conditions change significantly. This represents a major obstacle to the real-world deployment of efficient and robust battery management systems. In this paper, we propose for the first time an approach that can predict the voltage discharge curve for batteries of any degradation level without the need for calibration. In particular, we introduce Dynaformer, a novel Transformer-based deep learning architecture which is able to simultaneously infer the ageing state from a limited number of voltage/current samples and predict the full voltage discharge curve for real batteries with high precision. Our experiments show that the trained model is effective for input current profiles of different complexities and is robust to a wide range of degradation levels. In addition to evaluating the performance of the proposed framework on simulated data, we demonstrate that a minimal amount of fine-tuning allows the model to bridge the simulation-to-real gap between simulations and real data collected from a set of batteries. The proposed methodology enables the utilization of battery-powered systems until the end of discharge in a controlled and predictable way, thereby significantly prolonging the operating cycles and reducing costs.
Uncertainties in machine learning are a significant roadblock for its application in safety-critical cyber-physical systems (CPS). One source of uncertainty arises from distribution shifts in the input data between training and test scenarios. Detecting such distribution shifts in real-time is an emerging approach to address the challenge. The high dimensional input space in CPS applications involving imaging adds extra difficulty to the task. Generative learning models are widely adopted for the task, namely out-of-distribution (OoD) detection. To improve the state-of-the-art, we studied existing proposals from both machine learning and CPS fields. In the latter, safety monitoring in real-time for autonomous driving agents has been a focus. Exploiting the spatiotemporal correlation of motion in videos, we can robustly detect hazardous motion around autonomous driving agents. Inspired by the latest advances in the Variational Autoencoder (VAE) theory and practice, we tapped into the prior knowledge in data to further boost OoD detection's robustness. Comparison studies over nuScenes and Synthia data sets show our methods significantly improve detection capabilities of OoD factors unique to driving scenarios, 42% better than state-of-the-art approaches. Our model also generalized near-perfectly, 97% better than the state-of-the-art across the real-world and simulation driving data sets experimented. Finally, we customized one proposed method into a twin-encoder model that can be deployed to resource limited embedded devices for real-time OoD detection. Its execution time was reduced over four times in low-precision 8-bit integer inference, while detection capability is comparable to its corresponding floating-point model.
To mitigate the effects of shadow fading and obstacle blocking, reconfigurable intelligent surface (RIS) has become a promising technology to improve the signal transmission quality of wireless communications by controlling the reconfigurable passive elements with less hardware cost and lower power consumption. However, accurate, low-latency and low-pilot-overhead channel state information (CSI) acquisition remains a considerable challenge in RIS-assisted systems due to the large number of RIS passive elements. In this paper, we propose a three-stage joint channel decomposition and prediction framework to require CSI. The proposed framework exploits the two-timescale property that the base station (BS)-RIS channel is quasi-static and the RIS-user equipment (UE) channel is fast time-varying. Specifically, in the first stage, we use the full-duplex technique to estimate the channel between a BS's specific antenna and the RIS, addressing the critical scaling ambiguity problem in the channel decomposition. We then design a novel deep neural network, namely, the sparse-connected long short-term memory (SCLSTM), and propose a SCLSTM-based algorithm in the second and third stages, respectively. The algorithm can simultaneously decompose the BS-RIS channel and RIS-UE channel from the cascaded channel and capture the temporal relationship of the RIS-UE channel for prediction. Simulation results show that our proposed framework has lower pilot overhead than the traditional channel estimation algorithms, and the proposed SCLSTM-based algorithm can also achieve more accurate CSI acquisition robustly and effectively.
Model-based treatment planning for transcranial ultrasound therapy typically involves mapping the acoustic properties of the skull from an x-ray computed tomography (CT) image of the head. Here, three methods for generating pseudo-CT images from magnetic resonance (MR) images were compared as an alternative to CT. A convolutional neural network (U-Net) was trained on paired MR-CT images to generate pseudo-CT images from either T1-weighted or zero-echo time (ZTE) MR images (denoted tCT and zCT, respectively). A direct mapping from ZTE to pseudo-CT was also implemented (denoted cCT). When comparing the pseudo-CT and ground truth CT images for the test set, the mean absolute error was 133, 83, and 145 Hounsfield units (HU) across the whole head, and 398, 222, and 336 HU within the skull for the tCT, zCT, and cCT images, respectively. Ultrasound simulations were also performed using the generated pseudo-CT images and compared to simulations based on CT. An annular array transducer was used targeting the visual or motor cortex. The mean differences in the simulated focal pressure, focal position, and focal volume were 9.9%, 1.5 mm, and 15.1% for simulations based on the tCT images, 5.7%, 0.6 mm, and 5.7% for the zCT, and 6.7%, 0.9 mm, and 12.1% for the cCT. The improved results for images mapped from ZTE highlight the advantage of using imaging sequences which improve contrast of the skull bone. Overall, these results demonstrate that acoustic simulations based on MR images can give comparable accuracy to those based on CT.
We consider fair allocation of a set $M$ of indivisible goods to $n$ equally-entitled agents, with no monetary transfers. Every agent $i$ has a valuation $v_i$ from some given class of valuation functions. A share $s$ is a function that maps a pair $(v_i,n)$ to a value, with the interpretation that if an allocation of $M$ to $n$ agents fails to give agent $i$ a bundle of value at least equal to $s(v_i,n)$, this serves as evidence that the allocation is not fair towards $i$. For such an interpretation to make sense, we would like the share to be feasible, meaning that for any valuations in the class, there is an allocation that gives every agent at least her share. The maximin share was a natural candidate for a feasible share for additive valuations. However, Kurokawa, Procaccia and Wang [2018] show that it is not feasible. We initiate a systematic study of the family of feasible shares. We say that a share is \emph{self maximizing} if truth-telling maximizes the implied guarantee. We show that every feasible share is dominated by some self-maximizing and feasible share. We seek to identify those self-maximizing feasible shares that are polynomial time computable, and offer the highest share values. We show that a SM-dominating feasible share -- one that dominates every self-maximizing (SM) feasible share -- does not exist for additive valuations (and beyond). Consequently, we relax the domination property to that of domination up to a multiplicative factor of $\rho$ (called $\rho$-dominating). For additive valuations we present shares that are feasible, self-maximizing and polynomial-time computable. For $n$ agents we present such a share that is $\frac{2n}{3n-1}$-dominating. For two agents we present such a share that is $(1 - \epsilon)$-dominating. Moreover, for these shares we present poly-time algorithms that compute allocations that give every agent at least her share.
Recent research has established the effectiveness of machine learning for data-driven prediction of the future evolution of unknown dynamical systems, including chaotic systems. However, these approaches require large amounts of measured time series data from the process to be predicted. When only limited data is available, forecasters are forced to impose significant model structure that may or may not accurately represent the process of interest. In this work, we present a Meta-learning Approach to Reservoir Computing (MARC), a data-driven approach to automatically extract an appropriate model structure from experimentally observed "related" processes that can be used to vastly reduce the amount of data required to successfully train a predictive model. We demonstrate our approach on a simple benchmark problem, where it beats the state of the art meta-learning techniques, as well as a challenging chaotic problem.
We introduce DeepNash, an autonomous agent capable of learning to play the imperfect information game Stratego from scratch, up to a human expert level. Stratego is one of the few iconic board games that Artificial Intelligence (AI) has not yet mastered. This popular game has an enormous game tree on the order of $10^{535}$ nodes, i.e., $10^{175}$ times larger than that of Go. It has the additional complexity of requiring decision-making under imperfect information, similar to Texas hold'em poker, which has a significantly smaller game tree (on the order of $10^{164}$ nodes). Decisions in Stratego are made over a large number of discrete actions with no obvious link between action and outcome. Episodes are long, with often hundreds of moves before a player wins, and situations in Stratego can not easily be broken down into manageably-sized sub-problems as in poker. For these reasons, Stratego has been a grand challenge for the field of AI for decades, and existing AI methods barely reach an amateur level of play. DeepNash uses a game-theoretic, model-free deep reinforcement learning method, without search, that learns to master Stratego via self-play. The Regularised Nash Dynamics (R-NaD) algorithm, a key component of DeepNash, converges to an approximate Nash equilibrium, instead of 'cycling' around it, by directly modifying the underlying multi-agent learning dynamics. DeepNash beats existing state-of-the-art AI methods in Stratego and achieved a yearly (2022) and all-time top-3 rank on the Gravon games platform, competing with human expert players.
In this paper, we propose Meta-SysId, a meta-learning approach to model sets of systems that have behavior governed by common but unknown laws and that differentiate themselves by their context. Inspired by classical modeling-and-identification approaches, Meta-SysId learns to represent the common law through shared parameters and relies on online optimization to compute system-specific context. Compared to optimization-based meta-learning methods, the separation between class parameters and context variables reduces the computational burden while allowing batch computations and a simple training scheme. We test Meta-SysId on polynomial regression, time-series prediction, model-based control, and real-world traffic prediction domains, empirically finding it outperforms or is competitive with meta-learning baselines.
This letter explores the possibility of using microsimulation in space time trellis code. Performing a pairwise comparison between generator matrices is essential in the validation of optimality. This is often done with simulation, which can be a time consuming process altogether. Microsimulation considerably cuts down the computational cost of simulation by employing smaller data and iteration. The effort is feasible with the assistance of a machine learning model known as multilayer perceptron. When properly conducted, it can offer 93.86% accuracy and 98.25% reduction in temporal cost.