Automatic Cued Speech Recognition (ACSR) provides an intelligent human-machine interface for visual communications, where the Cued Speech (CS) system utilizes lip movements and hand gestures to code spoken language for hearing-impaired people. Previous ACSR approaches often utilize direct feature concatenation as the main fusion paradigm. However, the asynchronous modalities (\textit{i.e.}, lip, hand shape and hand position) in CS may cause interference for feature concatenation. To address this challenge, we propose a transformer based cross-modal mutual learning framework to prompt multi-modal interaction. Compared with the vanilla self-attention, our model forces modality-specific information of different modalities to pass through a modality-invariant codebook, collating linguistic representations for tokens of each modality. Then the shared linguistic knowledge is used to re-synchronize multi-modal sequences. Moreover, we establish a novel large-scale multi-speaker CS dataset for Mandarin Chinese. To our knowledge, this is the first work on ACSR for Mandarin Chinese. Extensive experiments are conducted for different languages (\textit{i.e.}, Chinese, French, and British English). Results demonstrate that our model exhibits superior recognition performance to the state-of-the-art by a large margin.
Transformers have achieved remarkable success in medical image analysis owing to their powerful capability to use flexible self-attention mechanism. However, due to lacking intrinsic inductive bias in modeling visual structural information, they generally require a large-scale pre-training schedule, limiting the clinical applications over expensive small-scale medical data. To this end, we propose a parameter-efficient transformer to explore intrinsic inductive bias via position information for medical image segmentation. Specifically, we empirically investigate how different position encoding strategies affect the prediction quality of the region of interest (ROI), and observe that ROIs are sensitive to the position encoding strategies. Motivated by this, we present a novel Hybrid Axial-Attention (HAA), a form of position self-attention that can be equipped with spatial pixel-wise information and relative position information as inductive bias. Moreover, we introduce a gating mechanism to alleviate the burden of training schedule, resulting in efficient feature selection over small-scale datasets. Experiments on the BraTS and Covid19 datasets prove the superiority of our method over the baseline and previous works. Internal workflow visualization with interpretability is conducted to better validate our success.
Rotational speed is one of the important metrics to be measured for calibrating the electric motors in manufacturing, monitoring engine during car repairing, faults detection on electrical appliance and etc. However, existing measurement techniques either require prohibitive hardware (e.g., high-speed camera) or are inconvenient to use in real-world application scenarios. In this paper, we propose, EV-Tach, an event-based tachometer via efficient dynamic vision sensing on mobile devices. EV-Tach is designed as a high-fidelity and convenient tachometer by introducing dynamic vision sensor as a new sensing modality to capture the high-speed rotation precisely under various real-world scenarios. By designing a series of signal processing algorithms bespoke for dynamic vision sensing on mobile devices, EV-Tach is able to extract the rotational speed accurately from the event stream produced by dynamic vision sensing on rotary targets. According to our extensive evaluations, the Relative Mean Absolute Error (RMAE) of EV-Tach is as low as 0.03% which is comparable to the state-of-the-art laser tachometer under fixed measurement mode. Moreover, EV-Tach is robust to subtle movement of user's hand, therefore, can be used as a handheld device, where the laser tachometer fails to produce reasonable results.
Reinforcement learning methods as a promising technique have achieved superior results in the motion planning of free-floating space robots. However, due to the increase in planning dimension and the intensification of system dynamics coupling, the motion planning of dual-arm free-floating space robots remains an open challenge. In particular, the current study cannot handle the task of capturing a non-cooperative object due to the lack of the pose constraint of the end-effectors. To address the problem, we propose a novel algorithm, EfficientLPT, to facilitate RL-based methods to improve planning accuracy efficiently. Our core contributions are constructing a mixed policy with prior knowledge guidance and introducing infinite norm to build a more reasonable reward function. Furthermore, our method successfully captures a rotating object with different spinning speeds.
In this paper, we consider a reconfigurable intelligence surface (RIS) aided uplink multiuser multi-input multi-output (MIMO) orthogonal frequency division multiplexing (OFDM) system, where the receiver is assumed to conduct low-complexity iterative detection. We aim to minimize the total transmit power by jointly designing the precoder of the transmitter and the passive beamforming of the RIS. This problem can be tackled from the perspective of information theory. But this information-theoretic approach may involve prohibitively high complexity since the number of rate constraints that specify the capacity region of the uplink multiuser channel is exponential in the number of users. To avoid this difficulty, we formulate the design problem of the iterative receiver under the constraints of a maximal iteration number and target bit error rates of users. To tackle this challenging problem, we propose a groupwise successive interference cancellation (SIC) optimization approach, where the signals of users are decoded and cancelled in a group-by-group manner. We present a heuristic user grouping strategy, and resort to the alternating optimization technique to iteratively solve the precoding and passive beamforming sub-problems. Specifically, for the precoding sub-problem, we employ fractional programming to convert it to a convex problem; for the passive beamforming sub-problem, we adopt successive convex approximation to deal with the unit-modulus constraints of the RIS. We show that the proposed groupwise SIC approach has significant advantages in both performance and computational complexity, as compared with the counterpart approaches.
This paper investigates a large unitarily invariant system (LUIS) involving a unitarily invariant sensing matrix, an arbitrary fixed signal distribution, and forward error control (FEC) coding. Several area properties are established based on the state evolution of orthogonal approximate message passing (OAMP) in an un-coded LUIS. Under the assumptions that the state evolution for joint OAMP and FEC decoding is correct and the replica method is reliable, we analyze the achievable rate of OAMP. We prove that OAMP reaches the constrained capacity predicted by the replica method of the LUIS with an arbitrary signal distribution based on matched FEC coding. Meanwhile, we elaborate a constrained capacity-achieving coding principle for LUIS, based on which irregular low-density parity-check (LDPC) codes are optimized for binary signaling in the simulation results. We show that OAMP with the optimized codes has significant performance improvement over the un-optimized ones and the well-known Turbo linear MMSE algorithm. For quadrature phase-shift keying (QPSK) modulation, constrained capacity-approaching bit error rate (BER) performances are observed under various channel conditions.
Approximate message passing (AMP) type algorithms have been widely used in the signal reconstruction of certain large random linear systems. A key feature of the AMP-type algorithms is that their dynamics can be correctly described by state evolution. However, state evolution does not necessarily guarantee the convergence of iterative algorithms. To solve the convergence problem of AMP-type algorithms in principle, this paper proposes a memory AMP (MAMP) under a sufficient statistic condition, named sufficient statistic MAMP (SS-MAMP). We show that the covariance matrices of SS-MAMP are L-banded and convergent. Given an arbitrary MAMP, we can construct the SS-MAMP by damping, which not only ensures the convergence, but also preserves the orthogonality, i.e., its dynamics can be correctly described by state evolution.
Approximate Message Passing (AMP) is an efficient iterative parameter-estimation technique for certain high-dimensional linear systems with non-Gaussian distributions, such as sparse systems. In AMP, a so-called Onsager term is added to keep estimation errors approximately Gaussian. Orthogonal AMP (OAMP) does not require this Onsager term, relying instead on an orthogonalization procedure to keep the current errors uncorrelated with (i.e., orthogonal to) past errors. In this paper, we show that the orthogonality in OAMP ensures that errors are "asymptotically independently and identically distributed Gaussian" (AIIDG). This AIIDG property, which is essential for the attractive performance of OAMP, holds for separable functions. We present a procedure to realize the required orthogonality for OAMP through Gram-Schmidt orthogonalization (GSO). We show that expectation propagation (EP), AMP, OAMP and some other algorithms can be unified under this orthogonality framework. The simplicity and generality of OAMP provide efficient solutions for estimation problems beyond the classical linear models; related applications will be discussed in a companion paper where new algorithms are developed for problems with multiple constraints and multiple measurement variables.
Traditional ground wireless communication networks cannot provide high-quality services for artificial intelligence (AI) applications such as intelligent transportation systems (ITS) due to deployment, coverage and capacity issues. The space-air-ground integrated network (SAGIN) has become a research focus in the industry. Compared with traditional wireless communication networks, SAGIN is more flexible and reliable, and it has wider coverage and higher quality of seamless connection. However, due to its inherent heterogeneity, time-varying and self-organizing characteristics, the deployment and use of SAGIN still faces huge challenges, among which the orchestration of heterogeneous resources is a key issue. Based on virtual network architecture and deep reinforcement learning (DRL), we model SAGIN's heterogeneous resource orchestration as a multi-domain virtual network embedding (VNE) problem, and propose a SAGIN cross-domain VNE algorithm. We model the different network segments of SAGIN, and set the network attributes according to the actual situation of SAGIN and user needs. In DRL, the agent is acted by a five-layer policy network. We build a feature matrix based on network attributes extracted from SAGIN and use it as the agent training environment. Through training, the probability of each underlying node being embedded can be derived. In test phase, we complete the embedding process of virtual nodes and links in turn based on this probability. Finally, we verify the effectiveness of the algorithm from both training and testing.
Network virtualization (NV) is a technology with broad application prospects. Virtual network embedding (VNE) is the core orientation of VN, which aims to provide more flexible underlying physical resource allocation for user function requests. The classical VNE problem is usually solved by heuristic method, but this method often limits the flexibility of the algorithm and ignores the time limit. In addition, the partition autonomy of physical domain and the dynamic characteristics of virtual network request (VNR) also increase the difficulty of VNE. This paper proposed a new type of VNE algorithm, which applied reinforcement learning (RL) and graph neural network (GNN) theory to the algorithm, especially the combination of graph convolutional neural network (GCNN) and RL algorithm. Based on a self-defined fitness matrix and fitness value, we set up the objective function of the algorithm implementation, realized an efficient dynamic VNE algorithm, and effectively reduced the degree of resource fragmentation. Finally, we used comparison algorithms to evaluate the proposed method. Simulation experiments verified that the dynamic VNE algorithm based on RL and GCNN has good basic VNE characteristics. By changing the resource attributes of physical network and virtual network, it can be proved that the algorithm has good flexibility.