Recent NLP literature has seen growing interest in improving model interpretability. Along this direction, we propose a trainable neural network layer that learns a global interaction graph between words and then selects more informative words using the learned word interactions. Our layer, we call WIGRAPH, can plug into any neural network-based NLP text classifiers right after its word embedding layer. Across multiple SOTA NLP models and various NLP datasets, we demonstrate that adding the WIGRAPH layer substantially improves NLP models' interpretability and enhances models' prediction performance at the same time.
Extremely large-scale multiple-input-multiple-output (XL-MIMO) is a promising technology for the future sixth-generation (6G) networks to achieve higher performance. In practice, various linear precoding schemes, such as zero-forcing (ZF) and regularized zero-forcing (RZF) precoding, are capable of achieving both large spectral efficiency (SE) and low bit error rate (BER) in traditional massive MIMO (mMIMO) systems. However, these methods are not efficient in extremely large-scale regimes due to the inherent spatial non-stationarity and high computational complexity. To address this problem, we investigate a low-complexity precoding algorithm, e.g., randomized Kaczmarz (rKA), taking into account the spatial non-stationary properties in XL-MIMO systems. Furthermore, we propose a novel mode of randomization, i.e., sampling without replacement rKA (SwoR-rKA), which enjoys a faster convergence speed than the rKA algorithm. Besides, the closed-form expression of SE considering the interference between subarrays in downlink XL-MIMO systems is derived. Numerical results show that the complexity given by both rKA and SwoR-rKA algorithms has 51.3% reduction than the traditional RZF algorithm with similar SE performance. More importantly, our algorithms can effectively reduce the BER when the transmitter has imperfect channel estimation.
In this paper, we investigate the uplink performance of cell-free (CF) extremely large-scale multiple-input-multipleoutput (XL-MIMO) systems, which is a promising technique for future wireless communications. More specifically, we consider the practical scenario with multiple base stations (BSs) and multiple user equipments (UEs). To this end, we derive exact achievable spectral efficiency (SE) expressions for any combining scheme. It is worth noting that we derive the closed-form SE expressions for the CF XL-MIMO with maximum ratio (MR) combining. Numerical results show that the SE performance of the CF XL-MIMO can be hugely improved compared with the small-cell XL-MIMO. It is interesting that a smaller antenna spacing leads to a higher correlation level among patch antennas. Finally, we prove that increasing the number of UE antennas may decrease the SE performance with MR combining.
In this paper, we investigate a cell-free massive multiple-input multiple-output system with both access points and user equipments equipped with multiple antennas over the Weichselberger Rayleigh fading channel. We study the uplink spectral efficiency (SE) for the fully centralized processing scheme and large-scale fading decoding (LSFD) scheme. To further improve the SE performance, we design the uplink precoding schemes based on the weighted sum SE maximization. Since the weighted sum SE maximization problem is not jointly over all optimization variables, two efficient uplink precoding schemes based on Iteratively Weighted sum-Minimum Mean Square Error (I-WMMSE) algorithms, which rely on the iterative minimization of weighted MSE, are proposed for two processing schemes investigated. Furthermore, with maximum ratio combining applied in the LSFD scheme, we derive novel closed-form achievable SE expressions and optimal precoding schemes. Numerical results validate the proposed results and show that the I-WMMSE precoding schemes can achieve excellent sum SE performance with a large number of UE antennas.
Although DETR-based 3D detectors can simplify the detection pipeline and achieve direct sparse predictions, their performance still lags behind dense detectors with post-processing for 3D object detection from point clouds. DETRs usually adopt a larger number of queries than GTs (e.g., 300 queries v.s. 40 objects in Waymo) in a scene, which inevitably incur many false positives during inference. In this paper, we propose a simple yet effective sparse 3D detector, named Query Contrast Voxel-DETR (ConQueR), to eliminate the challenging false positives, and achieve more accurate and sparser predictions. We observe that most false positives are highly overlapping in local regions, caused by the lack of explicit supervision to discriminate locally similar queries. We thus propose a Query Contrast mechanism to explicitly enhance queries towards their best-matched GTs over all unmatched query predictions. This is achieved by the construction of positive and negative GT-query pairs for each GT, and a contrastive loss to enhance positive GT-query pairs against negative ones based on feature similarities. ConQueR closes the gap of sparse and dense 3D detectors, and reduces up to ~60% false positives. Our single-frame ConQueR achieves new state-of-the-art (sota) 71.6 mAPH/L2 on the challenging Waymo Open Dataset validation set, outperforming previous sota methods (e.g., PV-RCNN++) by over 2.0 mAPH/L2.
The spatial correlations and the temporal contexts are indispensable in Electroencephalogram (EEG)-based emotion recognition. However, the learning of complex spatial correlations among several channels is a challenging problem. Besides, the temporal contexts learning is beneficial to emphasize the critical EEG frames because the subjects only reach the prospective emotion during part of stimuli. Hence, we propose a novel Spatial-Temporal Information Learning Network (STILN) to extract the discriminative features by capturing the spatial correlations and temporal contexts. Specifically, the generated 2D power topographic maps capture the dependencies among electrodes, and they are fed to the CNN-based spatial feature extraction network. Furthermore, Convolutional Block Attention Module (CBAM) recalibrates the weights of power topographic maps to emphasize the crucial brain regions and frequency bands. Meanwhile, Batch Normalizations (BNs) and Instance Normalizations (INs) are appropriately combined to relieve the individual differences. In the temporal contexts learning, we adopt the Bidirectional Long Short-Term Memory Network (Bi-LSTM) network to capture the dependencies among the EEG frames. To validate the effectiveness of the proposed method, subject-independent experiments are conducted on the public DEAP dataset. The proposed method has achieved the outstanding performance, and the accuracies of arousal and valence classification have reached 0.6831 and 0.6752 respectively.
Both the temporal dynamics and spatial correlations of Electroencephalogram (EEG), which contain discriminative emotion information, are essential for the emotion recognition. However, some redundant information within the EEG signals would degrade the performance. Specifically,the subjects reach prospective intense emotions for only a fraction of the stimulus duration. Besides, it is a challenge to extract discriminative features from the complex spatial correlations among a number of electrodes. To deal with the problems, we propose a transformer-based model to robustly capture temporal dynamics and spatial correlations of EEG. Especially, temporal feature extractors which share the weight among all the EEG channels are designed to adaptively extract dynamic context information from raw signals. Furthermore, multi-head self-attention mechanism within the transformers could adaptively localize the vital EEG fragments and emphasize the essential brain regions which contribute to the performance. To verify the effectiveness of the proposed method, we conduct the experiments on two public datasets, DEAP and MAHNOBHCI. The results demonstrate that the proposed method achieves outstanding performance on arousal and valence classification.
The explosive growth of dynamic and heterogeneous data traffic brings great challenges for 5G and beyond mobile networks. To enhance the network capacity and reliability, we propose a learning-based dynamic time-frequency division duplexing (D-TFDD) scheme that adaptively allocates the uplink and downlink time-frequency resources of base stations (BSs) to meet the asymmetric and heterogeneous traffic demands while alleviating the inter-cell interference. We formulate the problem as a decentralized partially observable Markov decision process (Dec-POMDP) that maximizes the long-term expected sum rate under the users' packet dropping ratio constraints. In order to jointly optimize the global resources in a decentralized manner, we propose a federated reinforcement learning (RL) algorithm named federated Wolpertinger deep deterministic policy gradient (FWDDPG) algorithm. The BSs decide their local time-frequency configurations through RL algorithms and achieve global training via exchanging local RL models with their neighbors under a decentralized federated learning framework. Specifically, to deal with the large-scale discrete action space of each BS, we adopt a DDPG-based algorithm to generate actions in a continuous space, and then utilize Wolpertinger policy to reduce the mapping errors from continuous action space back to discrete action space. Simulation results demonstrate the superiority of our proposed algorithm to benchmark algorithms with respect to system sum rate.
Point-of-Interest (POI) recommendation plays a vital role in various location-aware services. It has been observed that POI recommendation is driven by both sequential and geographical influences. However, since there is no annotated label of the dominant influence during recommendation, existing methods tend to entangle these two influences, which may lead to sub-optimal recommendation performance and poor interpretability. In this paper, we address the above challenge by proposing DisenPOI, a novel Disentangled dual-graph framework for POI recommendation, which jointly utilizes sequential and geographical relationships on two separate graphs and disentangles the two influences with self-supervision. The key novelty of our model compared with existing approaches is to extract disentangled representations of both sequential and geographical influences with contrastive learning. To be specific, we construct a geographical graph and a sequential graph based on the check-in sequence of a user. We tailor their propagation schemes to become sequence-/geo-aware to better capture the corresponding influences. Preference proxies are extracted from check-in sequence as pseudo labels for the two influences, which supervise the disentanglement via a contrastive loss. Extensive experiments on three datasets demonstrate the superiority of the proposed model.