In the real world, long sequence time-series forecasting (LSTF) is needed in many applications, such as power consumption prediction and air quality prediction. Multivariate long time-series forecasting places stricter requirements on a model, which must not only effectively capture the accurate long-term dependence between input and output, but also capture the relationships between data of different dimensions. Recent research shows that the Transformer-based Informer model achieves excellent performance in long time-series forecasting. However, this model still has deficiencies in multivariate forecasting: it cannot capture the relationships between different dimensions well. We improve Informer to address these shortcomings. First, we introduce an adaptive graph neural network to capture hidden dependencies between dimensions in multivariate time-series forecasting. Second, we integrate adaptive graph convolutional networks into various spatio-temporal series forecasting models to remedy their inability to capture relationships between different dimensions. Third, in experiments on multiple datasets, the accuracy of the models improves by about 10\% after our framework is introduced.
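The abstract above does not give implementation details; a minimal sketch of an adaptive graph module of the kind it describes, following the common construction that learns an adjacency matrix from the product of two node-embedding tables (all names and shapes here are hypothetical, and the embeddings would be trained jointly with the forecaster in practice):

```python
import numpy as np

def adaptive_adjacency(num_nodes: int, emb_dim: int, rng: np.random.Generator):
    """Build an adaptive adjacency matrix from two node-embedding tables.

    In a trained model e1 and e2 are learned parameters; here they are
    random stand-ins to show the construction.
    """
    e1 = rng.standard_normal((num_nodes, emb_dim))  # source embeddings
    e2 = rng.standard_normal((num_nodes, emb_dim))  # target embeddings
    scores = np.maximum(e1 @ e2.T, 0.0)             # ReLU keeps affinities non-negative
    # Row-wise softmax so each node's outgoing weights sum to 1.
    exp = np.exp(scores - scores.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

def graph_conv(x: np.ndarray, adj: np.ndarray) -> np.ndarray:
    """One graph-convolution step: mix signals across series dimensions."""
    return adj @ x

rng = np.random.default_rng(0)
adj = adaptive_adjacency(num_nodes=6, emb_dim=4, rng=rng)
x = rng.standard_normal((6, 12))   # 6 series dimensions, 12 time steps
out = graph_conv(x, adj)           # dimension-mixed features, same shape as x
```

Because the adjacency is learned rather than given, the module can discover dependencies between series dimensions without a predefined graph.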
This paper addresses the problem of utterance-level missing modalities with uncertain patterns in the emotion recognition in conversation (ERC) task. Existing models generally predict a speaker's emotions from the current utterance and its context, and their performance degrades considerably when modalities are missing. We propose a framework, Missing-Modality Robust emotion Recognition (M2R2), which trains an emotion recognition model with iterative data augmentation using a learned common representation. First, a network called Party Attentive Network (PANet) is designed to classify emotions by tracking the states and context of all speakers. An attention mechanism between a speaker, the other participants, and the dialogue topic is used to spread dependence across multi-time and multi-party utterances rather than relying on a single, possibly incomplete, utterance. Moreover, the Common Representation Learning (CRL) problem is defined for the modality-missing setting, and data imputation methods improved by an adversarial strategy are used to construct extra features for data augmentation. Extensive experiments and case studies on two different datasets validate the effectiveness of our methods over baselines for modality-missing emotion recognition.
To reap the promised gains of distributed reconfigurable intelligent surface (RIS)-enhanced communications in a wireless network, timing synchronization among these metasurfaces is an essential prerequisite in practice. This paper proposes a unified framework for the joint estimation of the unknown timing offsets and the RIS channel parameters, as well as the design of a cooperative reflection and synchronization algorithm for distributed multiple-RIS communication. Since a RIS is usually a passive device with limited signal-processing capability, the individual timing offset and channel gains of each hop of the RIS links cannot be estimated directly. To make the estimation tractable, we propose to estimate the cascaded channels and timing offsets jointly by deriving a maximum likelihood estimator. Furthermore, we theoretically characterize the Cramér-Rao lower bound (CRLB) to evaluate the accuracy of this estimator. Using the proposed estimator and the derived CRLBs, an efficient resynchronization algorithm is devised jointly at the RISs and the destination to compensate for the multiple timing offsets. Based on the majorization-minimization framework, the proposed algorithm admits semi-closed-form and closed-form solutions for the RIS reflection matrices and the timing offset equalizer, respectively. Simulation results verify that our theoretical analysis matches the numerical tests well and validate the effectiveness of the proposed resynchronization algorithm.
We develop an abstractive summarization framework independent of labeled data for multiple heterogeneous documents. Unlike existing multi-document summarization methods, our framework processes documents telling different stories rather than documents on the same topic. We also enhance an existing sentence fusion method with a uni-directional language model to prioritize fused sentences with higher sentence probability, with the goal of increasing readability. Lastly, we construct a total of twelve dataset variations based on the CNN/Daily Mail and NewsRoom datasets, where each document group contains a large and diverse collection of documents, to evaluate the performance of our model against other baseline systems. Our experiments demonstrate that our framework outperforms current state-of-the-art methods in this more generic setting.
This paper reviews the NTIRE 2022 Challenge on Super-Resolution and Quality Enhancement of Compressed Video. For this challenge, we proposed the LDV 2.0 dataset, which includes the LDV dataset (240 videos) and 95 additional videos. The challenge comprises three tracks. Track 1 aims at enhancing videos compressed by HEVC at a fixed QP. Tracks 2 and 3 target both super-resolution and quality enhancement of HEVC-compressed video, requiring x2 and x4 super-resolution, respectively. The three tracks attracted more than 600 registrations in total. In the test phase, 8, 8, and 12 teams submitted final results to Tracks 1, 2, and 3, respectively. The proposed methods and solutions gauge the state of the art in super-resolution and quality enhancement of compressed video. The proposed LDV 2.0 dataset is available at https://github.com/RenYang-home/LDV_dataset. The homepage of this challenge (including open-sourced codes) is at https://github.com/RenYang-home/NTIRE22_VEnh_SR.
Hybrid precoding is a cost-efficient technique for millimeter wave (mmWave) massive multiple-input multiple-output (MIMO) communications. This paper proposes a deep learning approach that uses a distributed neural network for hybrid analog-and-digital precoding design with limited feedback. The proposed distributed neural precoding network, called DNet, is designed to achieve two objectives. First, DNet realizes channel state information (CSI) compression with a distributed neural network architecture, which enables practical deployment across multiple users. Specifically, the network is composed of multiple independent sub-networks with the same structure and parameters, which reduces both the number of training parameters and the network complexity. Second, DNet learns to compute the hybrid precoding from CSI reconstructed from the limited feedback. Unlike existing black-box neural network designs, DNet is specifically designed according to the data form of the matrix calculations in hybrid precoding. Simulation results show that the proposed DNet improves performance by up to nearly 50% compared to traditional limited feedback precoding methods under tests with various CSI compression ratios.
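The parameter-sharing idea behind the distributed architecture above can be illustrated with a toy sketch: every user compresses its own CSI with the same sub-network weights, so the parameter count does not grow with the number of users (the function name, dimensions, and the single-layer encoder are hypothetical simplifications, not the paper's actual network):

```python
import numpy as np

def sub_network(csi: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Shared encoder sub-network: compress one user's CSI vector into a
    short feedback code. All users apply the *same* weights w."""
    return np.tanh(csi @ w)

rng = np.random.default_rng(1)
num_users, csi_dim, code_dim = 4, 32, 8

w = rng.standard_normal((csi_dim, code_dim)) * 0.1   # one shared weight matrix
users_csi = rng.standard_normal((num_users, csi_dim))

# Each user encodes independently with the shared sub-network.
codes = np.stack([sub_network(u, w) for u in users_csi])

# Parameter count is csi_dim * code_dim, independent of num_users.
num_params = w.size
```

Adding more users adds more feedback codes but no new trainable parameters, which is what makes such a distributed design practical to deploy.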
The increased throughput brought by MIMO technology relies on knowledge of the channel state information (CSI) acquired at the base station (BS). To make the CSI feedback overhead affordable for the evolution of MIMO technology (e.g., massive MIMO and ultra-massive MIMO), deep learning (DL) has been introduced for the CSI compression task. Based on the separation principle in existing communication systems, DL-based CSI compression is used as source coding. However, this separate source-channel coding (SSCC) scheme is inferior to joint source-channel coding (JSCC) in the finite-blocklength regime. In this paper, we propose a deep joint source-channel coding (DJSCC) based framework for the CSI feedback task. In particular, the proposed method can learn from the CSI source and the wireless channel simultaneously. Instead of truncating the CSI via a Fourier transform in the delay domain as in existing methods, we apply non-linear transform networks to compress the CSI. Furthermore, we adopt an SNR adaptation mechanism to deal with wireless channel variations. Extensive experiments demonstrate the validity, adaptability, and generality of the proposed framework.
Reconfigurable intelligent surfaces (RISs) can establish favorable wireless environments to combat the severe attenuation and blockages in millimeter-wave (mmWave) bands. However, to achieve the optimal performance enhancement, the instantaneous channel state information (CSI) needs to be estimated at the cost of a large overhead that scales with the number of RIS elements and the number of users. In this paper, we design a quasi-static broad coverage at the RIS with reduced overhead based on statistical CSI. We propose a design framework to synthesize the power pattern reflected by the RIS so that it meets customized broad-coverage requirements. For communication over broadcast channels, we generalize the broad coverage from a single transmit stream to the scenario of multiple streams. Moreover, we employ the quasi-static broad coverage in a multiuser orthogonal frequency-division multiple access (OFDMA) system and derive an analytical expression for the downlink rate, which is proved to increase logarithmically with the power gain reflected by the RIS. When the overhead of channel estimation is taken into account, the proposed quasi-static broad coverage even outperforms the design method that optimizes the RIS phases using instantaneous CSI. Numerical simulations are conducted to verify these observations.
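The logarithmic scaling of the downlink rate with the RIS-reflected power gain claimed above can be illustrated with the standard Shannon rate formula (a simplified stand-in; the paper's exact rate expression is not reproduced here, and the SNR and gain values are arbitrary):

```python
import math

def downlink_rate(snr_linear: float, ris_power_gain: float) -> float:
    """Achievable rate (bits/s/Hz) with effective SNR scaled by the RIS
    reflected power gain G: R = log2(1 + G * SNR)."""
    return math.log2(1.0 + ris_power_gain * snr_linear)

snr = 100.0  # 20 dB, an illustrative operating point
r1 = downlink_rate(snr, ris_power_gain=4.0)
r2 = downlink_rate(snr, ris_power_gain=8.0)
gain_in_rate = r2 - r1
```

Because the gain enters inside the logarithm, doubling the reflected power gain adds at most one bit/s/Hz, which is the logarithmic growth the analytical expression captures.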
Neural Architecture Search (NAS) has shown great potential in reducing the manual effort of network design by automatically discovering optimal architectures. Notably, object detection has so far been less touched by NAS algorithms despite its significant importance in computer vision. To the best of our knowledge, most recent NAS studies on object detection fail to strike a satisfactory balance between the performance and efficiency of the resulting models, let alone rein in the excessive computational resources consumed by those algorithms. Here we propose an efficient method to obtain better object detectors by searching for the feature pyramid network (FPN) as well as the prediction head of a simple anchor-free object detector, namely FCOS [36], using a tailored reinforcement learning paradigm. With a carefully designed search space, search algorithm, and strategies for evaluating network quality, we are able to find top-performing detection architectures within 4 days using 8 V100 GPUs. The discovered architectures surpass state-of-the-art object detection models (such as Faster R-CNN, RetinaNet, and FCOS) by 1.0 to 5.4 points in AP on the COCO dataset, with comparable computational complexity and memory footprint, demonstrating the efficacy of the proposed NAS method for object detection. Code is available at https://github.com/Lausannen/NAS-FCOS.