Enhanced mobile broadband (eMBB) and ultra-reliable low-latency communications (URLLC) are two major services expected in fifth-generation (5G) mobile communication systems. Specifically, eMBB applications support extremely high data rates, while URLLC services aim to provide communications with stringent latency and high reliability. Due to their differentiated quality-of-service (QoS) requirements, spectrum sharing between URLLC and eMBB services becomes a challenging scheduling issue. In this paper, we investigate the URLLC and eMBB coscheduling/coexistence problem under a puncturing technique in multiple-input multiple-output (MIMO) non-orthogonal multiple access (NOMA) systems. The objective is to maximize the data rate of eMBB users while satisfying the latency requirements of URLLC users through joint user selection and power allocation scheduling. To solve this problem, we first introduce an eMBB user clustering mechanism to balance system performance and computational complexity. Thereafter, we decompose the original problem into two subproblems, namely user selection and power allocation. We apply Gale-Shapley (GS) matching theory to solve the user selection problem, and successive convex approximation (SCA) together with difference-of-convex (D.C.) programming to handle the power allocation problem. Finally, an iterative algorithm is used to find the global solution with low computational complexity. Numerical results show the effectiveness of the proposed algorithms and verify that the proposed approach outperforms other baseline methods.
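The user selection step above relies on Gale-Shapley stable matching. As a minimal, hypothetical sketch (the user/cluster names, preference lists, and the deferred-acceptance form below are illustrative, not the paper's exact formulation):

```python
# Gale-Shapley deferred acceptance: proposers could be URLLC users and
# reviewers eMBB clusters, each ranking the other side by, e.g., channel
# quality. All names and preference data here are hypothetical.

def gale_shapley(proposer_prefs, reviewer_prefs):
    """Return a stable matching {proposer: reviewer}."""
    free = list(proposer_prefs)                  # proposers not yet matched
    next_choice = {p: 0 for p in proposer_prefs}  # index of next proposal
    engaged = {}                                  # reviewer -> proposer
    # Precompute each reviewer's ranking of proposers for O(1) comparisons.
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in reviewer_prefs.items()}
    while free:
        p = free.pop(0)
        r = proposer_prefs[p][next_choice[p]]     # best reviewer not yet tried
        next_choice[p] += 1
        if r not in engaged:
            engaged[r] = p
        elif rank[r][p] < rank[r][engaged[r]]:    # reviewer prefers newcomer
            free.append(engaged[r])
            engaged[r] = p
        else:
            free.append(p)                        # rejected, tries next choice
    return {p: r for r, p in engaged.items()}

proposer_prefs = {"u1": ["c1", "c2"], "u2": ["c1", "c2"]}
reviewer_prefs = {"c1": ["u2", "u1"], "c2": ["u1", "u2"]}
print(gale_shapley(proposer_prefs, reviewer_prefs))  # {'u2': 'c1', 'u1': 'c2'}
```

The resulting matching is stable: no user-cluster pair would both prefer each other over their assigned partners, which is what makes GS attractive for low-complexity scheduling.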
Context information in search sessions has proven useful for capturing user search intent. Existing studies have explored user behavior sequences in sessions in different ways to enhance query suggestion or document ranking. However, a user behavior sequence has often been viewed as a definite and exact signal reflecting a user's behavior. In reality, it is highly variable: a user's queries for the same intent can vary, and different documents can be clicked. To learn a more robust representation of the user behavior sequence, we propose a method based on contrastive learning, which takes into account the possible variations in users' behavior sequences. Specifically, we propose three data augmentation strategies to generate similar variants of user behavior sequences and contrast them with other sequences. In so doing, the model is forced to be more robust to the possible variations. The optimized sequence representation is incorporated into document ranking. Experiments on two real query log datasets show that our proposed model significantly outperforms the state-of-the-art methods, which demonstrates the effectiveness of our method for context-aware document ranking.
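As a hedged illustration of what sequence-level augmentation for contrastive learning can look like here (the paper's three actual strategies may differ; the function names, masking rate, and sample data below are assumptions):

```python
# Two illustrative augmentations over a "behavior sequence", modeled as a
# list of (query, clicked_doc) events. Augmented views of the same sequence
# would be treated as positives in a contrastive objective.

import random

def mask_terms(sequence, p=0.3, mask="[MASK]", rng=None):
    """Randomly replace query terms with a mask token."""
    rng = rng or random.Random(0)
    out = []
    for query, doc in sequence:
        terms = [mask if rng.random() < p else t for t in query.split()]
        out.append((" ".join(terms), doc))
    return out

def drop_events(sequence, p=0.25, rng=None):
    """Randomly delete whole (query, doc) events, keeping at least one."""
    rng = rng or random.Random(0)
    kept = [e for e in sequence if rng.random() >= p]
    return kept or sequence[:1]

seq = [("cheap flights paris", "doc1"), ("paris hotels", "doc2")]
view1, view2 = mask_terms(seq), drop_events(seq)  # two views of one session
```

Contrasting such views against views of other sessions pushes the encoder to ignore incidental wording and click differences while preserving the underlying intent.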
Designing pre-training objectives that more closely resemble the downstream tasks of pre-trained language models can lead to better performance at the fine-tuning stage, especially in the ad-hoc retrieval area. Existing pre-training approaches tailored for IR have tried to incorporate weakly supervised signals, such as query-likelihood-based sampling, to construct pseudo query-document pairs from the raw textual corpus. However, these signals rely heavily on the sampling method. For example, the query likelihood model may introduce considerable noise into the constructed pre-training data. \blfootnote{$\dagger$ This work was done during an internship at Huawei.} In this paper, we propose to leverage large-scale hyperlinks and anchor texts to pre-train the language model for ad-hoc retrieval. Since anchor texts are created by webmasters and can usually summarize the target document, they can help build more accurate and reliable pre-training samples than a specific sampling algorithm. Considering different views of the downstream ad-hoc retrieval task, we devise four pre-training tasks based on the hyperlinks. We then pre-train the Transformer model to predict the pair-wise preference, jointly with the Masked Language Model objective. Experimental results on two large-scale ad-hoc retrieval datasets show that our model significantly outperforms existing methods.
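Pair-wise preference prediction of this kind is commonly trained with a RankNet-style logistic loss; the following is a generic sketch under that assumption, not necessarily the paper's exact objective:

```python
# Generic pair-wise preference loss: given model scores for a preferred and
# a less-preferred (query, document) pair (e.g. built from anchor texts),
# minimize -log sigmoid(s_pos - s_neg). Framework-free for clarity.

import math

def pairwise_logistic_loss(score_pos, score_neg):
    """RankNet-style pair-wise loss: -log sigmoid(s_pos - s_neg)."""
    return math.log(1.0 + math.exp(-(score_pos - score_neg)))

# A confidently correct ordering yields a small loss; a reversed ordering
# yields a large one, pushing the model toward the observed preference.
low = pairwise_logistic_loss(3.0, 0.0)
high = pairwise_logistic_loss(0.0, 3.0)
```

In joint training, this term would simply be added to the Masked Language Model loss for each batch.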
Millimeter wave (mmWave) communication is a promising New Radio in Unlicensed (NR-U) technology to meet the ever-increasing data rate and connectivity requirements of future wireless networks. However, the development of NR-U networks must consider coexistence with the incumbent Wireless Gigabit (WiGig) networks. In this paper, we introduce a novel multiple-input multiple-output non-orthogonal multiple access (MIMO-NOMA) based mmWave NR-U and WiGig coexistence network for uplink transmission. Our aim for the proposed coexistence network is to maximize the spectral efficiency while ensuring the strict NR-U delay requirement and the WiGig transmission performance in real-time environments. A joint user grouping, hybrid beam coordination, and power control strategy is proposed, which is formulated as a Lyapunov-optimization-based mixed-integer nonlinear program (MINLP) with unit-modulus and nonconvex coupling constraints. Hence, we introduce a penalty dual decomposition (PDD) framework, which first transforms the formulated MINLP into a tractable augmented Lagrangian (AL) problem. Thereafter, we integrate both the convex-concave procedure (CCCP) and inexact block coordinate update (BCU) methods to approximately decompose the AL problem into multiple nested convex subproblems, which can be iteratively solved under the PDD framework. Numerical results illustrate the performance gains of the proposed strategy and demonstrate its effectiveness in guaranteeing the NR-U traffic delay and WiGig network performance.
A proactive dialogue system has the ability to proactively lead the conversation. Unlike general chatbots, which only react to the user, proactive dialogue systems can be used to achieve certain goals, e.g., to recommend items to the user. Background knowledge is essential to enable smooth and natural transitions in dialogue. In this paper, we propose a new multi-task learning framework for retrieval-based knowledge-grounded proactive dialogue. To determine the relevant knowledge to be used, we frame knowledge prediction as a complementary task and use explicit signals to supervise its learning. The final response is selected according to the predicted knowledge, the goal to achieve, and the context. Experimental results show that explicit modeling of knowledge prediction and goal selection can greatly improve the final response selection. Our code is available at https://github.com/DaoD/KPN/.
With the accelerated development of immersive applications and the explosive growth of internet-of-things (IoT) terminals, 6G is expected to introduce terahertz (THz) massive multiple-input multiple-output non-orthogonal multiple access (MIMO-NOMA) technologies to meet ultra-high-speed transmission and massive connectivity requirements. Nevertheless, the unreliability of THz transmissions and the extreme heterogeneity of device requirements pose critical challenges for practical applications. To address these challenges, we propose a novel smart reconfigurable THz MIMO-NOMA framework, which can realize customizable and intelligent communications by flexibly and cooperatively reconfiguring hybrid beams through cooperation between access points (APs) and reconfigurable intelligent surfaces (RISs). The optimization problem is formulated as a decentralized partially observable Markov decision process (Dec-POMDP) to maximize the network energy efficiency, while guaranteeing diversified user performance requirements, via a joint RIS element selection, coordinated discrete phase-shift control, and power allocation strategy. To solve this non-convex, strongly coupled, and highly complex mixed-integer nonlinear programming (MINLP) problem, we propose a novel multi-agent deep reinforcement learning (MADRL) algorithm, namely graph-embedded value-decomposition actor-critic (GE-VDAC), which embeds the interaction information of agents and learns a locally optimal solution through a distributed policy. Numerical results demonstrate that the proposed algorithm achieves highly customized communications and outperforms traditional MADRL algorithms.
Augmented Reality (AR) as a platform has the potential to help mitigate the cocktail party effect. Future AR headsets could leverage information from an array of sensors spanning many different modalities. Training and testing signal processing and machine learning algorithms on tasks such as beamforming and speech enhancement require high-quality, representative data. To the best of the authors' knowledge, as of publication there are no available datasets that contain synchronized egocentric multi-channel audio and video with dynamic movement and conversations in a noisy environment. In this work, we describe, evaluate, and release a dataset that contains over 5 hours of multi-modal data useful for training and testing algorithms for the application of improving conversations for an AR glasses wearer. We provide speech intelligibility, quality, and signal-to-noise ratio improvement results for a baseline method and show improvements across all tested metrics. The dataset we are releasing contains AR glasses egocentric multi-channel microphone array audio, wide-field-of-view RGB video, speech source pose, headset microphone audio, annotated voice activity, speech transcriptions, head bounding boxes, target-of-speech labels, and source identification labels. We have created and are releasing this dataset to facilitate research into multi-modal AR solutions to the cocktail party problem.
As a simple technique to accelerate inference of large-scale pre-trained models, early exiting has gained much attention in the NLP community. It allows samples to exit early at internal classifiers without passing through the entire model. Most existing work trains the internal classifiers independently and employs an exiting strategy that decides whether or not to exit based on the confidence of the current internal classifier. However, none of these works takes full advantage of the fact that the internal classifiers are trained to solve the same task and can therefore be used to construct an ensemble. In this paper, we show that a novel objective function for training the ensemble of internal classifiers can be naturally induced from the perspectives of ensemble learning and information theory. The proposed training objective consists of two terms: one for the accuracy and one for the diversity of the internal classifiers. In contrast, the objective used in prior work is exactly the accuracy term of our training objective, and therefore optimizes only accuracy, not diversity. Further, we propose a simple voting-based strategy that considers the predictions of all past internal classifiers to infer the correct label and decide whether to exit. Experimental results on various NLP tasks show that our proposed objective function and voting-based strategy achieve better accuracy-speed trade-offs.
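One way to picture the voting-based exiting idea is a count-threshold rule over the predictions of all past internal classifiers; the sketch below is illustrative (the threshold `k` and the plurality fallback are assumptions, not the paper's exact strategy):

```python
# Illustrative voting-based early exit: walk through internal classifiers in
# depth order, accumulate their predicted labels as votes, and exit as soon
# as any label has been predicted by k classifiers. If no label reaches k,
# fall back to the plurality vote over all classifiers.

from collections import Counter

def voting_exit(layer_predictions, k=2):
    """Return (predicted_label, exit_layer) for one sample."""
    votes = Counter()
    depth = 0
    for depth, label in enumerate(layer_predictions, start=1):
        votes[label] += 1
        if votes[label] >= k:          # k classifiers agree -> exit early
            return label, depth
    label, _ = votes.most_common(1)[0]  # no early exit: plurality decides
    return label, depth

print(voting_exit(["A", "B", "B", "A"], k=2))  # ('B', 3): exits at layer 3
```

Unlike confidence-only rules, this uses agreement among classifiers, so a single over-confident shallow classifier cannot trigger an exit on its own.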
Recent years have witnessed great progress in building emotional chatbots. Numerous methods have been proposed for chatbots to generate responses with given emotions. However, the emotion changes of the user during the conversation have not been fully explored. In this work, we study the problem of positive emotion elicitation, which aims to generate responses that can elicit positive emotion from the user in human-machine conversation. We propose a weakly supervised Emotion Eliciting Machine (EEM) to address this problem. Specifically, we first collect weak labels of user emotion status changes in a conversation using a pre-trained emotion classifier. Then we propose a dual encoder-decoder structure to model the generation of responses on both the positive and negative sides based on the changes of the user's emotion status in the conversation. An emotion eliciting factor is introduced on top of the dual structure to balance the positive and negative emotional impacts on the generated response during emotion elicitation. The factor also provides fine-grained control over emotion elicitation. Experimental results on a large real-world dataset show that EEM outperforms existing models in generating responses with positive emotion elicitation.
A common shortfall of supervised learning for medical imaging is its heavy reliance on human annotations, which are often expensive and time-consuming to obtain. This paper proposes a semi-supervised classification method for microscopic images of three kinds of apicomplexan parasites and non-infected host cells, which uses a small number of labeled data and a large number of unlabeled data for training. There are two challenges in microscopic image recognition. The first is that the salient structures of microscopic images are fuzzier and more intricate than those of natural images at real-world scale. The second is that insignificant textures, such as background staining, lightness, and contrast level, vary greatly across samples from different clinical scenarios. To address these challenges, we aim to learn a distinguishable and appearance-invariant representation through a contrastive learning strategy. On the one hand, macroscopic images, which share similar shape characteristics in morphology, are introduced as contrasts for structure enhancement. On the other hand, different appearance transformations, including color distortion and filtering, are utilized as contrasts for texture elimination. In the case where only 1% of microscopic images are labeled, the proposed method reaches an accuracy of 94.90% on a generalized testing set.
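For concreteness, appearance transformations of the kind described (color distortion and blur-style filtering) can be sketched as follows; the parameter values and the box-blur stand-in are assumptions, not the paper's exact augmentation pipeline:

```python
# Illustrative appearance transformations for the texture-elimination
# contrast. Images are float arrays in [0, 1] with shape (H, W, 3); two
# transformed views of the same image would form a positive pair.

import numpy as np

def color_distort(img, rng, strength=0.4):
    """Randomly jitter brightness and apply a per-channel color cast."""
    img = img * rng.uniform(1 - strength, 1 + strength)          # brightness
    img = img * rng.uniform(1 - strength, 1 + strength, size=3)  # color cast
    return np.clip(img, 0.0, 1.0)

def box_blur(img, k=3):
    """Simple k x k mean filter as a stand-in for Gaussian blurring."""
    pad = k // 2
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros_like(img)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

rng = np.random.default_rng(0)
img = rng.random((8, 8, 3))
v1, v2 = color_distort(img, rng), box_blur(img)   # two "positive" views
```

Because both views keep the cell structure while perturbing staining-like appearance, a contrastive encoder trained on such pairs is pushed toward appearance-invariant features.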