Gang Wang

State Key Lab of Intelligent Control and Decision of Complex Systems and School of Automation, Beijing Institute of Technology, Beijing, China; Beijing Institute of Technology Chongqing Innovation Center, Chongqing, China

DelightfulTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2021

Nov 19, 2021

Yanqing Liu, Zhihang Xu, Gang Wang, Kuan Chen, Bohan Li, Xu Tan, Jinzhu Li, Lei He, Sheng Zhao

Abstract: This paper describes DelightfulTTS, the Microsoft end-to-end neural text-to-speech (TTS) system for Blizzard Challenge 2021. The goal of this challenge is to synthesize natural and high-quality speech from text, and we approach this goal from two perspectives. The first is to directly model and generate waveforms at a 48 kHz sampling rate, which brings higher perceptual quality than previous systems with 16 kHz or 24 kHz sampling rates. The second is to model the variation information in speech through a systematic design, which improves prosody and naturalness. Specifically, for 48 kHz modeling, we predict a 16 kHz mel-spectrogram in the acoustic model and propose a vocoder called HiFiNet to directly generate the 48 kHz waveform from the predicted 16 kHz mel-spectrogram, which better trades off training efficiency, modeling stability, and voice quality. We model variation information systematically from both explicit (speaker ID, language ID, pitch, and duration) and implicit (utterance-level and phoneme-level prosody) perspectives: 1) for speaker and language ID, we use lookup embeddings in training and inference; 2) for pitch and duration, we extract the values from paired text-speech data in training and use two predictors to predict the values in inference; 3) for utterance-level and phoneme-level prosody, we use two reference encoders to extract the values in training and two separate predictors to predict the values in inference. Additionally, we introduce an improved Conformer block to better model local and global dependencies in the acoustic model. For task SH1, DelightfulTTS achieves a mean score of 4.17 in the MOS test and 4.35 in the SMOS test, which indicates the effectiveness of our proposed system.

Balancing Value Underestimation and Overestimation with Realistic Actor-Critic

Nov 10, 2021

Sicen Li, Gang Wang, Qinyun Tang, Liquan Wang

Abstract: Model-free deep reinforcement learning (RL) has been successfully applied to challenging continuous control domains. However, poor sample efficiency prevents these methods from being widely used in real-world domains. We address this problem by proposing a novel model-free algorithm, Realistic Actor-Critic (RAC), which aims to resolve the trade-off between value underestimation and overestimation by learning a policy family over various confidence bounds of the Q-function. We construct uncertainty punished Q-learning (UPQ), which uses the uncertainty from an ensemble of multiple critics to control the estimation bias of the Q-function, making Q-functions shift smoothly from lower- to higher-confidence bounds. Guided by these critics, RAC employs Universal Value Function Approximators (UVFA) to simultaneously learn many optimistic and pessimistic policies with the same neural network. Optimistic policies generate effective exploratory behaviors, while pessimistic policies reduce the risk of value overestimation to ensure stable updates of policies and Q-functions. The proposed method can be incorporated into any off-policy actor-critic RL algorithm. Our method achieves 10x sample efficiency and a 25% performance improvement compared to SAC on the most challenging Humanoid environment, obtaining an episode reward of $11107\pm 475$ at $10^6$ time steps. All the source code is available at https://github.com/ihuhuhu/RAC.
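
As a rough illustration of how ensemble disagreement can move a value estimate between pessimistic and optimistic bounds, here is a minimal sketch. It is not taken from the RAC code base: the function name, the `beta` parameter, and the "mean minus beta times standard deviation" form are assumptions chosen for illustration.

```python
import statistics

def uncertainty_penalized_q(q_values, beta):
    """Combine critic estimates for one (state, action) pair into one value.

    q_values: estimates from each critic in the ensemble.
    beta: hypothetical penalty weight (an assumption, not the paper's notation).
          beta > 0 gives a pessimistic (lower-bound) estimate,
          beta = 0 gives the plain ensemble mean,
          beta < 0 gives an optimistic (upper-bound) estimate.
    """
    mean_q = statistics.fmean(q_values)
    std_q = statistics.pstdev(q_values)  # disagreement among the critics
    return mean_q - beta * std_q

ensemble = [10.0, 12.0, 11.0, 13.0]
for beta in (-1.0, 0.0, 1.0):
    print(beta, uncertainty_penalized_q(ensemble, beta))
```

Sweeping `beta` over a range and conditioning one network on it is one way to picture a single model that represents a family of optimistic and pessimistic policies.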

* Added references. Corrected typos

HD-cos Networks: Efficient Neural Architectures for Secure Multi-Party Computation

Oct 28, 2021

Wittawat Jitkrittum, Michal Lukasik, Ananda Theertha Suresh, Felix Yu, Gang Wang

Abstract: Multi-party computation (MPC) is a branch of cryptography where multiple non-colluding parties execute a well-designed protocol to securely compute a function. Under the non-colluding party assumption, MPC provides a cryptographic guarantee that the parties will not learn sensitive information from the computation process, making it an appealing framework for applications that involve privacy-sensitive user data. In this paper, we study training and inference of neural networks under the MPC setup. This is challenging because the elementary operations of neural networks, such as the ReLU activation function and matrix-vector multiplications, are very expensive to compute due to the added multi-party communication overhead. To address this, we propose the HD-cos network, which uses 1) cosine as the activation function and 2) the Hadamard-Diagonal transformation to replace the unstructured linear transformations. We show that both approaches enjoy strong theoretical motivations and efficient computation under the MPC setup. We demonstrate on multiple public datasets that HD-cos matches the quality of the more expensive baselines.
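
To make the two ingredients concrete, the following sketch applies a diagonal scaling, a Walsh-Hadamard transform, and a cosine activation to a vector in plain Python. It says nothing about the MPC protocol itself. The ordering of the diagonal and Hadamard steps, the `1/sqrt(n)` normalization, and the names `fwht` and `hd_cos_layer` are assumptions made for illustration and may differ from the paper's construction.

```python
import math

def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform; len(x) must be a power of two."""
    x = list(x)
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

def hd_cos_layer(x, diag):
    """Compute cos(H (d * x) / sqrt(n)) elementwise.

    diag holds the learnable diagonal entries d (n parameters instead of n * n
    for a dense weight matrix); H is the fixed Hadamard matrix.
    """
    n = len(x)
    scaled = [d * v for d, v in zip(diag, x)]
    return [math.cos(v / math.sqrt(n)) for v in fwht(scaled)]

print(fwht([1, 1, 1, 1]))
print(hd_cos_layer([0.5, -1.0, 2.0, 0.0], [1.0, 0.5, -0.25, 2.0]))
```

The transform uses only additions and subtractions and takes O(n log n) operations; the diagonal is the only learned part of the linear map.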

Collaborative Uncertainty in Multi-Agent Trajectory Forecasting

Oct 26, 2021

Bohan Tang, Yiqi Zhong, Ulrich Neumann, Gang Wang, Ya Zhang, Siheng Chen

Abstract: Uncertainty modeling is critical in trajectory forecasting systems for both interpretation and safety reasons. To better predict the future trajectories of multiple agents, recent works have introduced interaction modules to capture interactions among agents. This approach leads to correlations among the predicted trajectories. However, the uncertainty brought by such correlations is neglected. To fill this gap, we propose a novel concept, collaborative uncertainty (CU), which models the uncertainty resulting from the interaction module. We build a general CU-based framework that enables a prediction model to learn the future trajectory and the corresponding uncertainty. The CU-based framework is integrated as a plugin module into current state-of-the-art (SOTA) systems and deployed in two special cases based on multivariate Gaussian and Laplace distributions. In each case, we conduct extensive experiments on two synthetic datasets and two public, large-scale trajectory forecasting benchmarks. The results are promising: 1) The results on synthetic datasets show that the CU-based framework allows the model to appropriately approximate the ground-truth distribution. 2) The results on trajectory forecasting benchmarks demonstrate that the CU-based framework steadily helps SOTA systems improve their performance. In particular, the proposed CU-based framework helps VectorNet improve by 57 cm in Final Displacement Error on the nuScenes dataset. 3) The visualization results of CU illustrate that the value of CU is highly related to the amount of interactive information among agents.
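
One way to picture uncertainty that comes from correlated predictions is the negative log-likelihood of a two-variable Gaussian with a correlation term. The sketch below is only an illustration of that idea under stated assumptions: treating the two variables as the same coordinate of two interacting agents, and the function and parameter names, are hypothetical and not the paper's formulation.

```python
import math

def bivariate_gaussian_nll(x, mu, sigma, rho):
    """Negative log-likelihood of a 2-D Gaussian.

    x, mu, sigma: pairs of observed values, predicted means, and predicted
    standard deviations. rho in (-1, 1) is the predicted correlation between
    the two outputs; rho = 0 recovers two independent univariate Gaussians.
    """
    z1 = (x[0] - mu[0]) / sigma[0]
    z2 = (x[1] - mu[1]) / sigma[1]
    one_minus_r2 = 1.0 - rho * rho
    quad = (z1 * z1 - 2.0 * rho * z1 * z2 + z2 * z2) / one_minus_r2
    half_log_det = math.log(sigma[0]) + math.log(sigma[1]) + 0.5 * math.log(one_minus_r2)
    return math.log(2.0 * math.pi) + half_log_det + 0.5 * quad

# Both predictions miss in the same direction by one standard deviation.
observed, predicted, spread = (1.0, 1.0), (0.0, 0.0), (1.0, 1.0)
print(bivariate_gaussian_nll(observed, predicted, spread, rho=0.0))
print(bivariate_gaussian_nll(observed, predicted, spread, rho=0.5))
```

When the two errors point the same way, a positive `rho` yields a lower loss than the independent model, which is the kind of signal a correlation-aware loss can exploit.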

* This paper has been accepted by NeurIPS 2021

Learning Dual Dynamic Representations on Time-Sliced User-Item Interaction Graphs for Sequential Recommendation

Sep 24, 2021

Zeyuan Chen, Wei Zhang, Junchi Yan, Gang Wang, Jianyong Wang

Abstract: Sequential recommendation aims to recommend items that a target user will interact with in the near future based on the historically interacted items. While modeling temporal dynamics is crucial for sequential recommendation, most existing studies concentrate solely on the user side while overlooking the sequential patterns on the item side. Although a few studies investigate the dynamics of both sides, the complex user-item interactions are not fully exploited from a global perspective to derive dynamic user and item representations. In this paper, we devise a novel Dynamic Representation Learning model for Sequential Recommendation (DRL-SRe). To better model the user-item interactions for characterizing the dynamics from both sides, the proposed model builds a global user-item interaction graph for each time slice and exploits time-sliced graph neural networks to learn user and item representations. Moreover, to enable the model to capture fine-grained temporal information, we propose an auxiliary temporal prediction task over consecutive time slices based on a temporal point process. Comprehensive experiments on three public real-world datasets demonstrate that DRL-SRe outperforms state-of-the-art sequential recommendation models by a large margin.

* 11 pages, accepted by CIKM'21

Generalized Minimum Error Entropy for Adaptive Filtering

Sep 08, 2021

Jiacheng He, Gang Wang, Bei Peng, Zhenyu Feng, Kun Zhang

Abstract: Error entropy is an important nonlinear similarity measure, and it has received increasing attention in many practical applications. The default kernel function of the error entropy criterion is the Gaussian kernel, which, however, is not always the best choice. In our study, a novel concept, called generalized error entropy, is proposed, which uses the generalized Gaussian density (GGD) function as the kernel function. We further derive the generalized minimum error entropy (GMEE) criterion, and a novel adaptive filtering algorithm, called the GMEE algorithm, is derived from the GMEE criterion. The stability, steady-state performance, and computational complexity of the proposed algorithm are investigated. Simulations indicate that the GMEE algorithm performs well in Gaussian, sub-Gaussian, and super-Gaussian noise environments. Finally, the GMEE algorithm is applied to acoustic echo cancellation and performs well.
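
For readers unfamiliar with the kernel mentioned above, the sketch below evaluates the standard generalized Gaussian density and plugs it into the usual pairwise estimator of the information potential used in minimum error entropy methods. The parameter names `alpha` (shape) and `beta` (scale) and the function names are assumptions for illustration; the paper's exact criterion and notation may differ.

```python
import math

def ggd_kernel(e, alpha, beta):
    """Generalized Gaussian density: alpha / (2 beta Gamma(1/alpha)) * exp(-|e/beta|^alpha).

    alpha = 2 gives a Gaussian shape, alpha = 1 a Laplacian shape.
    """
    coeff = alpha / (2.0 * beta * math.gamma(1.0 / alpha))
    return coeff * math.exp(-(abs(e / beta) ** alpha))

def information_potential(errors, alpha, beta):
    """Average kernel value over all pairs of error samples.

    Concentrated errors give a larger potential, so maximizing it
    corresponds to minimizing the (quadratic Renyi) error entropy.
    """
    n = len(errors)
    total = sum(ggd_kernel(ei - ej, alpha, beta) for ei in errors for ej in errors)
    return total / (n * n)

tight = [0.10, 0.12, 0.09, 0.11]
spread = [0.10, 1.50, -2.00, 0.70]
print(information_potential(tight, alpha=2.0, beta=1.0))
print(information_potential(spread, alpha=2.0, beta=1.0))
```

With `alpha = 2` and `beta = sqrt(2) * sigma` the kernel coincides with the Gaussian density of standard deviation `sigma`, which is how the conventional criterion appears as a special case.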

* 9 pages, 8 figures

The 2nd Anti-UAV Workshop & Challenge: Methods and Results

Aug 25, 2021

Jian Zhao, Gang Wang, Jianan Li, Lei Jin, Nana Fan, Min Wang, Xiaojuan Wang, Ting Yong, Yafeng Deng, Yandong Guo (+2 more)

Abstract: The 2nd Anti-UAV Workshop & Challenge aims to encourage research in developing novel and accurate methods for multi-scale object tracking. The Anti-UAV dataset used for the Anti-UAV Challenge has been publicly released. There are two subsets in the dataset, i.e., the test-dev subset and the test-challenge subset. Both subsets consist of 140 thermal infrared video sequences, spanning multiple occurrences of multi-scale UAVs. Around 24 participating teams from around the globe competed in the 2nd Anti-UAV Challenge. In this paper, we provide a brief summary of the 2nd Anti-UAV Workshop & Challenge, including brief introductions to the top three methods. The submission leaderboard will be reopened for researchers who are interested in the Anti-UAV challenge. The benchmark dataset and other information can be found at: https://anti-uav.github.io/.

Modeling Relevance Ranking under the Pre-training and Fine-tuning Paradigm

Aug 12, 2021

Lin Bo, Liang Pang, Gang Wang, Jun Xu, XiuQiang He, Ji-Rong Wen

Abstract: Recently, pre-trained language models such as BERT have been applied to document ranking for information retrieval. These approaches first pre-train a general language model on a large unlabeled corpus and then conduct ranking-specific fine-tuning on expert-labeled relevance datasets. Ideally, an IR system would model relevance from a user-system dualism: the user's view and the system's view. The user's view judges relevance based on the activities of "real users", while the system's view focuses on the relevance signals from the system side, e.g., from experts or algorithms. Inspired by the user-system relevance views and the success of pre-trained language models, in this paper we propose a novel ranking framework called Pre-Rank that takes both the user's view and the system's view into consideration, under the pre-training and fine-tuning paradigm. Specifically, to model the user's view of relevance, Pre-Rank pre-trains the initial query-document representations on large-scale user activity data such as click logs. To model the system's view of relevance, Pre-Rank further fine-tunes the model on expert-labeled relevance data. More importantly, the pre-trained representations are fine-tuned together with handcrafted learning-to-rank features under a wide and deep network architecture. In this way, Pre-Rank can model relevance by incorporating the relevant knowledge and signals from both real search users and IR experts. To verify the effectiveness of Pre-Rank, we present two implementations that use BERT and SetRank, respectively, as the underlying ranking model. Experimental results based on three publicly available benchmarks show that in both implementations, Pre-Rank outperforms the underlying ranking model and achieves state-of-the-art performance.

SA-MATD3:Self-attention-based multi-agent continuous control method in cooperative environments

Jul 01, 2021

Kai Liu, Yuyang Zhao, Gang Wang, Bei Peng

Abstract: Cooperative problems under continuous control have always been the focus of multi-agent reinforcement learning. Existing algorithms suffer from uneven learning across agents as the number of agents increases. In this paper, a new structure for a multi-agent actor-critic is proposed, in which the self-attention mechanism is applied in the critic network and the value decomposition method is used to solve the unevenness problem. The proposed algorithm makes full use of the samples in the replay memory buffer to learn the behavior of a class of agents. First, a new update method is proposed for policy networks that promotes learning efficiency. Second, the utilization of samples is improved, at the same time reflecting the ability of perspective-taking among groups. Finally, the "deceptive signal" in training is eliminated and the learning degree among agents is more uniform than in existing methods. Multiple experiments were conducted in two typical scenarios of a multi-agent particle environment. Experimental results show that the proposed algorithm can perform better than state-of-the-art ones, and that it exhibits higher learning efficiency as the number of agents increases.

* 30 pages

Resonant Beam Communications with Echo Interference Elimination

Jun 25, 2021

Mingliang Xiong, Qingwen Liu, Gang Wang, Georgios B. Giannakis, Sihai Zhang, Jinkang Zhu, Chuan Huang

Abstract: Resonant beam communications (RBCom) is capable of providing wide bandwidth when using light as the carrier. In addition, the RBCom system offers mobility, a high signal-to-noise ratio (SNR), and multiplexing. Nevertheless, the channel of the RBCom system is distinct from those of other light communication technologies due to the echo interference issue. In this paper, we reveal the mechanism of the echo interference and propose a method to eliminate it. Moreover, we present an exemplary design based on frequency shifting and optical filtering, along with its mathematical model and performance analysis. The numerical evaluation shows that the channel capacity is greater than 15 bit/s/Hz.
