Multivariate time-series forecasting is a critical task for many applications, and graph time-series networks are widely studied because they can capture spatial and temporal correlations simultaneously. However, most existing works focus on learning with the explicit prior graph structure while ignoring potential information from the implicit graph structure, yielding incomplete structure modeling. Some recent works attempt to learn the intrinsic or implicit graph structure directly, but lack a way to combine the explicit prior structure with the implicit structure. In this paper, we propose the Regularized Graph Structure Learning (RGSL) model to incorporate both the explicit prior structure and the implicit structure, and to learn the deep forecasting network along with the graph structure. RGSL consists of two innovative modules. First, we derive an implicit dense similarity matrix through node embedding, and learn the sparse graph structure using Regularized Graph Generation (RGG) based on the Gumbel-Softmax trick. Second, we propose a Laplacian Matrix Mixed-up Module (LM3) to fuse the explicit graph and the implicit graph. We conduct experiments on three real-world datasets. Results show that the proposed RGSL model outperforms existing graph forecasting algorithms by a notable margin, while simultaneously learning a meaningful graph structure. Our code and models are made publicly available at https://github.com/alipay/RGSL.git.
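As a rough illustration of the Gumbel-Softmax trick mentioned above, the following minimal Python sketch draws a relaxed (differentiable in principle) keep/drop decision for a single edge from a similarity score. The function names `gumbel_softmax` and `sample_edge`, the default temperature, and the choice to treat each similarity as a keep-probability are assumptions made for illustration; the abstract does not describe how RGG regularizes or parameterizes the sampling, so this should not be read as the RGSL implementation.

```python
import math
import random


def gumbel_softmax(logits, temperature=0.5, rng=random):
    """Return a relaxed one-hot sample over `logits` (Gumbel-Softmax)."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise is -log(-log(U)) with U uniform in (0, 1);
        # U is clamped away from 0 and 1 to keep the logs finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append((logit - math.log(-math.log(u))) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def sample_edge(similarity, temperature=0.5, rng=random):
    """Relaxed keep-weight for one edge, given a similarity in (0, 1).

    Assumption for this sketch: the similarity is read as the probability
    of keeping the edge, so the two logits are log(p) and log(1 - p).
    """
    p = min(max(similarity, 1e-6), 1.0 - 1e-6)
    keep, _drop = gumbel_softmax(
        [math.log(p), math.log(1.0 - p)], temperature, rng
    )
    return keep
```

Lower temperatures push the keep-weight toward 0 or 1, which is how a dense similarity matrix can be turned into a nearly sparse graph while remaining trainable by gradient descent in a framework with automatic differentiation.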
The prevalence and perniciousness of fake news have been a critical issue on the Internet, which in turn stimulates the development of automatic fake news detection. In this paper, we focus on evidence-based fake news detection, where several pieces of evidence are utilized to probe the veracity of news (i.e., a claim). Most previous methods first employ sequential models to embed the semantic information and then capture the claim-evidence interaction based on attention mechanisms. Despite their effectiveness, they still suffer from three weaknesses. Firstly, sequential models fail to integrate relevant information that is scattered far apart in the evidence. Secondly, they overlook that much of the redundant information in the evidence may be useless or harmful. Thirdly, insufficient data utilization limits the separability and reliability of the representations captured by the model. To solve these problems, we propose a unified Graph-based sEmantic structure mining framework with ConTRAstive Learning, GETRAL for short. Specifically, we first model claims and evidence as graph-structured data to capture long-distance semantic dependencies. Subsequently, we reduce information redundancy by performing graph structure learning. Then the fine-grained semantic representations are fed into the claim-evidence interaction module for prediction. Finally, an adversarial contrastive learning module is applied to make full use of the data and strengthen representation learning. Comprehensive experiments demonstrate the superiority of GETRAL over state-of-the-art methods and validate the efficacy of semantic mining with graph structure and contrastive learning.
This paper introduces Honor of Kings Arena, a reinforcement learning (RL) environment based on Honor of Kings, one of the world's most popular games at present. Compared to other environments studied in most previous work, ours presents new generalization challenges for competitive reinforcement learning. It is a multi-agent problem with one agent competing against its opponent, and it requires generalization ability because it has diverse targets to control and diverse opponents to compete with. We describe the observation, action, and reward specifications for the Honor of Kings domain and provide an open-source Python-based interface for communicating with the game engine. We provide twenty target heroes with a variety of tasks in Honor of Kings Arena and present initial baseline results for RL-based methods with feasible computing resources. Finally, we showcase the generalization challenges imposed by Honor of Kings Arena and possible remedies for these challenges. All of the software, including the environment class, is publicly available at https://github.com/tencent-ailab/hok_env . The documentation is available at https://aiarena.tencent.com/hok/doc/ .
Freezing of gait (FoG) is one of the most common symptoms of Parkinson's disease, which is a neurodegenerative disorder of the central nervous system impacting millions of people around the world. To address the pressing need to improve the quality of treatment for FoG, devising a computer-aided detection and quantification tool for FoG has become increasingly important. As a non-invasive technique for collecting motion patterns, the footstep pressure sequences obtained from pressure-sensitive gait mats provide a great opportunity for evaluating FoG in the clinic and potentially in the home environment. In this study, FoG detection is formulated as a sequential modelling task and a novel deep learning architecture, namely Adversarial Spatio-temporal Network (ASTN), is proposed to learn FoG patterns across multiple levels. A novel adversarial training scheme is introduced with a multi-level subject discriminator to obtain subject-independent FoG representations, which helps to reduce the over-fitting risk due to the high inter-subject variance. As a result, robust FoG detection can be achieved for unseen subjects. The proposed scheme also sheds light on improving subject-level clinical studies in other scenarios, as it can be integrated with many existing deep architectures. To the best of our knowledge, this is one of the first studies of footstep pressure-based FoG detection, and ASTN is the first deep neural network architecture in pursuit of subject-independent representations. Experimental results on 393 trials collected from 21 subjects demonstrate encouraging performance of the proposed ASTN for FoG detection, with an AUC of 0.85.
Deriving governing equations of complex physical systems based on first principles can be quite challenging when there are certain unknown terms and hidden physical mechanisms in the systems. In this work, we apply a deep learning architecture to learn fluid partial differential equations (PDEs) of a plasma system based on the data acquired from a fully kinetic model. The learned multi-moment fluid PDEs are demonstrated to incorporate kinetic effects such as Landau damping. Based on the learned fluid closure, the data-driven, multi-moment fluid modeling can well reproduce all the physical quantities derived from the fully kinetic model. The calculated damping rate of Landau damping is consistent with both the fully kinetic simulation and the linear theory. The data-driven fluid modeling of PDEs for complex physical systems may be applied to improve fluid closure and reduce the computational cost of multi-scale modeling of global systems.
We explore the use of knowledge distillation (KD) for learning compact and accurate models that enable classification of animal behavior from accelerometry data on wearable devices. To this end, we take a deep and complex convolutional neural network, known as residual neural network (ResNet), as the teacher model. ResNet is specifically designed for multivariate time-series classification. We use ResNet to distil the knowledge of animal behavior classification datasets into soft labels, which consist of the predicted pseudo-probabilities of every class for each datapoint. We then use the soft labels to train our significantly less complex student models, which are based on the gated recurrent unit (GRU) and multilayer perceptron (MLP). The evaluation results using two real-world animal behavior classification datasets show that the classification accuracy of the student GRU-MLP models improves appreciably through KD, approaching that of the teacher ResNet model. To further reduce the computational and memory requirements of performing inference using the student models trained via KD, we utilize dynamic fixed-point quantization through an appropriate modification of the computational graphs of the models. We implement both unquantized and quantized versions of the developed KD-based models on the embedded systems of our purpose-built collar and ear tag devices to classify animal behavior in situ and in real time. The results corroborate the effectiveness of KD and quantization in improving the inference performance in terms of both classification accuracy and computational and memory efficiency.
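The soft-label step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' training code: the names `softmax` and `soft_label_cross_entropy` are hypothetical, the logits are made-up numbers, and the optional `temperature` argument is a common distillation knob that the abstract does not mention.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw class scores into pseudo-probabilities."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_label_cross_entropy(student_logits, soft_labels):
    """Cross-entropy between teacher soft labels and student predictions."""
    student_probs = softmax(student_logits)
    return -sum(
        target * math.log(max(prob, 1e-12))
        for target, prob in zip(soft_labels, student_probs)
    )


# Hypothetical teacher output for one datapoint with three behavior classes.
teacher_logits = [2.0, 0.5, -1.0]
soft_labels = softmax(teacher_logits)

# The student is trained to minimize this loss over the whole dataset.
loss = soft_label_cross_entropy([1.0, 0.2, -0.5], soft_labels)
```

Because the soft labels keep the teacher's relative confidence across all classes, they carry more information per datapoint than a single hard label, which is the usual motivation for training a small student this way.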
In most advertising and recommendation systems, the multi-task learning (MTL) paradigm is widely employed to model diverse user behaviors (e.g., click, view, and purchase). Existing MTL models typically use task-shared networks with shared parameters or a routing mechanism to learn the commonalities between tasks, while applying task-specific networks to learn the unique characteristics of each task. However, the potential relevance within task-specific networks is ignored, although it is intuitively crucial for overall performance. In light of the fact that relevance is both task-complex and instance-specific, we present a novel learning paradigm to address these issues. In this paper, we propose the Personalized Inter-task COntrastive Learning (PICO) framework, which can effectively model the inter-task relationship and is utilized to jointly estimate the click-through rate (CTR) and post-click conversion rate (CVR) in advertising systems. PICO utilizes contrastive learning to integrate inter-task knowledge implicitly from the task representations in task-specific networks. In addition, we introduce an auxiliary network to capture the inter-task relevance at the instance level and transform it into personalized temperature parameters for contrastive learning. With this method, fine-grained knowledge can be transferred to improve MTL performance without incurring additional inference costs. Both offline and online experiments show that PICO significantly outperforms previous multi-task models.
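To make the role of a per-instance temperature concrete, here is a minimal Python sketch of an InfoNCE-style contrastive loss whose temperature is supplied separately for each instance. Everything specific in it is an assumption for illustration: the abstract does not say how PICO builds positives and negatives or how the auxiliary network produces temperatures, so the sketch simply takes the temperature as an input, and the names `cosine` and `info_nce` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, candidates, positive_index, temperature):
    """InfoNCE loss for one anchor with an instance-specific temperature.

    Assumption for this sketch: `anchor` is one task's representation of an
    instance and `candidates` are another task's representations, with the
    entry at `positive_index` treated as the positive.
    """
    logits = [cosine(anchor, c) / temperature for c in candidates]
    peak = max(logits)  # log-sum-exp with the maximum factored out
    log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_denominator - logits[positive_index]
```

A smaller temperature sharpens the softmax, so the loss pulls the anchor and its positive together more strongly; a larger temperature softens it. Since the contrastive term is only needed during training, a design of this kind is consistent with the abstract's claim of no additional inference cost.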
Learning a domain-invariant representation has become one of the most popular approaches for domain adaptation/generalization. In this paper, we show that an invariant representation may not be sufficient to guarantee good generalization, because the labeling function shift should also be taken into consideration. Inspired by this, we first derive a new generalization upper bound on the empirical risk that explicitly considers the labeling function shift. We then propose Domain-specific Risk Minimization (DRM), which can model the distribution shifts of different domains separately and select the most appropriate one for the target domain. Extensive experiments on four popular domain generalization datasets, CMNIST, PACS, VLCS, and DomainNet, demonstrate the effectiveness of the proposed DRM for domain generalization with the following advantages: 1) it significantly outperforms competitive baselines; 2) it achieves comparable or superior accuracy on all training domains compared to vanilla empirical risk minimization (ERM); 3) it remains very simple and efficient during training; and 4) it is complementary to invariant learning approaches.
Graph neural networks (GNNs) are deep learning models designed specifically for graph data, and they typically rely on node features as the input to the first layer. When applying such a network to a graph without node features, one can extract simple graph-based node features (e.g., node degree) or learn the input node representations (i.e., embeddings) when training the network. While the latter approach, which trains node embeddings, is more likely to lead to better performance, the number of parameters associated with the embeddings grows linearly with the number of nodes. It is therefore impractical to train the input node embeddings together with GNNs within graphics processing unit (GPU) memory in an end-to-end fashion when dealing with industrial-scale graph data. Inspired by the embedding compression methods developed for natural language processing (NLP) tasks, we develop a node embedding compression method where each node is compactly represented with a bit vector instead of a floating-point vector. The parameters utilized in the compression method can be trained together with GNNs. We show that the proposed node embedding compression method achieves superior performance compared to the alternatives.
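The idea of replacing a per-node floating-point vector with a short bit vector can be sketched as follows. This is an illustrative Python sketch under explicit assumptions, not the method from the abstract: the bit code here comes from a SHA-256 hash of the node id as a stand-in for whatever encoding the authors use, and the decoder is a simple sum of codebook rows selected by chunks of the code. The names `node_code`, `make_codebooks`, and `decode`, and all sizes, are hypothetical.

```python
import hashlib
import random


def node_code(node_id, num_bits=16):
    """Map a node id to a fixed bit vector (stand-in encoding, up to 64 bits)."""
    digest = hashlib.sha256(str(node_id).encode("utf-8")).digest()
    value = int.from_bytes(digest[:8], "big")
    return [(value >> i) & 1 for i in range(num_bits)]


def make_codebooks(num_bits=16, bits_per_chunk=4, dim=8, seed=0):
    """Create one small table per chunk of bits; these would be trainable."""
    rng = random.Random(seed)
    num_chunks = num_bits // bits_per_chunk
    return [
        [[rng.gauss(0.0, 0.1) for _ in range(dim)]
         for _ in range(2 ** bits_per_chunk)]
        for _ in range(num_chunks)
    ]


def decode(bits, codebooks, bits_per_chunk=4):
    """Turn a bit vector into a dense embedding by summing codebook rows."""
    dim = len(codebooks[0][0])
    embedding = [0.0] * dim
    for chunk_index, codebook in enumerate(codebooks):
        start = chunk_index * bits_per_chunk
        chunk = bits[start:start + bits_per_chunk]
        row = int("".join(str(bit) for bit in chunk), 2)
        for d in range(dim):
            embedding[d] += codebook[row][d]
    return embedding
```

With these default sizes the decoder holds 4 codebooks of 16 rows and 8 dimensions, or 512 floats in total, regardless of how many nodes the graph has; each node costs only its 16-bit code. That is the sense in which the parameter count stops growing linearly with the number of nodes.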