Tianyi Zhou

Multi-Center Federated Learning

May 03, 2020
Ming Xie, Guodong Long, Tao Shen, Tianyi Zhou, Xianzhi Wang, Jing Jiang

Federated learning has received great attention for its capability to train a large-scale model in a decentralized manner without needing to access user data directly, which helps protect users' private data from centralized collection. Unlike distributed machine learning, federated learning aims to tackle non-IID data from heterogeneous sources in various real-world applications, such as those on smartphones. Existing federated learning approaches usually adopt a single global model to capture the shared knowledge of all users by aggregating their gradients, regardless of the discrepancies between their data distributions. However, due to the diverse nature of user behaviors, assigning users' gradients to different global models (i.e., centers) can better capture the heterogeneity of data distributions across users. Our paper proposes a novel multi-center aggregation mechanism for federated learning, which learns multiple global models from the non-IID user data and simultaneously derives the optimal matching between users and centers. We formulate the problem as a joint optimization that can be efficiently solved by a stochastic expectation-maximization (EM) algorithm. Our experimental results on benchmark datasets show that our method outperforms several popular federated learning methods.
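
A minimal sketch of the multi-center aggregation idea, assuming each user's model update is summarized as a flat parameter vector and assignments are refreshed by a hard-assignment EM loop; the distance-based E-step and all names below are illustrative, not the paper's exact formulation.

```python
import numpy as np

def multi_center_aggregate(user_params, num_centers=3, num_iters=10, seed=0):
    """Cluster user model parameters into multiple global models (centers).

    user_params: (num_users, dim) array, one flattened model per user.
    Returns (centers, assignments); a toy hard-EM variant for illustration.
    """
    rng = np.random.default_rng(seed)
    num_users, dim = user_params.shape
    # Initialize centers from randomly chosen users.
    centers = user_params[rng.choice(num_users, num_centers, replace=False)].copy()

    for _ in range(num_iters):
        # E-step: assign each user to its closest center (smallest L2 distance).
        dists = np.linalg.norm(user_params[:, None, :] - centers[None, :, :], axis=-1)
        assignments = dists.argmin(axis=1)
        # M-step: recompute each center as the mean of its assigned users' models.
        for k in range(num_centers):
            members = user_params[assignments == k]
            if len(members) > 0:
                centers[k] = members.mean(axis=0)
    return centers, assignments

# Toy usage: 20 users, 5-dimensional "models".
params = np.random.randn(20, 5)
centers, assign = multi_center_aggregate(params)
print(centers.shape, np.bincount(assign, minlength=3))
```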

Semantic Triple Encoder for Fast Open-Set Link Prediction

Apr 30, 2020
Bo Wang, Tao Shen, Guodong Long, Tianyi Zhou, Yi Chang

We improve both the open-set generalization and the efficiency of link prediction on knowledge graphs by leveraging the contexts of entities and relations in a novel semantic triple encoder. Most previous methods, e.g., translation-based and GCN-based embedding approaches, were built upon graph embedding models. They simply treat entities/relations as a closed set of graph nodes and ignore their contextual semantics, which provide critical information for generalizing to unseen entities/relations. In this paper, we partition each graph triple and develop a novel context-based encoder that separately maps each part and its context into a latent semantic space. We train this semantic triple encoder by optimizing two objectives specifically designed for link prediction. In particular, (1) we split each triple into two parts, i.e., i) head entity plus relation and ii) tail entity, process the two contexts separately with a Transformer encoder, and combine the encoding outputs to derive the prediction; this Siamese-like architecture avoids the combinatorial explosion of candidate triples and significantly improves efficiency, especially during inference; (2) we encode the contextualized semantics of the triples so the model can handle unseen entities during inference, which improves its generalization ability; (3) we train the model by optimizing two complementary objectives defined on the triple, i.e., classification and contrastive losses, to obtain natural and reliable ranking scores during inference. In experiments, we achieve state-of-the-art or competitive performance on three popular link prediction benchmarks. In addition, we empirically reduce the inference cost by one to two orders of magnitude compared to a recent context-based encoding approach while maintaining superior prediction quality.
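
A rough sketch of the Siamese-style split in point (1), assuming pre-tokenized text for the head-entity-plus-relation part and the tail-entity part; the mean pooling and scoring head are placeholder assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class SiameseTripleEncoder(nn.Module):
    """Encode (head + relation) and (tail) contexts separately, then score the pair."""

    def __init__(self, vocab_size, d_model=128, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)  # shared by both parts
        self.classifier = nn.Linear(2 * d_model, 1)               # plausibility score

    def encode(self, token_ids):
        # Mean-pool the contextualized token representations into one vector.
        return self.encoder(self.embed(token_ids)).mean(dim=1)

    def forward(self, head_rel_ids, tail_ids):
        hr = self.encode(head_rel_ids)   # (batch, d_model)
        t = self.encode(tail_ids)        # (batch, d_model)
        return self.classifier(torch.cat([hr, t], dim=-1)).squeeze(-1)

# Tail representations can be pre-computed and cached, so ranking many candidate
# tails for one (head, relation) query only repeats the cheap classifier, not the encoder.
model = SiameseTripleEncoder(vocab_size=1000)
scores = model(torch.randint(0, 1000, (4, 12)), torch.randint(0, 1000, (4, 6)))
print(scores.shape)  # torch.Size([4])
```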

Rethinking 1D-CNN for Time Series Classification: A Stronger Baseline

Feb 24, 2020
Wensi Tang, Guodong Long, Lu Liu, Tianyi Zhou, Jing Jiang, Michael Blumenstein

For time series classification with a 1D-CNN, the selection of kernel size is critical to ensure the model captures salient signals at the right scale in a long time series. Most existing work on 1D-CNNs treats the kernel size as a hyper-parameter and searches for a proper value via grid search, which is time-consuming and inefficient. This paper theoretically analyzes how kernel size impacts the performance of a 1D-CNN. Given the importance of kernel size, we propose a novel Omni-Scale 1D-CNN (OS-CNN) architecture that captures the proper kernel size during model learning. A specific kernel-size configuration is designed that assembles only a few kernel-size options to cover a wide range of receptive fields. The proposed OS-CNN method is evaluated on the UCR archive of 85 datasets. The experimental results demonstrate that our method is a stronger baseline on multiple performance indicators, including the critical difference diagram, counts of wins, and average accuracy. We have also published the experimental source code on GitHub (https://github.com/Wensi-Tang/OS-CNN/).
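
A minimal sketch of covering several receptive fields with a few kernel sizes, implemented as parallel 1D convolutions whose outputs are concatenated; the particular sizes, pooling, and head below are illustrative assumptions, not the OS-CNN configuration rule from the paper.

```python
import torch
import torch.nn as nn

class MultiScale1DConv(nn.Module):
    """Parallel 1D convolutions with different kernel sizes over one time series."""

    def __init__(self, in_channels, out_channels, kernel_sizes=(1, 3, 5, 7, 11), num_classes=10):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv1d(in_channels, out_channels, k, padding="same") for k in kernel_sizes
        )
        self.head = nn.Linear(out_channels * len(kernel_sizes), num_classes)

    def forward(self, x):                      # x: (batch, channels, length)
        feats = [torch.relu(branch(x)) for branch in self.branches]
        pooled = torch.cat(feats, dim=1).mean(dim=-1)   # global average pooling over time
        return self.head(pooled)

model = MultiScale1DConv(in_channels=1, out_channels=16)
logits = model(torch.randn(8, 1, 128))         # 8 univariate series of length 128
print(logits.shape)                            # torch.Size([8, 10])
```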

Conditional Self-Attention for Query-based Summarization

Feb 18, 2020
Yujia Xie, Tianyi Zhou, Yi Mao, Weizhu Chen

Self-attention mechanisms have achieved great success on a variety of NLP tasks due to their flexibility in capturing dependencies between arbitrary positions in a sequence. For problems such as query-based summarization (Qsumm) and knowledge graph reasoning, where each input sequence is associated with an extra query, explicitly modeling such conditional contextual dependencies can lead to a more accurate solution, yet these dependencies cannot be captured by existing self-attention mechanisms. In this paper, we propose conditional self-attention (CSA), a neural network module designed for conditional dependency modeling. CSA works by adjusting the pairwise attention between input tokens in a self-attention module with the matching score of each input to the given query. Thereby, the contextual dependencies modeled by CSA are highly relevant to the query. We further study variants of CSA defined by different types of attention. Experiments on the Debatepedia and HotpotQA benchmark datasets show that CSA consistently outperforms the vanilla Transformer and previous models on the Qsumm problem.
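
A rough sketch of the conditional idea: scaled dot-product self-attention whose pairwise weights are modulated by each token's matching score against a pooled query vector. The exact modulation used in CSA may differ, so treat the form below (and the projection names) as illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def conditional_self_attention(x, query, w_q, w_k, w_v, w_m):
    """x: (batch, seq, d); query: (batch, d) pooled query representation.

    w_q, w_k, w_v, w_m: (d, d) projection matrices (assumed learnable parameters).
    """
    d = x.size(-1)
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Pairwise self-attention logits between all token positions.
    logits = q @ k.transpose(-2, -1) / d ** 0.5               # (batch, seq, seq)
    # Matching score of every token against the given query.
    match = torch.sigmoid((x @ w_m) @ query.unsqueeze(-1))    # (batch, seq, 1)
    # Re-weight attention toward tokens relevant to the query, then renormalize.
    weights = F.softmax(logits, dim=-1) * match.transpose(-2, -1)
    weights = weights / weights.sum(dim=-1, keepdim=True).clamp_min(1e-9)
    return weights @ v                                         # (batch, seq, d)

d = 32
x, query = torch.randn(2, 10, d), torch.randn(2, d)
params = [torch.randn(d, d) for _ in range(4)]
out = conditional_self_attention(x, query, *params)
print(out.shape)  # torch.Size([2, 10, 32])
```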

Collaborative Inference for Efficient Remote Monitoring

Feb 12, 2020
Chi Zhang, Yong Sheng Soh, Ling Feng, Tianyi Zhou, Qianxiao Li

While current machine learning models achieve impressive performance over a wide range of applications, their large size and complexity render them unsuitable for tasks such as remote monitoring on edge devices with limited storage and computational power. A naive way to resolve this at the model level is to use simpler architectures, but this sacrifices prediction accuracy and is unsuitable for monitoring applications that require accurate detection of the onset of adverse events. In this paper, we propose an alternative solution: decomposing the predictive model into the sum of a simple function, which serves as a local monitoring tool, and a complex correction term to be evaluated on the server. A sign requirement is imposed on the latter to ensure that the local monitoring function is safe, in the sense that it can effectively serve as an early warning system. Our analysis quantifies the trade-off between model complexity and performance, and serves as guidance for architecture design. We validate the proposed framework on a series of monitoring experiments, where we succeed in learning monitoring models with significantly reduced complexity that minimally violate the safety requirement. More broadly, our framework is useful for learning classifiers in applications where false negatives are significantly more costly than false positives.
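
A toy sketch of the decomposition f(x) = g(x) + h(x): a tiny local model g runs on the edge device, while a sign-constrained correction h is only added on the server. The direction of the sign constraint, the softplus trick, and the model sizes below are illustrative assumptions; under this convention the correction can only lower the alarm score, so any alarm raised by the full model is also raised locally.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecomposedMonitor(nn.Module):
    """f(x) = g(x) + h(x): tiny local monitor g plus a sign-constrained server correction h."""

    def __init__(self, in_dim):
        super().__init__()
        self.local = nn.Linear(in_dim, 1)                 # simple model kept on the edge device
        self.correction = nn.Sequential(                   # larger model kept on the server
            nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def local_score(self, x):
        return self.local(x).squeeze(-1)

    def full_score(self, x):
        # -softplus(.) <= 0, so the correction can only lower the alarm score;
        # the local monitor therefore never misses an alarm the full model would raise.
        return self.local_score(x) - F.softplus(self.correction(x).squeeze(-1))

model = DecomposedMonitor(in_dim=8)
x = torch.randn(5, 8)
print(model.local_score(x) >= model.full_score(x))  # all True by construction
```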

Self-Attention Enhanced Selective Gate with Entity-Aware Embedding for Distantly Supervised Relation Extraction

Nov 27, 2019
Yang Li, Guodong Long, Tao Shen, Tianyi Zhou, Lina Yao, Huan Huo, Jing Jiang

Distantly supervised relation extraction intrinsically suffers from noisy labels due to the strong assumption of distant supervision. Most prior works adopt a selective attention mechanism over the sentences in a bag to suppress wrongly labeled data, which, however, can be ineffective when a bag contains only one sentence. In this paper, we propose a new lightweight neural framework that addresses distantly supervised relation extraction and alleviates the defects of the selective attention framework. Specifically, in the proposed framework, 1) we use an entity-aware word embedding method that integrates both relative position information and head/tail entity embeddings, aiming to highlight the essential role of entities in this task; 2) we develop a self-attention mechanism to capture rich contextual dependencies as a complement to the local dependencies captured by a piecewise CNN; and 3) instead of using selective attention, we design a pooling-equipped gate, based on the rich contextual representations, as an aggregator that generates the bag-level representation for final relation classification. Compared to selective attention, one major advantage of the proposed gating mechanism is that it performs stably and promisingly even if only one sentence appears in a bag, thus keeping the model consistent across all training examples. Experiments on the NYT dataset demonstrate that our approach achieves new state-of-the-art performance in terms of both AUC and top-n precision metrics.

* Accepted to appear at AAAI 2020 
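
A minimal sketch of the bag-level aggregation in point 3): instead of attention over sentences, a gate computed from a pooled bag representation re-weights each sentence vector; the specific gate architecture and pooling choice here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PoolingGateAggregator(nn.Module):
    """Aggregate sentence vectors in a bag with a gate derived from pooled context."""

    def __init__(self, d_model):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.Sigmoid())

    def forward(self, sent_reps):                 # sent_reps: (num_sentences, d_model)
        pooled = sent_reps.max(dim=0).values      # bag-level max pooling
        # One gate per sentence, conditioned on the sentence and the pooled bag context;
        # with a single sentence in the bag this still produces a meaningful weight.
        gates = self.gate(torch.cat([sent_reps, pooled.expand_as(sent_reps)], dim=-1))
        return (gates * sent_reps).mean(dim=0)    # (d_model,) bag representation

agg = PoolingGateAggregator(d_model=16)
bag = torch.randn(3, 16)                          # a bag with 3 sentences
print(agg(bag).shape)                             # torch.Size([16])
```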

Learning to Propagate for Graph Meta-Learning

Sep 11, 2019
Lu Liu, Tianyi Zhou, Guodong Long, Jing Jiang, Chengqi Zhang

Meta-learning extracts the common knowledge acquired from learning different tasks and uses it for unseen tasks. It demonstrates a clear advantage on tasks with insufficient training data, e.g., few-shot learning. In most meta-learning methods, tasks are implicitly related via a shared model or optimizer. In this paper, we show that a meta-learner that explicitly relates tasks on a graph describing the relations among their output dimensions (e.g., classes) can significantly improve the performance of few-shot learning. This type of graph is usually free or cheap to obtain but has rarely been explored in previous works. We study prototype-based few-shot classification, in which a prototype is generated for each class such that nearest-neighbor search among the prototypes produces an accurate classification. We introduce the "Gated Propagation Network (GPN)", which learns to propagate messages between prototypes of different classes on the graph, so that learning the prototype of each class benefits from the data of other related classes. In GPN, an attention mechanism aggregates the messages from neighboring classes, and a gate chooses between the aggregated messages and the message from the class itself. GPN is trained on a sequence of tasks, from many-shot to few-shot, generated by subgraph sampling. During training, it can reuse and update previously computed prototypes from memory in a lifelong learning cycle. In experiments, we vary the training-test discrepancy and the test-task generation settings for thorough evaluation. GPN outperforms recent meta-learning methods on two benchmark datasets in all studied cases.

* Accepted to NeurIPS 2019 
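
A rough sketch of one propagation step in the spirit of GPN: attention-weighted aggregation of neighboring class prototypes on the graph, followed by a gate that chooses between the aggregated message and the class's own prototype; the dimensions and gating form are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedPrototypePropagation(nn.Module):
    """One message-passing step over class prototypes on a class-relation graph."""

    def __init__(self, d):
        super().__init__()
        self.attn = nn.Linear(d, d)                 # projection for attention scores
        self.gate = nn.Linear(2 * d, 1)             # scalar gate per class

    def forward(self, prototypes, adjacency):
        # prototypes: (num_classes, d); adjacency: (num_classes, num_classes) 0/1 mask.
        scores = self.attn(prototypes) @ prototypes.t()           # (C, C) attention logits
        scores = scores.masked_fill(adjacency == 0, float("-inf"))
        attn = F.softmax(scores, dim=-1)
        messages = attn @ prototypes                               # aggregated neighbor message
        g = torch.sigmoid(self.gate(torch.cat([prototypes, messages], dim=-1)))
        return g * messages + (1 - g) * prototypes                 # gated update

prop = GatedPrototypePropagation(d=8)
protos = torch.randn(4, 8)
adj = torch.tensor([[1, 1, 0, 0], [1, 1, 1, 0], [0, 1, 1, 1], [0, 0, 1, 1]])
print(prop(protos, adj).shape)                                     # torch.Size([4, 8])
```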

Prototype Propagation Networks (PPN) for Weakly-supervised Few-shot Learning on Category Graph

Jun 02, 2019
Lu Liu, Tianyi Zhou, Guodong Long, Jing Jiang, Lina Yao, Chengqi Zhang

A variety of machine learning applications expect to achieve rapid learning from a limited number of labeled examples. However, the success of most current models is the result of heavy training on big data. Meta-learning addresses this problem by extracting common knowledge across different tasks that can be quickly adapted to new tasks. However, existing meta-learning methods do not fully exploit weakly supervised information, which is usually free or cheap to collect. In this paper, we show that weakly labeled data can significantly improve the performance of meta-learning on few-shot classification. We propose the prototype propagation network (PPN), trained on few-shot tasks together with data annotated with coarse labels. Given a category graph of the targeted fine classes and some weakly labeled coarse classes, PPN learns an attention mechanism that propagates the prototype of one class to another on the graph, so that a K-nearest-neighbor (KNN) classifier defined on the propagated prototypes achieves high accuracy across different few-shot tasks. The training tasks are generated by subgraph sampling, and the training objective is obtained by accumulating the level-wise classification loss on the subgraph. The resulting graph of prototypes can be continually re-used and updated for new tasks and classes. We also introduce two practical test/inference settings that differ according to whether the test task can leverage any weakly supervised information, as in training. On two benchmarks, PPN significantly outperforms recent few-shot learning methods in different settings, even when the latter are also allowed to train on weakly labeled data.

* Accepted to IJCAI 2019, Code is publicly available at: https://github.com/liulu112601/PPN 
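
A small sketch of the classification side: prototypes are propagated from coarse (parent) classes to fine classes with an attention weight, and a query is classified by nearest-neighbor search over the propagated prototypes. The single-parent setup and cosine-based weight below are illustrative assumptions, not PPN's exact propagation rule.

```python
import torch
import torch.nn.functional as F

def propagate_and_classify(fine_protos, coarse_protos, parent_of, queries):
    """fine_protos: (F, d); coarse_protos: (P, d); parent_of: (F,) parent index per fine class.

    Returns the predicted fine-class index for each query in queries: (Q, d).
    """
    parents = coarse_protos[parent_of]                               # (F, d)
    # Attention weight between each fine prototype and its parent (cosine similarity mapped to [0, 1]).
    w = (F.cosine_similarity(fine_protos, parents, dim=-1) + 1) / 2
    propagated = w.unsqueeze(-1) * parents + (1 - w).unsqueeze(-1) * fine_protos
    # Nearest-neighbor classification over the propagated prototypes.
    dists = torch.cdist(queries, propagated)                         # (Q, F)
    return dists.argmin(dim=-1)

fine = torch.randn(5, 16)
coarse = torch.randn(2, 16)
parent_of = torch.tensor([0, 0, 1, 1, 1])
queries = torch.randn(3, 16)
print(propagate_and_classify(fine, coarse, parent_of, queries))      # 3 predicted class indices
```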

Fast Directional Self-Attention Mechanism

Sep 09, 2018
Tao Shen, Tianyi Zhou, Guodong Long, Jing Jiang, Chengqi Zhang

In this paper, we propose a self-attention mechanism, dubbed "fast directional self-attention (Fast-DiSA)", which is a fast and lightweight extension of "directional self-attention (DiSA)". The proposed Fast-DiSA is as expressive as the original DiSA but uses much less computation time and memory: 1) both token2token and source2token dependencies are modeled by a joint compatibility function that hybridizes dot-product and multi-dimensional attention; 2) multi-head and multi-dim attention are combined with bi-directional temporal information captured by multiple positional masks, without the heavy time and memory consumption of DiSA. The experimental results show that Fast-DiSA achieves state-of-the-art performance while being as fast and memory-efficient as CNNs. The code for Fast-DiSA is released at https://github.com/taoshen58/DiSAN/tree/master/Fast-DiSA.

* 9 pages, 2 figures 
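
A small sketch of the positional-mask idea: forward and backward masks added to the attention logits make a head attend only to earlier or only to later tokens, capturing bi-directional temporal order. The joint dot-product/multi-dim compatibility function of Fast-DiSA is not reproduced here; this is a plain dot-product stand-in.

```python
import torch
import torch.nn.functional as F

def directional_attention(x, w_q, w_k, w_v, forward=True):
    """Scaled dot-product attention restricted to one temporal direction.

    x: (batch, seq, d); w_q, w_k, w_v: (d, d) projections (assumed learnable).
    """
    d = x.size(-1)
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    logits = q @ k.transpose(-2, -1) / d ** 0.5                 # (batch, seq, seq)
    seq = x.size(1)
    if forward:
        # Each position attends only to strictly earlier positions.
        mask = torch.tril(torch.ones(seq, seq), diagonal=-1).bool()
    else:
        # Each position attends only to strictly later positions.
        mask = torch.triu(torch.ones(seq, seq), diagonal=1).bool()
    logits = logits.masked_fill(~mask, float("-inf"))
    # The first (or last) position has no valid target, so zero out its NaN softmax row.
    return torch.nan_to_num(F.softmax(logits, dim=-1)) @ v      # (batch, seq, d)

d = 16
x = torch.randn(2, 6, d)
w = [torch.randn(d, d) for _ in range(3)]
fwd, bwd = directional_attention(x, *w, forward=True), directional_attention(x, *w, forward=False)
print(torch.cat([fwd, bwd], dim=-1).shape)                       # torch.Size([2, 6, 32])
```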