Xiao Chen

TinyBERT: Distilling BERT for Natural Language Understanding

Sep 24, 2019
Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu

Language model pre-training, such as BERT, has significantly improved the performance of many natural language processing tasks. However, pre-trained language models are usually computationally expensive and memory intensive, so it is difficult to run them effectively on some resource-restricted devices. To accelerate inference and reduce model size while maintaining accuracy, we first propose a novel transformer distillation method, a knowledge distillation (KD) method specially designed for transformer-based models. By leveraging this new KD method, the rich knowledge encoded in a large teacher BERT can be transferred well to a small student TinyBERT. Moreover, we introduce a new two-stage learning framework for TinyBERT, which performs transformer distillation at both the pre-training and task-specific learning stages. This framework ensures that TinyBERT can capture both the general-domain and task-specific knowledge of the teacher BERT. TinyBERT is empirically effective and achieves results comparable to BERT on the GLUE datasets, while being 7.5x smaller and 9.4x faster at inference. TinyBERT is also significantly better than state-of-the-art baselines, even with only about 28% of the parameters and 31% of the inference time of the baselines.

* 13 pages, 2 figures, 9 tables
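
To make the teacher-to-student transfer described in the TinyBERT abstract concrete, here is a minimal sketch of the generic soft-label knowledge distillation loss, in which a student is trained to match the teacher's temperature-softened output distribution. This is not the paper's transformer distillation method, whose details the abstract does not give; the function names, the example logits, and the `temperature` parameter are all assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_cross_entropy(student_logits, teacher_logits, temperature=1.0):
    """Cross-entropy of the student's soft distribution against the teacher's.

    Generic logit-based distillation loss (an illustrative assumption,
    not the loss defined in the TinyBERT paper).
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))


# Hypothetical logits for a 3-class task.
teacher = [2.0, 0.5, -1.0]
good_student = [2.0, 0.5, -1.0]   # agrees with the teacher
bad_student = [-1.0, 0.5, 2.0]    # ranks the classes in reverse

loss_good = soft_cross_entropy(good_student, teacher, temperature=2.0)
loss_bad = soft_cross_entropy(bad_student, teacher, temperature=2.0)
print(loss_good < loss_bad)
```

The loss is smallest when the student's distribution equals the teacher's, which is why minimizing it pulls the student toward the teacher's behavior.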

NEZHA: Neural Contextualized Representation for Chinese Language Understanding

Sep 05, 2019
Junqiu Wei, Xiaozhe Ren, Xiaoguang Li, Wenyong Huang, Yi Liao, Yasheng Wang, Jiashu Lin, Xin Jiang, Xiao Chen, Qun Liu

Pre-trained language models have achieved great success in various natural language understanding (NLU) tasks due to their capacity to capture deep contextualized information in text by pre-training on large-scale corpora. In this technical report, we present our practice of pre-training language models named NEZHA (NEural contextualiZed representation for CHinese lAnguage understanding) on Chinese corpora and fine-tuning them for Chinese NLU tasks. The current version of NEZHA is based on BERT with a collection of proven improvements, which include Functional Relative Positional Encoding as an effective positional encoding scheme, the Whole Word Masking strategy, Mixed Precision Training, and the LAMB Optimizer for training the models. The experimental results show that NEZHA achieves state-of-the-art performance when fine-tuned on several representative Chinese tasks, including named entity recognition (People's Daily NER), sentence matching (LCQMC), Chinese sentiment classification (ChnSenti) and natural language inference (XNLI).
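
As an illustration of the Whole Word Masking idea named in the NEZHA abstract, the sketch below masks all sub-tokens of a word together instead of masking sub-tokens independently. It assumes the English WordPiece convention in which a `##` prefix marks a continuation piece; for Chinese text the word boundaries would instead come from a word segmenter, which is not shown. The function names, the `mask_prob` default and the omission of BERT's random-replacement step are simplifications of this sketch, not details taken from the report.

```python
import random


def group_wordpieces(tokens):
    """Group token indices into words; a '##' prefix continues the previous word."""
    groups = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and groups:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


def whole_word_mask(tokens, mask_prob=0.15, mask_token="[MASK]", rng=None):
    """Mask whole words: every piece of a selected word is replaced together."""
    rng = rng or random.Random()
    masked = list(tokens)
    for group in group_wordpieces(tokens):
        if rng.random() < mask_prob:
            for i in group:
                masked[i] = mask_token
    return masked


tokens = ["an", "un", "##believ", "##able", "story"]
print(group_wordpieces(tokens))
print(whole_word_mask(tokens, mask_prob=0.5, rng=random.Random(0)))
```

Because the masking decision is made once per word group, a word is never left partially visible, which removes the easy clue of predicting one piece from its neighbors.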

Dialog State Tracking with Reinforced Data Augmentation

Aug 21, 2019
Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu

Neural dialog state trackers are generally limited by the lack of quantity and diversity of annotated training data. In this paper, we address this difficulty by proposing a reinforcement learning (RL) based framework for data augmentation that can generate high-quality data to improve the neural state tracker. Specifically, we introduce a novel contextual bandit generator to learn fine-grained augmentation policies that can generate new effective instances by choosing suitable replacements for the specific context. Moreover, by alternately training the generator and the state tracker, we can keep refining the generative policies to generate more high-quality training data for the neural state tracker. Experimental results on the WoZ and MultiWoZ (restaurant) datasets demonstrate that the proposed framework significantly improves performance over state-of-the-art models, especially with limited training data.

* Under review

Understanding Video Content: Efficient Hero Detection and Recognition for the Game "Honor of Kings"

Jul 18, 2019
Wentao Yao, Zixun Sun, Xiao Chen

To understand content and automatically extract labels for videos of the game "Honor of Kings", it is necessary to detect and recognize characters (called "heroes") together with their camps in the game video. In this paper, we propose an efficient two-stage algorithm to detect and recognize heroes in game videos. First, we detect all heroes in a video frame using a blood-bar template-matching method, and classify them according to their camps (self/friend/enemy). Then we recognize the name of each hero using one or more deep convolutional neural networks. Our method needs almost no work for labelling training and testing samples in the recognition stage. Experiments show its efficiency and accuracy in the task of hero detection and recognition in game videos.
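
The detection stage described above relies on template matching. As a rough illustration of that general technique, the sketch below slides a small template over a grayscale image and returns the position with the lowest sum of squared differences (SSD). The SSD score, the function name and the toy pixel values are assumptions of this sketch; the abstract does not say which matching score or blood-bar template the authors use.

```python
def match_template(image, template):
    """Return (row, col, ssd) of the best match of template inside image.

    Both arguments are 2-D lists of grayscale values. Returns None if the
    template does not fit inside the image.
    """
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best = None
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            ssd = sum(
                (image[r + dr][c + dc] - template[dr][dc]) ** 2
                for dr in range(th)
                for dc in range(tw)
            )
            if best is None or ssd < best[2]:
                best = (r, c, ssd)
    return best


# Toy 4x5 "frame" with a 2x3 pattern embedded at row 1, column 2.
frame = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 9, 9],
    [0, 0, 9, 1, 9],
    [0, 0, 0, 0, 0],
]
bar_template = [
    [9, 9, 9],
    [9, 1, 9],
]
print(match_template(frame, bar_template))  # (1, 2, 0)
```

A practical detector would also apply a score threshold and keep every location below it, since a frame can contain several heroes.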

News Cover Assessment via Multi-task Learning

Jul 18, 2019
Zixun Sun, Shuang Zhao, Chengwei Zhu, Xiao Chen

An online personalized news product needs a suitable cover for each article. The news cover must have high image quality and draw readers' attention at the same time, which is extraordinarily challenging due to the subjectivity of the task. In this paper, we assess news covers from the perspectives of image clarity and object salience. We propose an end-to-end multi-task learning network that performs image clarity assessment and semantic segmentation simultaneously, the results of which can guide news cover assessment. The proposed network is based on a modified DeepLabv3+ model. The network backbone is used for multi-scale spatial feature extraction, followed by two branches for image clarity assessment and semantic segmentation, respectively. The experimental results show that the proposed model is able to capture important content in images and performs better than single-task learning baselines on our proposed game-content-based CIA dataset.

* 6 pages, 9 figures

Modeling Semantic Compositionality with Sememe Knowledge

Jul 10, 2019
Fanchao Qi, Junjie Huang, Chenghao Yang, Zhiyuan Liu, Xiao Chen, Qun Liu, Maosong Sun

Semantic compositionality (SC) refers to the phenomenon that the meaning of a complex linguistic unit can be composed of the meanings of its constituents. Most related works focus on using complicated compositionality functions to model SC, while few consider external knowledge in their models. In this paper, we verify the effectiveness of sememes, the minimum semantic units of human languages, in modeling SC by a confirmatory experiment. Furthermore, we make the first attempt to incorporate sememe knowledge into SC models, and employ the sememe-incorporated models in learning representations of multiword expressions, a typical task of SC. In experiments, we implement our models by incorporating knowledge from HowNet, a well-known sememe knowledge base, and perform both intrinsic and extrinsic evaluations. Experimental results show that our models achieve a significant performance boost compared to the baseline methods that do not consider sememe knowledge. We further conduct quantitative analysis and case studies to demonstrate the effectiveness of applying sememe knowledge in modeling SC. All the code and data of this paper can be obtained at https://github.com/thunlp/Sememe-SC.

* To appear at ACL 2019

Relaxed 2-D Principal Component Analysis by $L_p$ Norm for Face Recognition

May 15, 2019
Xiao Chen, Zhi-Gang Jia, Yunfeng Cai, Mei-Xiang Zhao

A relaxed two-dimensional principal component analysis (R2DPCA) approach is proposed for face recognition. Unlike 2DPCA, 2DPCA-$L_1$ and G2DPCA, R2DPCA utilizes the label information (if known) of training samples to calculate a relaxation vector and assigns a weight to each subset of training data. A new relaxed scatter matrix is defined, and the computed projection axes are able to increase the accuracy of face recognition. The optimal $L_p$-norms are selected in a reasonable range. Numerical experiments on practical face databases indicate that R2DPCA has high generalization ability and can achieve a higher recognition rate than state-of-the-art methods.

* 19 pages, 11 figures
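
For context, the classical 2DPCA that R2DPCA is compared against builds an image scatter matrix directly from the training image matrices. The formulas below state only that classical formulation; the symbols $A_i$, $\bar{A}$, $G$, $w$ and $n$ are notation assumed here, and the relaxed scatter matrix, the relaxation vector and the $L_p$-norm objective of R2DPCA itself are not given in this abstract.

$$
\bar{A} = \frac{1}{n}\sum_{i=1}^{n} A_i, \qquad
G = \frac{1}{n}\sum_{i=1}^{n} (A_i - \bar{A})^{\top}(A_i - \bar{A}), \qquad
w^{*} = \arg\max_{\|w\|_2 = 1} w^{\top} G\, w
$$

The maximizer $w^{*}$ is the eigenvector of $G$ with the largest eigenvalue, and further projection axes are the eigenvectors belonging to the next largest eigenvalues.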

Distributed generation of privacy preserving data with user customization

Apr 20, 2019
Xiao Chen, Thomas Navidi, Stefano Ermon, Ram Rajagopal

Distributed devices such as mobile phones can produce and store large amounts of data that can enhance machine learning models; however, this data may contain private information specific to the data owner that prevents the release of the data. We wish to reduce the correlation between user-specific private information and the data while maintaining the useful information. Rather than learning a large model to achieve privatization end to end, we decouple the creation of a latent representation from the privatization of data, which allows user-specific privatization to occur in a distributed setting with limited computation and minimal disturbance to the utility of the data. We leverage a Variational Autoencoder (VAE) to create a compact latent representation of the data; the VAE remains fixed for all devices and all possible private labels. We then train a small generative filter to perturb the latent representation based on individual preferences regarding the private and utility information. The small filter is trained using a GAN-type robust optimization that can take place on a distributed device. We conduct experiments on three popular datasets: MNIST, UCI-Adult, and CelebA, and give a thorough evaluation, including visualizing the geometry of the latent embeddings and estimating the empirical mutual information, to show the effectiveness of our approach.

* accepted in ICLR 2019 SafeML workshop
