Wei Zhu

Learning Bias-Invariant Representation by Cross-Sample Mutual Information Minimization

Aug 11, 2021
Wei Zhu, Haitian Zheng, Haofu Liao, Weijian Li, Jiebo Luo

Deep learning algorithms mine knowledge from the training data and thus would likely inherit the dataset's bias information. As a result, the obtained model would generalize poorly and even mislead the decision process in real-life applications. We propose to remove the bias information misused by the target task with a cross-sample adversarial debiasing (CSAD) method. CSAD explicitly extracts target and bias features disentangled from the latent representation generated by a feature extractor and then learns to discover and remove the correlation between the target and bias features. The correlation measurement plays a critical role in adversarial debiasing and is conducted by a cross-sample neural mutual information estimator. Moreover, we propose joint content and local structural representation learning to boost mutual information estimation for better performance. We conduct thorough experiments on publicly available datasets to validate the advantages of the proposed method over state-of-the-art approaches.

Lex-BERT: Enhancing BERT based NER with lexicons

Jan 02, 2021
Wei Zhu, Daniel Cheung

In this work, we present Lex-BERT, which incorporates lexicon information into Chinese BERT for named entity recognition (NER) tasks in a natural manner. Instead of using word embeddings and a newly designed transformer layer as in FLAT, we identify the boundaries of words in the sentences using special tokens, and the modified sentence is encoded directly by BERT. Our model does not introduce any new parameters and is more efficient than FLAT. In addition, we do not require any word embeddings accompanying the lexicon collection. Experiments on Ontonotes and ZhCrossNER show that our model outperforms FLAT and other baselines.
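
The idea of marking word boundaries with special tokens can be illustrated with a minimal sketch. The abstract does not specify the marker tokens or the matching strategy, so the function name `mark_lexicon_words`, the `[v]` / `[/v]` markers, and the greedy longest-match rule below are all assumptions made for illustration, not the authors' implementation.

```python
def mark_lexicon_words(sentence, lexicon, open_tok="[v]", close_tok="[/v]"):
    """Wrap lexicon matches in marker tokens (hypothetical markers).

    Uses greedy longest match, scanning left to right, and returns a
    list of tokens: single characters plus the inserted markers.
    """
    max_len = max((len(w) for w in lexicon), default=0)
    out = []
    i = 0
    while i < len(sentence):
        match = None
        # Try the longest candidate first so longer words win over their prefixes.
        for length in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + length]
            if candidate in lexicon:
                match = candidate
                break
        if match:
            out.extend([open_tok, *match, close_tok])
            i += len(match)
        else:
            out.append(sentence[i])
            i += 1
    return out


tokens = mark_lexicon_words("南京市长江大桥", {"南京", "南京市", "长江大桥"})
print(" ".join(tokens))
```

The marked token sequence would then be passed to the encoder as an ordinary input, which is why no new parameters are needed beyond embeddings for the marker tokens themselves.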

CMV-BERT: Contrastive multi-vocab pretraining of BERT

Dec 29, 2020
Wei Zhu, Daniel Cheung

In this work, we present CMV-BERT, which improves the pretraining of a language model via two ingredients: (a) contrastive learning, which is well studied in the area of computer vision; (b) multiple vocabularies, one of which is fine-grained and the other coarse-grained. Both methods provide different views of an original sentence, and both are shown to be beneficial. Downstream tasks demonstrate that our proposed CMV-BERT is effective in improving pretrained language models.

MVP-BERT: Redesigning Vocabularies for Chinese BERT and Multi-Vocab Pretraining

Nov 17, 2020
Wei Zhu

Although the development of pre-trained language models (PLMs) has significantly raised the performance of various Chinese natural language processing (NLP) tasks, two limitations remain. First, the vocabulary for these Chinese PLMs remains the one provided by Google's Chinese BERT (Devlin et al., 2018), which is based on Chinese characters. Second, masked language model pre-training is based on a single vocabulary, which limits downstream task performance. In this work, we first propose a novel method, seg_tok, to form the vocabulary of Chinese BERT with the help of Chinese word segmentation (CWS) and subword tokenization. Then we propose three versions of multi-vocabulary pretraining (MVP) to improve the model's expressiveness. Experiments show that: (a) compared with a character-based vocabulary, seg_tok not only improves the performance of Chinese PLMs on sentence-level tasks but also improves efficiency; (b) MVP improves PLMs' downstream performance, and in particular improves seg_tok's performance on sequence labeling tasks.

Precision-Recall Curve (PRC) Classification Trees

Nov 15, 2020
Jiaju Miao, Wei Zhu

The classification of imbalanced data has presented a significant challenge for most well-known classification algorithms, which were often designed for data with relatively balanced class distributions. Nevertheless, skewed class distribution is a common feature in real-world problems. It is especially prevalent in certain application domains with great need for machine learning and better predictive analysis, such as disease diagnosis, fraud detection, bankruptcy prediction, and suspect identification. In this paper, we propose a novel tree-based algorithm based on the area under the precision-recall curve (AUPRC) for variable selection in the classification context. Our algorithm, named the "Precision-Recall Curve classification tree", or simply the "PRC classification tree", modifies two crucial stages in tree building. The first stage is to maximize the area under the precision-recall curve in node variable selection. The second stage is to maximize the harmonic mean of recall and precision (F-measure) for threshold selection. We found that the proposed PRC classification tree and its subsequent extension, the PRC random forest, work well, especially for class-imbalanced data sets. We have demonstrated that our methods outperform their classic counterparts, the usual CART and random forest, on both synthetic and real data. Furthermore, the ROC classification tree previously proposed by our group has shown good performance on imbalanced data. The combination of the two, the PRC-ROC tree, also shows great promise in identifying the minority class.
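
The second stage described above, choosing a split threshold that maximizes the F-measure, can be sketched for a single numeric variable. This is a minimal illustration rather than the authors' algorithm: the function name `best_f1_threshold`, the exhaustive scan over observed values, and the convention that `value >= threshold` predicts the positive (minority) class are all assumptions.

```python
def best_f1_threshold(values, labels):
    """Return (threshold, f1) maximizing F1 for the rule value >= threshold.

    `labels` holds 1 for the positive (minority) class and 0 otherwise.
    Returns (None, 0.0) if no threshold yields a true positive.
    """
    total_pos = sum(labels)
    best = (None, 0.0)
    # Every distinct observed value is a candidate threshold.
    for t in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if v >= t and y == 1)
        fp = sum(1 for v, y in zip(values, labels) if v >= t and y == 0)
        if tp == 0:
            continue  # precision and recall are both zero here
        precision = tp / (tp + fp)
        recall = tp / total_pos
        # F-measure: harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best[1]:
            best = (t, f1)
    return best


threshold, f1 = best_f1_threshold([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(threshold, round(f1, 3))
```

In this toy input the threshold 0.35 captures both positives at the cost of one false positive, which gives a higher F-measure than the stricter threshold 0.8 that misses one positive.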

Predicting Parkinson's Disease with Multimodal Irregularly Collected Longitudinal Smartphone Data

Oct 15, 2020
Weijian Li, Wei Zhu, E. Ray Dorsey, Jiebo Luo

Parkinson's Disease is a neurological disorder prevalent in elderly people. Traditional ways to diagnose the disease rely on in-person subjective clinical evaluations of the quality of a set of activity tests. The high-resolution longitudinal activity data collected by smartphone applications nowadays make it possible to conduct remote and convenient health assessment. However, out-of-lab tests often suffer from poor quality controls as well as irregularly collected observations, leading to noisy test results. To address these issues, we propose a novel time-series based approach to predicting Parkinson's Disease with raw activity test data collected by smartphones in the wild. The proposed method first synchronizes discrete activity tests into multimodal features at unified time points. Next, it distills and enriches local and global representations from noisy data across modalities and temporal observations using two attention modules. With the proposed mechanisms, our model is capable of handling noisy observations while extracting refined temporal features for improved prediction performance. Quantitative and qualitative results on a large public dataset demonstrate the effectiveness of the proposed approach.

* Accepted to ICDM-20
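
The synchronization step mentioned in the abstract, aligning irregularly collected tests to unified time points, might look roughly like the sketch below. The abstract gives no details, so the fixed-width windowing, the per-window mean, the function name `synchronize`, and the modality names in the example are all hypothetical.

```python
from collections import defaultdict


def synchronize(records, window):
    """Group (timestamp, modality, value) records into fixed-width windows.

    Returns a list of (window_index, {modality: mean value}) sorted by
    window index. Windows with no records are simply absent, and a
    modality missing from a window is absent from that window's dict.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for ts, modality, value in records:
        buckets[int(ts // window)][modality].append(value)
    return [
        (idx, {m: sum(vs) / len(vs) for m, vs in sorted(mods.items())})
        for idx, mods in sorted(buckets.items())
    ]


# Hypothetical records: (time in days, modality name, test score).
records = [
    (0.5, "tapping", 10.0),
    (0.9, "tapping", 14.0),
    (1.2, "voice", 3.0),
    (3.7, "walking", 5.0),
]
print(synchronize(records, window=1.0))
```

A downstream model would still need to handle the gaps this leaves (here, window 2 is empty and most windows lack some modalities), which is the kind of irregularity the abstract says the attention modules are meant to cope with.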

AutoRC: Improving BERT Based Relation Classification Models via Architecture Search

Sep 27, 2020
Wei Zhu, Xipeng Qiu, Yuan Ni, Guotong Xie

Although BERT based relation classification (RC) models have achieved significant improvements over traditional deep learning models, no consensus has been reached on the optimal architecture. First, there are multiple alternatives for entity span identification. Second, there is a collection of pooling operations for aggregating the representations of entities and contexts into fixed-length vectors. Third, it is difficult to manually decide which feature vectors, including their interactions, are beneficial for classifying the relation types. In this work, we design a comprehensive search space for BERT based RC models and employ a neural architecture search (NAS) method to automatically discover the design choices mentioned above. Experiments on seven benchmark RC tasks show that our method is efficient and effective in finding better architectures than the baseline BERT based RC model. An ablation study demonstrates the necessity of our search space design and the effectiveness of our search method.

AutoTrans: Automating Transformer Design via Reinforced Architecture Search

Sep 04, 2020
Wei Zhu, Xiaoling Wang, Xipeng Qiu, Yuan Ni, Guotong Xie

Though transformer architectures have shown dominance in many natural language understanding tasks, there are still unsolved issues in training transformer models, especially the need for a principled warm-up scheme, which has been shown to be important for stable training, and the question of whether the task at hand prefers scaling the attention product or not. In this paper, we empirically explore automating the design choices in the transformer model, i.e., how to set layer-norm, whether to scale, number of layers, number of heads, activation function, etc., so that one can obtain a transformer architecture that better suits the task at hand. Reinforcement learning (RL) is employed to navigate the search space, and special parameter sharing strategies are designed to accelerate the search. It is shown that sampling a proportion of the training data per epoch during search helps to improve the search quality. Experiments on CoNLL03, Multi-30k, IWSLT14 and WMT-14 show that the searched transformer model can outperform the standard transformers. In particular, we show that our learned model can be trained more robustly with large learning rates without warm-up.
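
One of the design choices listed above, whether to scale the attention product, can be made concrete with a small sketch of dot-product attention weights for a single query. This is standard attention written in plain Python for illustration, not code from the paper; the function name `attention_weights` and the `scale` flag are assumptions.

```python
import math


def attention_weights(query, keys, scale=True):
    """Softmax attention weights of one query vector over a list of keys.

    With scale=True the dot products are divided by sqrt(d), where d is
    the vector dimension; with scale=False they are used as-is.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    if scale:
        scores = [s / math.sqrt(d) for s in scores]
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


query = [2.0, 0.0, 0.0, 0.0]
keys = [[2.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
print(attention_weights(query, keys, scale=True))
print(attention_weights(query, keys, scale=False))
```

Without scaling, the larger raw scores make the softmax more peaked on the best-matching key; with scaling, the distribution is flatter. Treating the flag as a searchable choice lets the search decide which behavior suits a given task.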

Personalized Fashion Recommendation from Personal Social Media Data: An Item-to-Set Metric Learning Approach

May 25, 2020
Haitian Zheng, Kefei Wu, Jong-Hwi Park, Wei Zhu, Jiebo Luo

With the growth of online shopping for fashion products, accurate fashion recommendation has become a critical problem. Meanwhile, social networks provide a new, open data source for personalized fashion analysis. In this work, we study the problem of personalized fashion recommendation from social media data, i.e., recommending new outfits to social media users that fit their fashion preferences. To this end, we present an item-to-set metric learning framework that learns to compute the similarity between a set of a user's historical fashion items and a new fashion item. To extract features from multi-modal street-view fashion items, we propose an embedding module that performs multi-modality feature extraction and cross-modality gated fusion. To validate the effectiveness of our approach, we collect a real-world social media dataset. Extensive experiments on the collected dataset show the superior performance of our proposed approach.

* 9 pages, 7 figures
