Title: Searching for Optimal Subword Tokenization in Cross-domain NER

Jun 07, 2022

Ruotian Ma, Yiding Tan, Xin Zhou, Xuanting Chen, Di Liang, Sirui Wang, Wei Wu, Tao Gui, Qi Zhang

Abstract: Input distribution shift is one of the vital problems in unsupervised domain adaptation (UDA). The most popular UDA approaches focus on domain-invariant representation learning (DIRL), which tries to align the features from different domains into similar feature distributions. However, these approaches ignore the direct alignment of input word distributions between domains, which is a vital factor in word-level classification tasks such as cross-domain NER. In this work, we shed new light on cross-domain NER by introducing a subword-level solution, X-Piece, for input word-level distribution shift in NER. Specifically, we re-tokenize the input words of the source domain so that their subword distribution approaches that of the target domain; this re-tokenization is formulated and solved as an optimal transport problem. Because the approach operates at the input level, it can also be combined with previous DIRL methods for further improvement. Experimental results on four benchmark NER datasets show the effectiveness of the proposed method when built on a BERT tagger. The proposed method is also shown to benefit DIRL methods such as DANN.

* IJCAI 2022
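
To make the optimal transport framing in the abstract more concrete, here is a minimal sketch of entropy-regularized optimal transport solved with Sinkhorn iterations. It is not the X-Piece algorithm: the abstract does not say which solver or cost function the authors use, so the toy histograms, the cost matrix, and the names `sinkhorn`, `source_mass`, and `target_dist` are all illustrative assumptions.

```python
import numpy as np

def sinkhorn(a, b, cost, reg=0.5, n_iters=1000):
    """Return an entropy-regularized transport plan between histograms a and b.

    a: source masses, shape (n,), summing to 1
    b: target masses, shape (m,), summing to 1
    cost: transport costs, shape (n, m)
    """
    K = np.exp(-cost / reg)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)            # rescale to match column marginals
        u = a / (K @ v)              # rescale to match row marginals
    return u[:, None] * K * v[None, :]

# Hypothetical toy data: rows are candidate segmentations of source-domain
# words, columns are subwords with a desired target-domain frequency.
source_mass = np.array([0.5, 0.3, 0.2])
target_dist = np.array([0.4, 0.3, 0.2, 0.1])
cost = np.array([
    [0.1, 0.8, 0.6, 0.9],
    [0.7, 0.2, 0.5, 0.8],
    [0.9, 0.6, 0.1, 0.3],
])

plan = sinkhorn(source_mass, target_dist, cost)
print(plan.round(3))
```

Each row of `plan` shows how the mass of one source item is spread over the target subwords. In a setting like the one the abstract describes, a plan of this kind could in principle guide which segmentation to prefer for each source word, but that mapping is an assumption here and not something the abstract specifies.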
