Duyu Tang

Effidit: Your AI Writing Assistant

Aug 04, 2022

Shuming Shi, Enbo Zhao, Duyu Tang, Yan Wang, Piji Li, Wei Bi, Haiyun Jiang, Guoping Huang, Leyang Cui, Xinting Huang(+3 more)

Abstract: In this technical report, we introduce Effidit (Efficient and Intelligent Editing), a digital writing assistant that helps users write higher-quality text more efficiently by using artificial intelligence (AI) technologies. Previous writing assistants typically provide error checking (detecting and correcting spelling and grammatical errors) and limited text-rewriting functionality. With the emergence of large-scale neural language models, some systems support automatically completing a sentence or a paragraph. In Effidit, we significantly expand the capacities of a writing assistant by providing functions in five categories: text completion, error checking, text polishing, keywords to sentences (K2S), and cloud input methods (cloud IME). In the text completion category, Effidit supports generation-based sentence completion, retrieval-based sentence completion, and phrase completion. In contrast, many other writing assistants so far provide only one or two of the three functions. For text polishing, we have three functions: (context-aware) phrase polishing, sentence paraphrasing, and sentence expansion, whereas many other writing assistants support only one or two functions in this category. The main contents of this report include major modules of Effidit, methods for implementing these modules, and evaluation results of some key methods.

* Technical report for Effidit. arXiv admin note: text overlap with arXiv:2202.06417

One Model, Multiple Modalities: A Sparsely Activated Approach for Text, Sound, Image, Video and Code

May 12, 2022

Yong Dai, Duyu Tang, Liangxin Liu, Minghuan Tan, Cong Zhou, Jingquan Wang, Zhangyin Feng, Fan Zhang, Xueyu Hu, Shuming Shi

Abstract: People perceive the world with multiple senses (e.g., through hearing sounds, reading words and seeing objects). However, most existing AI systems only process an individual modality. This paper presents an approach that excels at handling multiple modalities of information with a single model. In our SkillNet model, different parts of the parameters are specialized for processing different modalities. Unlike traditional dense models that always activate all the model parameters, our model sparsely activates the parts of the parameters whose skills are relevant to the task. This model design enables SkillNet to learn skills in a more interpretable way. We develop our model for five modalities: text, image, sound, video and code. Results show that SkillNet performs comparably to five modality-specific fine-tuned models. Moreover, our model supports self-supervised pretraining in the same sparsely activated way, resulting in better initialized parameters for different modalities. We find that pretraining significantly improves the performance of SkillNet on five modalities, on par with or even better than baselines with modality-specific pretraining. On the task of Chinese text-to-image retrieval, our final system achieves higher accuracy than existing leading systems including Wukong (ViT-B) and Wenlan 2.0 while using fewer activated parameters.

SkillNet-NLG: General-Purpose Natural Language Generation with a Sparsely Activated Approach

Apr 26, 2022

Junwei Liao, Duyu Tang, Fan Zhang, Shuming Shi

Abstract: We present SkillNet-NLG, a sparsely activated approach that handles many natural language generation tasks with one model. Unlike traditional dense models that always activate all the parameters, SkillNet-NLG selectively activates the relevant parts of the parameters to accomplish a task, where the relevance is controlled by a set of predefined skills. The strength of this model design is that it provides an opportunity to precisely adapt relevant skills to learn new tasks effectively. We evaluate on Chinese natural language generation tasks. Results show that, with only one model file, SkillNet-NLG outperforms the previous best-performing methods on four of five tasks. SkillNet-NLG performs better than two multi-task learning baselines (a dense model and a Mixture-of-Experts model) and achieves comparable performance to task-specific models. Lastly, SkillNet-NLG surpasses baseline systems when adapted to new tasks.

* 8 pages, 3 figures

Pretraining Chinese BERT for Detecting Word Insertion and Deletion Errors

Apr 26, 2022

Cong Zhou, Yong Dai, Duyu Tang, Enbo Zhao, Zhangyin Feng, Li Kuang, Shuming Shi

Abstract: Chinese BERT models achieve remarkable progress in dealing with grammatical errors of word substitution. However, they fail to handle word insertion and deletion because BERT assumes the existence of a word at each position. To address this, we present a simple and effective Chinese pretrained model. The basic idea is to enable the model to determine whether a word exists at a particular position. We achieve this by introducing a special token [null], the prediction of which stands for the non-existence of a word. In the training stage, we design pretraining tasks such that the model learns to predict [null] and real words jointly given the surrounding context. In the inference stage, the model readily detects whether a word should be inserted or deleted with the standard masked language modeling function. We further create an evaluation dataset to foster research on word insertion and deletion. It includes human-annotated corrections for 7,726 erroneous sentences. Results show that existing Chinese BERT performs poorly on detecting insertion and deletion errors. Our approach significantly improves the F1 scores from 24.1% to 78.1% for word insertion and from 26.5% to 68.5% for word deletion, respectively.

* 12 pages

MarkBERT: Marking Word Boundaries Improves Chinese BERT

Mar 12, 2022

Linyang Li, Yong Dai, Duyu Tang, Zhangyin Feng, Cong Zhou, Xipeng Qiu, Zenglin Xu, Shuming Shi

Abstract: We present a Chinese BERT model dubbed MarkBERT that uses word information. Existing word-based BERT models regard words as basic units. However, due to the vocabulary limit of BERT, they only cover high-frequency words and fall back to the character level when encountering out-of-vocabulary (OOV) words. Unlike existing works, MarkBERT keeps the vocabulary as Chinese characters and inserts boundary markers between contiguous words. This design enables the model to handle any word in the same way, whether or not it is an OOV word. Besides, our model has two additional benefits: first, it is convenient to add word-level learning objectives over markers, which is complementary to traditional character-level and sentence-level pre-training tasks; second, it can easily incorporate richer semantics such as POS tags of words by replacing generic markers with POS tag-specific markers. MarkBERT pushes the state of the art of Chinese named entity recognition from 95.4% to 96.5% on the MSRA dataset and from 82.8% to 84.2% on the OntoNotes dataset, respectively. Compared to previous word-based BERT models, MarkBERT achieves better accuracy on text classification, keyword recognition, and semantic similarity tasks.

* Work in progress
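
As a rough illustration of the boundary-marker idea in the MarkBERT abstract, the sketch below keeps characters as the tokens and inserts a marker between contiguous words. The marker string `[S]`, the function name, and the pre-segmented input are assumptions made for illustration; the abstract does not specify them.

```python
from typing import List

MARKER = "[S]"  # hypothetical marker token; the abstract does not name it


def insert_boundary_markers(words: List[str], marker: str = MARKER) -> List[str]:
    """Split each word into characters and put a marker between adjacent words."""
    tokens: List[str] = []
    for i, word in enumerate(words):
        if i > 0:
            tokens.append(marker)  # boundary between the previous word and this one
        tokens.extend(word)        # the vocabulary stays character-level
    return tokens


# Words are assumed to come from an external word segmenter.
print(insert_boundary_markers(["我", "喜欢", "北京"]))
```

Because every word is spelled out in characters, a rare or unseen word goes through the same path as a frequent one. Swapping the generic marker for a tag-specific string (for example a POS label) would only require passing a different `marker` per boundary.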

One Model, Multiple Tasks: Pathways for Natural Language Understanding

Mar 07, 2022

Duyu Tang, Fan Zhang, Yong Dai, Cong Zhou, Shuangzhi Wu, Shuming Shi

Abstract: This paper presents a Pathways approach to handle many tasks at once. Our approach is general-purpose and sparse. Unlike prevailing single-purpose models that overspecialize at individual tasks and learn from scratch when extended to new tasks, our approach is general-purpose, with the ability to stitch together existing skills to learn new tasks more effectively. Unlike traditional dense models that always activate all the model parameters, our approach is sparsely activated: only relevant parts of the model (like pathways through the network) are activated. We take natural language understanding as a case study and define a set of skills such as "the skill of understanding the sentiment of text" and "the skill of understanding natural language questions". These skills can be reused and combined to support many different tasks and situations. We develop our system using Transformer as the backbone. For each skill, we implement skill-specific feed-forward networks, which are activated only if the skill is relevant to the task. An appealing feature of our model is that it not only supports sparsely activated fine-tuning, but also allows us to pretrain skills in the same sparse way with masked language modeling and next sentence prediction. We call this model SkillNet. We have three major findings. First, with only one model checkpoint, SkillNet performs better than task-specific fine-tuning and two multi-task learning baselines (i.e., a dense model and a Mixture-of-Experts model) on six tasks. Second, sparsely activated pre-training further improves the overall performance. Third, SkillNet significantly outperforms baseline systems when extended to new tasks.
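
The skill-specific feed-forward idea in this abstract can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not the authors' implementation: the class name, the skill names, the lambda stand-ins for trained networks, and the choice to average the active outputs are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


class SparseSkillLayer:
    """Holds one feed-forward function per skill and runs only the active ones."""

    def __init__(self, skill_ffns: Dict[str, Callable[[Vector], Vector]]):
        self.skill_ffns = skill_ffns

    def forward(self, x: Vector, active_skills: Sequence[str]) -> Vector:
        if not active_skills:
            raise ValueError("at least one skill must be active")
        # Only the FFNs of relevant skills are evaluated; the rest are never called.
        outputs = [self.skill_ffns[name](x) for name in active_skills]
        # Assumption: combine the active outputs by element-wise averaging.
        return [sum(vals) / len(outputs) for vals in zip(*outputs)]


# Toy stand-ins for trained FFNs (hypothetical skill names).
layer = SparseSkillLayer({
    "sentiment": lambda x: [2.0 * v for v in x],
    "question": lambda x: [v + 1.0 for v in x],
    "ner": lambda x: [0.0 for _ in x],
})

# A task that needs two of the three skills activates only those two FFNs.
print(layer.forward([1.0, 3.0], active_skills=["sentiment", "question"]))
```

A dense model would correspond to always passing every skill name, which is the contrast the abstract draws.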

"Is Whole Word Masking Always Better for Chinese BERT?": Probing on Chinese Grammatical Error Correction

Mar 02, 2022

Yong Dai, Linyang Li, Cong Zhou, Zhangyin Feng, Enbo Zhao, Xipeng Qiu, Piji Li, Duyu Tang

Abstract: Whole word masking (WWM), which masks all subwords corresponding to a word at once, makes a better English BERT model. For the Chinese language, however, there is no subword because each token is an atomic character. The meaning of a word in Chinese is different in that a word is a compositional unit consisting of multiple characters. This difference motivates us to investigate whether WWM leads to better context understanding ability for Chinese BERT. To achieve this, we introduce two probing tasks related to grammatical error correction and ask pretrained models to revise or insert tokens in a masked language modeling manner. We construct a dataset including labels for 19,075 tokens in 10,448 sentences. We train three Chinese BERT models with standard character-level masking (CLM), WWM, and a combination of CLM and WWM, respectively. Our major findings are as follows: First, when one character needs to be inserted or replaced, the model trained with CLM performs the best. Second, when more than one character needs to be handled, WWM is the key to better performance. Finally, when fine-tuned on sentence-level downstream tasks, models trained with different masking strategies perform comparably.

* Short paper in Findings of ACL 2022
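
To make the difference between the two masking strategies in this abstract concrete, here is a minimal sketch. It assumes the sentence is already segmented into words and picks the masked position deterministically; real pretraining samples positions at random, and the function names are hypothetical.

```python
from typing import List

MASK = "[MASK]"


def to_chars(words: List[str]) -> List[str]:
    """Flatten segmented words into character tokens."""
    return [ch for word in words for ch in word]


def mask_character(words: List[str], char_index: int) -> List[str]:
    """Character-level masking (CLM): hide a single character."""
    tokens = to_chars(words)
    tokens[char_index] = MASK
    return tokens


def mask_whole_word(words: List[str], word_index: int) -> List[str]:
    """Whole word masking (WWM): hide every character of one word at once."""
    tokens: List[str] = []
    for i, word in enumerate(words):
        if i == word_index:
            tokens.extend([MASK] * len(word))
        else:
            tokens.extend(word)
    return tokens


words = ["我", "喜欢", "北京"]
print(mask_character(words, 1))   # hides only the first character of "喜欢"
print(mask_whole_word(words, 1))  # hides both characters of "喜欢"
```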

Exploring and Adapting Chinese GPT to Pinyin Input Method

Mar 02, 2022

Minghuan Tan, Yong Dai, Duyu Tang, Zhangyin Feng, Guoping Huang, Jing Jiang, Jiwei Li, Shuming Shi

Abstract: While GPT has become the de facto method for text generation tasks, its application to the pinyin input method remains unexplored. In this work, we make the first exploration of leveraging Chinese GPT for the pinyin input method. We find that a frozen GPT achieves state-of-the-art performance on perfect pinyin. However, the performance drops dramatically when the input includes abbreviated pinyin. One reason is that an abbreviated pinyin can be mapped to many perfect pinyin, which in turn map to an even larger number of Chinese characters. We mitigate this issue with two strategies: enriching the context with pinyin and optimizing the training process to help distinguish homophones. To further facilitate the evaluation of the pinyin input method, we create a dataset consisting of 270K instances from 15 domains. Results show that our approach improves performance on abbreviated pinyin across all domains. Model analysis demonstrates that both strategies contribute to the performance boost.

* To appear in ACL 2022
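
The ambiguity described in this abstract, where one abbreviated pinyin fans out to many perfect pinyin and then to even more characters, can be seen with a toy lexicon. The lexicon contents, the function names, and the assumption that an abbreviation is the first letter of a syllable are illustrative choices, not details taken from the paper.

```python
from typing import Dict, List

# Tiny hypothetical lexicon: perfect pinyin syllable -> candidate characters.
LEXICON: Dict[str, List[str]] = {
    "wo": ["我", "握"],
    "wei": ["为", "未", "位"],
    "wan": ["完", "玩"],
    "ni": ["你", "泥"],
}


def expand_abbreviation(initial: str) -> List[str]:
    """Return every perfect pinyin in the lexicon that starts with the initial."""
    return sorted(p for p in LEXICON if p.startswith(initial))


def candidate_characters(initial: str) -> List[str]:
    """Collect the characters reachable from one abbreviated pinyin."""
    return [ch for p in expand_abbreviation(initial) for ch in LEXICON[p]]


print(expand_abbreviation("w"))        # several perfect pinyin share the initial
print(len(LEXICON["wo"]))              # candidates when the full syllable is typed
print(len(candidate_characters("w")))  # candidates when only the initial is typed
```

Even in this four-entry lexicon the candidate set grows several-fold once only the initial is typed, which is the search-space problem the two strategies in the abstract aim to reduce.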

Pretraining without Wordpieces: Learning Over a Vocabulary of Millions of Words

Feb 24, 2022

Zhangyin Feng, Duyu Tang, Cong Zhou, Junwei Liao, Shuangzhi Wu, Xiaocheng Feng, Bing Qin, Yunbo Cao, Shuming Shi

Abstract: The standard BERT adopts subword-based tokenization, which may break a word into two or more wordpieces (e.g., converting "lossless" to "loss" and "less"). This causes inconvenience in the following situations: (1) what is the best way to obtain the contextual vector of a word that is divided into multiple wordpieces? (2) how can a word be predicted via a cloze test without knowing the number of wordpieces in advance? In this work, we explore the possibility of developing a BERT-style pretrained model over a vocabulary of words instead of wordpieces. We call this word-level BERT model WordBERT. We train models with different vocabulary sizes, initialization configurations and languages. Results show that, compared to standard wordpiece-based BERT, WordBERT makes significant improvements on cloze test and machine reading comprehension. On many other natural language understanding tasks, including POS tagging, chunking and NER, WordBERT consistently performs better than BERT. Model analysis indicates that the major advantage of WordBERT over BERT lies in its understanding of low-frequency and rare words. Furthermore, since the pipeline is language-independent, we train WordBERT for the Chinese language and obtain significant gains on five natural language understanding datasets. Lastly, the analysis of inference speed shows that WordBERT has comparable time cost to BERT in natural language understanding tasks.
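
The "lossless" example in this abstract can be reproduced with a small greedy longest-match tokenizer, the matching rule commonly used for WordPiece vocabularies. The two toy vocabularies and both function names below are assumptions; the `##` prefix is the usual BERT convention for a piece that continues a word.

```python
from typing import List, Set


def wordpiece_tokenize(word: str, vocab: Set[str], unk: str = "[UNK]") -> List[str]:
    """Greedy longest-match-first split of one word into wordpieces."""
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1  # shrink the window and retry
        if piece is None:
            return [unk]  # no piece matches at this position
        pieces.append(piece)
        start = end
    return pieces


def word_level_tokenize(word: str, vocab: Set[str], unk: str = "[UNK]") -> List[str]:
    """Word-level lookup: the whole word is one token or it is unknown."""
    return [word] if word in vocab else [unk]


subword_vocab = {"loss", "##less"}   # toy wordpiece vocabulary
word_vocab = {"lossless", "loss"}    # toy word-level vocabulary

print(wordpiece_tokenize("lossless", subword_vocab))  # one word, two pieces
print(word_level_tokenize("lossless", word_vocab))    # one word, one token
```

With the word-level vocabulary the number of tokens per word is always one, which is why the cloze-test question of how many pieces to predict does not arise.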

CoSQA: 20,000+ Web Queries for Code Search and Question Answering

May 27, 2021

Junjie Huang, Duyu Tang, Linjun Shou, Ming Gong, Ke Xu, Daxin Jiang, Ming Zhou, Nan Duan

Abstract: Finding code given a natural language query is beneficial to the productivity of software developers. Future progress towards better semantic matching between query and code requires richer supervised training resources. To remedy this, we introduce the CoSQA dataset. It includes 20,604 labels for pairs of natural language queries and code, each annotated by at least 3 human annotators. We further introduce a contrastive learning method dubbed CoCLR to enhance query-code matching, which works as a data augmenter to bring more artificially generated training instances. We show that, evaluated on CodeXGLUE with the same CodeBERT model, training on CoSQA improves the accuracy of code question answering by 5.1%, and incorporating CoCLR brings a further improvement of 10.5%.

* ACL 2021 main conference. The CoSQA data and leaderboard are available at https://github.com/microsoft/CodeXGLUE/tree/main/Text-Code/NL-code-search-WebQuery. The code is available at https://github.com/Jun-jie-Huang/CoCLR
