Linjun Shou

From Good to Best: Two-Stage Training for Cross-lingual Machine Reading Comprehension

Dec 09, 2021

Nuo Chen, Linjun Shou, Ming Gong, Jian Pei, Daxin Jiang

Abstract: Cross-lingual Machine Reading Comprehension (xMRC) is challenging due to the lack of training data in low-resource languages. Recent approaches use training data only in a resource-rich language such as English to fine-tune large-scale cross-lingual pre-trained language models. Because languages differ substantially, a model fine-tuned only on a source language may not perform well on target languages. Interestingly, we observe that while the top-1 results predicted by previous approaches often fail to hit the ground-truth answers, the correct answers are often contained in the top-k predicted results. Based on this observation, we develop a two-stage approach to enhance model performance. The first stage targets recall: we design a hard-learning (HL) algorithm to maximize the likelihood that the top-k predictions contain the accurate answer. The second stage focuses on precision: an answer-aware contrastive learning (AA-CL) mechanism is developed to learn the fine difference between the accurate answer and other candidates. Our extensive experiments show that our model significantly outperforms a series of strong baselines on two cross-lingual MRC benchmark datasets.
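
The gap between top-1 and top-k results that this abstract describes can be measured with a simple hit-rate function. The following is a minimal sketch, not the authors' evaluation code: the function name `hit_at_k`, the exact-string-match criterion, and the toy predictions are all assumptions made for illustration.

```python
from typing import Sequence


def hit_at_k(ranked_predictions: Sequence[Sequence[str]],
             gold_answers: Sequence[str],
             k: int) -> float:
    """Fraction of examples whose gold answer is among the first k predictions.

    Assumes exact string match; real MRC evaluation usually normalizes text first.
    """
    hits = sum(gold in preds[:k]
               for preds, gold in zip(ranked_predictions, gold_answers))
    return hits / len(gold_answers)


# Hypothetical ranked answer candidates for four questions (illustrative only).
toy_predictions = [
    ["Paris", "Lyon"],
    ["Berlin", "Bonn"],
    ["Rome", "Milan"],
    ["Madrid", "Seville"],
]
toy_gold = ["Paris", "Bonn", "Milan", "Porto"]

top1 = hit_at_k(toy_predictions, toy_gold, k=1)
top2 = hit_at_k(toy_predictions, toy_gold, k=2)
```

When `top2` is well above `top1`, as in this toy data, the correct answer is frequently retrieved but ranked too low, which is the situation a second, precision-oriented stage is meant to address.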

Learning from Multiple Noisy Augmented Data Sets for Better Cross-Lingual Spoken Language Understanding

Sep 03, 2021

Yingmei Guo, Linjun Shou, Jian Pei, Ming Gong, Mingxing Xu, Zhiyong Wu, Daxin Jiang

Abstract: Lack of training data presents a grand challenge to scaling out spoken language understanding (SLU) to low-resource languages. Although various data augmentation approaches have been proposed to synthesize training data in low-resource target languages, the augmented data sets are often noisy and thus impede the performance of SLU models. In this paper we focus on mitigating noise in augmented data. We develop a denoising training approach in which multiple models are trained with data produced by various augmentation methods and provide supervision signals to each other. The experimental results show that our method outperforms the existing state of the art by 3.05 and 4.24 percentage points on two benchmark datasets, respectively. The code will be open-sourced on GitHub.

* Long paper at EMNLP 2021

A Joint and Domain-Adaptive Approach to Spoken Language Understanding

Jul 25, 2021

Linhao Zhang, Yu Shi, Linjun Shou, Ming Gong, Houfeng Wang, Michael Zeng

Abstract: Spoken Language Understanding (SLU) is composed of two subtasks: intent detection (ID) and slot filling (SF). There are two lines of research on SLU. One jointly tackles these two subtasks to improve their prediction accuracy, and the other focuses on the domain-adaptation ability of one of the subtasks. In this paper, we attempt to bridge these two lines of research and propose a joint and domain-adaptive approach to SLU. We formulate SLU as a constrained generation task and utilize a dynamic vocabulary based on a domain-specific ontology. We conduct experiments on the ASMixed and MTOD datasets and achieve performance competitive with previous state-of-the-art joint models. In addition, results show that our joint model can be effectively adapted to a new domain.
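
One common way to realize constrained generation over a restricted vocabulary is to mask the decoder's logits so that only permitted tokens can be selected. The sketch below illustrates that general idea only; it is not taken from the paper, and the function name `constrained_argmax`, the token ids, and the logit values are hypothetical.

```python
import numpy as np


def constrained_argmax(logits, allowed_ids):
    """Pick the highest-scoring token among `allowed_ids` only.

    `allowed_ids` stands in for a vocabulary derived from a domain ontology
    (an assumption for this sketch); every other token is masked to -inf.
    """
    scores = np.asarray(logits, dtype=float)
    masked = np.full_like(scores, -np.inf)
    masked[list(allowed_ids)] = scores[list(allowed_ids)]
    return int(np.argmax(masked))


# Hypothetical logits over a five-token vocabulary.
step_logits = [2.0, 5.0, 1.0, 3.0, 0.5]

# Token 1 has the highest raw score, but only tokens 0 and 3 are permitted.
choice = constrained_argmax(step_logits, allowed_ids=[0, 3])
```

Swapping in a different `allowed_ids` set per domain changes what the decoder may emit without retraining the scoring model, which is the appeal of a vocabulary that varies with the ontology.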

Reinforced Iterative Knowledge Distillation for Cross-Lingual Named Entity Recognition

Jun 01, 2021

Shining Liang, Ming Gong, Jian Pei, Linjun Shou, Wanli Zuo, Xianglin Zuo, Daxin Jiang

Abstract: Named entity recognition (NER) is a fundamental component in many applications, such as Web Search and Voice Assistants. Although deep neural networks greatly improve the performance of NER, their need for large amounts of training data means they can hardly scale out to many languages in an industry setting. To tackle this challenge, cross-lingual NER transfers knowledge from a rich-resource language to languages with low resources through pre-trained multilingual language models. Instead of using training data in target languages, cross-lingual NER has to rely only on training data in source languages, and optionally adds translated training data derived from the source languages. However, the existing cross-lingual NER methods do not make good use of rich unlabeled data in target languages, which is relatively easy to collect in industry applications. To address these opportunities and challenges, in this paper we describe our novel practice at Microsoft to leverage such large amounts of unlabeled data in target languages in real production settings. To effectively extract weak supervision signals from the unlabeled data, we develop a novel approach based on the ideas of semi-supervised learning and reinforcement learning. The empirical study on three benchmark data sets verifies that our approach establishes new state-of-the-art performance by a clear margin. The NER techniques reported in this paper are now on their way to becoming a fundamental component for Web ranking, Entity Pane, Answers Triggering, and Question Answering in the Microsoft Bing search engine. Moreover, our techniques will also serve as part of the Spoken Language Understanding module for a commercial voice assistant. We plan to open source the code of the prototype framework after deployment.

* KDD 2021

CoSQA: 20,000+ Web Queries for Code Search and Question Answering

May 27, 2021

Junjie Huang, Duyu Tang, Linjun Shou, Ming Gong, Ke Xu, Daxin Jiang, Ming Zhou, Nan Duan

Abstract: Finding code given a natural language query is beneficial to the productivity of software developers. Future progress towards better semantic matching between query and code requires richer supervised training resources. To remedy this, we introduce the CoSQA dataset. It includes 20,604 labels for pairs of natural language queries and code, each annotated by at least 3 human annotators. We further introduce a contrastive learning method dubbed CoCLR to enhance query-code matching, which works as a data augmenter to bring more artificially generated training instances. We show that, evaluated on CodeXGLUE with the same CodeBERT model, training on CoSQA improves the accuracy of code question answering by 5.1%, and incorporating CoCLR brings a further improvement of 10.5%.

* ACL 2021 main conference. The CoSQA data and leaderboard are available at https://github.com/microsoft/CodeXGLUE/tree/main/Text-Code/NL-code-search-WebQuery. The code is available at https://github.com/Jun-jie-Huang/CoCLR

Retrieval Enhanced Model for Commonsense Generation

May 24, 2021

Han Wang, Yang Liu, Chenguang Zhu, Linjun Shou, Ming Gong, Yichong Xu, Michael Zeng

Abstract: Commonsense generation is the challenging task of generating a plausible sentence describing an everyday scenario using provided concepts. Its requirement of reasoning over commonsense knowledge and compositional generalization ability puzzles even strong pre-trained language generation models. We propose a novel framework using retrieval methods to enhance both the pre-training and fine-tuning for commonsense generation. We retrieve prototype sentence candidates by concept matching and use them as auxiliary input. For fine-tuning, we further boost performance with a trainable sentence retriever. We demonstrate experimentally on the large-scale CommonGen benchmark that our approach achieves new state-of-the-art results.

* Findings of ACL-IJCNLP 2021

WhiteningBERT: An Easy Unsupervised Sentence Embedding Approach

Apr 09, 2021

Junjie Huang, Duyu Tang, Wanjun Zhong, Shuai Lu, Linjun Shou, Ming Gong, Daxin Jiang, Nan Duan

Abstract: Producing the embedding of a sentence in an unsupervised way is valuable to natural language matching and retrieval problems in practice. In this work, we conduct a thorough examination of unsupervised sentence embeddings based on pretrained models. We study four pretrained models and conduct extensive experiments on seven datasets regarding sentence semantics. We have three main findings. First, averaging all tokens is better than using only the [CLS] vector. Second, combining both top and bottom layers is better than using only top layers. Lastly, an easy whitening-based vector normalization strategy with less than 10 lines of code consistently boosts the performance.
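
To make the third finding concrete, here is a minimal sketch of a standard whitening transform in NumPy. It is a generic textbook formulation, not necessarily the authors' exact code: the function name `whiten` and the random stand-in embeddings are assumptions, and the sketch assumes more sentences than embedding dimensions so that the covariance matrix is full rank.

```python
import numpy as np


def whiten(embeddings: np.ndarray) -> np.ndarray:
    """Center the rows of `embeddings` and map them to identity covariance."""
    mu = embeddings.mean(axis=0, keepdims=True)
    cov = np.cov((embeddings - mu).T)        # (dim, dim) covariance matrix
    u, s, _ = np.linalg.svd(cov)             # cov = u @ diag(s) @ u.T
    w = u @ np.diag(1.0 / np.sqrt(s))        # whitening matrix
    return (embeddings - mu) @ w


# Random correlated vectors standing in for 200 sentence embeddings of size 8.
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(200, 8)) @ rng.normal(size=(8, 8))
whitened = whiten(fake_embeddings)
```

After the transform the dimensions are decorrelated and have unit variance, so similarity scores such as cosine similarity are no longer dominated by a few high-variance directions.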

Generating Human Readable Transcript for Automatic Speech Recognition with Pre-trained Language Model

Feb 22, 2021

Junwei Liao, Yu Shi, Ming Gong, Linjun Shou, Sefik Eskimez, Liyang Lu, Hong Qu, Michael Zeng

Abstract: Modern Automatic Speech Recognition (ASR) systems can achieve high performance in terms of recognition accuracy. However, even a perfectly accurate transcript can still be challenging to read due to disfluency, filler words, and other errata common in spoken communication. Many downstream tasks and human readers rely on the output of the ASR system; therefore, errors introduced by the speaker and the ASR system alike will be propagated to the next task in the pipeline. In this work, we propose an ASR post-processing model that aims to transform the incorrect and noisy ASR output into readable text for humans and downstream tasks. We leverage the Metadata Extraction (MDE) corpus to construct a task-specific dataset for our study. Since the dataset is small, we propose a novel data augmentation method and use a two-stage training strategy to fine-tune the RoBERTa pre-trained model. On the constructed test set, our model outperforms a production two-step pipeline-based post-processing method by a large margin of 13.26 on readability-aware WER (RA-WER) and 17.53 on BLEU metrics. Human evaluation also demonstrates that our method can generate more human-readable transcripts than the baseline method.

* Accepted in 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2021)

Improving Zero-shot Neural Machine Translation on Language-specific Encoders-Decoders

Feb 12, 2021

Junwei Liao, Yu Shi, Ming Gong, Linjun Shou, Hong Qu, Michael Zeng

Abstract: Recently, universal neural machine translation (NMT) with a shared encoder-decoder has achieved good performance on zero-shot translation. Unlike universal NMT, jointly trained language-specific encoders-decoders aim to achieve universal representation across non-shared modules, each of which is for a language or language family. The non-shared architecture has the advantage of mitigating internal language competition, especially when the shared vocabulary and model parameters are restricted in size. However, the performance of using multiple encoders and decoders on zero-shot translation still lags behind universal NMT. In this work, we study zero-shot translation using language-specific encoders-decoders. We propose to generalize the non-shared architecture and universal NMT by differentiating the Transformer layers between language-specific and interlingua. By selectively sharing parameters and applying cross-attentions, we explore maximizing the representation universality and realizing the best alignment of language-agnostic information. We also introduce a denoising auto-encoding (DAE) objective to jointly train the model with the translation task in a multi-task manner. Experiments on two public multilingual parallel datasets show that our proposed model achieves competitive or better results than universal NMT and a strong pivot baseline. Moreover, we experiment with incrementally adding a new language to the trained model by updating only the new model parameters. With this little effort, zero-shot translation between the newly added language and existing languages achieves results comparable to those of the model trained jointly from scratch on all languages.

CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation

Feb 09, 2021

Shuai Lu, Daya Guo, Shuo Ren, Junjie Huang, Alexey Svyatkovskiy, Ambrosio Blanco, Colin Clement, Dawn Drain, Daxin Jiang, Duyu Tang (+12 more)

Abstract: Benchmark datasets have a significant impact on accelerating research in programming language tasks. In this paper, we introduce CodeXGLUE, a benchmark dataset to foster machine learning research for program understanding and generation. CodeXGLUE includes a collection of 10 tasks across 14 datasets and a platform for model evaluation and comparison. CodeXGLUE also features three baseline systems, including BERT-style, GPT-style, and Encoder-Decoder models, to make it easy for researchers to use the platform. The availability of such data and baselines can help the development and validation of new methods that can be applied to various program understanding and generation problems.
