Shafiq Joty

Reliability Testing for Natural Language Processing Systems

May 06, 2021
Samson Tan, Shafiq Joty, Kathy Baxter, Araz Taeihagh, Gregory A. Bennett, Min-Yen Kan

Questions of fairness, robustness, and transparency are paramount to address before deploying NLP systems. Central to these concerns is the question of reliability: Can NLP systems reliably treat different demographics fairly and function correctly in diverse and noisy environments? To address this, we argue for the need for reliability testing and contextualize it among existing work on improving accountability. We show how adversarial attacks can be reframed for this goal, via a framework for developing reliability tests. We argue that reliability testing -- with an emphasis on interdisciplinary collaboration -- will enable rigorous and targeted testing, and aid in the enactment and enforcement of industry standards.

* Accepted to ACL-IJCNLP 2021 (main conference). Final camera-ready version to follow shortly

Addressing the Vulnerability of NMT in Input Perturbations

Apr 20, 2021
Weiwen Xu, Ai Ti Aw, Yang Ding, Kui Wu, Shafiq Joty

Neural Machine Translation (NMT) has achieved significant breakthroughs in performance but is known to be vulnerable to input perturbations. As real input noise is difficult to predict during training, robustness is a major concern for system deployment. In this paper, we improve the robustness of NMT models by reducing the effect of noisy words through a Context-Enhanced Reconstruction (CER) approach. CER trains the model to resist noise in two steps: (1) a perturbation step that breaks the naturalness of the input sequence with made-up words; (2) a reconstruction step that defends against noise propagation by generating better and more robust contextual representations. Experimental results on Chinese-English (ZH-EN) and French-English (FR-EN) translation tasks demonstrate improved robustness on both news and social media text. Further fine-tuning experiments on social media text show that our approach converges to a higher level of performance and adapts better.
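
The perturbation step can be illustrated with a minimal Python sketch. The abstract does not say how the made-up words are built or how many tokens are perturbed, so the random-letter generator, the `rate` parameter, and the names `make_up_word` and `perturb` are assumptions for illustration only. The reconstruction step is not shown.

```python
import random
import string


def make_up_word(length, rng):
    """Return a random lowercase string (a stand-in for a 'made-up word')."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def perturb(tokens, rate=0.2, seed=0):
    """Replace a fraction of tokens with made-up words.

    Returns the noisy token list and the sorted positions that were replaced.
    The replacement policy here is hypothetical, not the paper's procedure.
    """
    rng = random.Random(seed)
    # Perturb at least one token whenever the input is non-empty.
    n_noisy = max(1, round(rate * len(tokens))) if tokens else 0
    positions = set(rng.sample(range(len(tokens)), n_noisy))
    noisy = [
        make_up_word(len(tok), rng) if i in positions else tok
        for i, tok in enumerate(tokens)
    ]
    return noisy, sorted(positions)


if __name__ == "__main__":
    sentence = "the market rallied after the announcement".split()
    noisy, where = perturb(sentence, rate=0.5, seed=1)
    print(noisy, where)
```

In a training setup of this kind, the noisy sequence would be fed to the encoder while the clean sequence serves as the target that the model learns to recover from.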

* Accepted by NAACL 2021 Industry Track

Code-Mixing on Sesame Street: Dawn of the Adversarial Polyglots

Mar 17, 2021
Samson Tan, Shafiq Joty

Multilingual models have demonstrated impressive cross-lingual transfer performance. However, test sets like XNLI are monolingual at the example level. In multilingual communities, it is common for polyglots to code-mix when conversing with each other. Inspired by this phenomenon, we present two strong black-box adversarial attacks (one word-level, one phrase-level) for multilingual models that push their ability to handle code-mixed sentences to the limit. The former uses bilingual dictionaries to propose perturbations and translations of the clean example for sense disambiguation. The latter directly aligns the clean example with its translations before extracting phrases as perturbations. Our phrase-level attack has a success rate of 89.75% against XLM-R-large, bringing its average accuracy of 79.85 down to 8.18 on XNLI. Finally, we propose an efficient adversarial training scheme that trains in the same number of steps as the original model and show that it improves model accuracy.
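
The idea behind the word-level attack, proposing substitutions from a bilingual dictionary and keeping those that hurt the model most, can be sketched roughly as follows. This is a simplified greedy search and not the authors' algorithm: it omits the translation-based sense disambiguation, and the toy lexicon, the `gold_score` callback (a stand-in for querying a black-box model for its confidence in the gold label), and all function names are hypothetical.

```python
def candidates(tokens, lexicon):
    """Yield every sentence obtained by swapping one word for a dictionary entry."""
    for i, tok in enumerate(tokens):
        for repl in lexicon.get(tok.lower(), []):
            yield tokens[:i] + [repl] + tokens[i + 1:]


def greedy_attack(tokens, gold_score, lexicon, max_swaps=2):
    """Greedily apply the swap that lowers gold_score the most, up to max_swaps times.

    gold_score(tokens) -> float is treated as a black box: only its output is used.
    """
    current = list(tokens)
    best = gold_score(current)
    for _ in range(max_swaps):
        scored = [(gold_score(new), new) for new in candidates(current, lexicon)]
        if not scored:
            break
        score, new = min(scored, key=lambda pair: pair[0])
        if score >= best:
            break  # no remaining swap reduces the score
        current, best = new, score
    return current, best


if __name__ == "__main__":
    # Toy English-to-French/German lexicon (assumed for illustration).
    lexicon = {"cat": ["chat", "Katze"], "sleeps": ["dort"]}
    english = {"the", "cat", "sleeps"}

    # Toy "model": confidence equals the share of tokens it recognises as English.
    def toy_score(toks):
        return sum(t in english for t in toks) / len(toks)

    print(greedy_attack(["the", "cat", "sleeps"], toy_score, lexicon))
```

With a real classifier, `gold_score` would wrap a forward pass and return the probability assigned to the correct label, so the search needs no access to gradients or model internals.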

* To be presented at NAACL-HLT 2021. Final version to follow

Towards Multi-Sense Cross-Lingual Alignment of Contextual Embeddings

Mar 11, 2021
Linlin Liu, Thien Hai Nguyen, Shafiq Joty, Lidong Bing, Luo Si

Cross-lingual word embeddings (CLWE) have proven useful in many cross-lingual tasks. However, most existing approaches to learning CLWE, including those with contextual embeddings, are sense-agnostic. In this work, we propose a novel framework to align contextual embeddings at the sense level by leveraging cross-lingual signal from bilingual dictionaries only. We operationalize our framework by first proposing a novel sense-aware cross entropy loss to model word senses explicitly. The monolingual ELMo and BERT models pretrained with our sense-aware cross entropy loss demonstrate significant performance improvement for word sense disambiguation tasks. We then propose a sense alignment objective on top of the sense-aware cross entropy loss for cross-lingual model pretraining, and pretrain cross-lingual models for several language pairs (English to German/Spanish/Japanese/Chinese). Compared with the best baseline results, our cross-lingual models achieve 0.52%, 2.09% and 1.29% average performance improvements on zero-shot cross-lingual NER, sentiment classification and XNLI tasks, respectively.

Weakly Supervised Neuro-Symbolic Module Networks for Numerical Reasoning

Jan 28, 2021
Amrita Saha, Shafiq Joty, Steven C. H. Hoi

Neural Module Networks (NMNs) have been quite successful in incorporating explicit reasoning as learnable modules in various question answering tasks, including the most generic form of numerical reasoning over text in Machine Reading Comprehension (MRC). However, to achieve this, contemporary NMNs need strong supervision in executing the query as a specialized program over reasoning modules and fail to generalize to more open-ended settings without such supervision. Hence we propose the Weakly-Supervised Neuro-Symbolic Module Network (WNSMN), trained with answers as the sole supervision for numerical-reasoning-based MRC. It learns to execute a noisy heuristic program, obtained from the dependency parse of the query, as discrete actions over both neural and symbolic reasoning modules, and is trained end-to-end in a reinforcement learning framework with a discrete reward from answer matching. On the numerical-answer subset of DROP, WNSMN outperforms NMN by 32% and the reasoning-free language model GenBERT by 8% in exact match accuracy when trained under comparable weakly supervised settings. This showcases the effectiveness and generalizability of modular networks that can handle explicit discrete reasoning over noisy programs in an end-to-end manner.

DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks

Nov 03, 2020
Bosheng Ding, Linlin Liu, Lidong Bing, Canasai Kruengkrai, Thien Hai Nguyen, Shafiq Joty, Luo Si, Chunyan Miao

Data augmentation techniques have been widely used to improve machine learning performance as they enhance the generalization capability of models. In this work, to generate high-quality synthetic data for low-resource tagging tasks, we propose a novel augmentation method with language models trained on the linearized labeled sentences. Our method is applicable to both supervised and semi-supervised settings. For the supervised settings, we conduct extensive experiments on named entity recognition (NER), part-of-speech (POS) tagging and end-to-end target-based sentiment analysis (E2E-TBSA) tasks. For the semi-supervised settings, we evaluate our method on the NER task under two conditions: unlabeled data only, and unlabeled data plus a knowledge base. The results show that our method consistently outperforms the baselines, particularly when less gold training data is available.
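
To make "linearized labeled sentences" concrete, here is a minimal Python sketch of one possible linearization for sequence tagging: each non-`O` tag is placed immediately before its word, so a labeled sentence becomes a flat token stream that an ordinary language model can be trained on and sampled from. The abstract does not specify the format, so this tag-before-word convention and the helper names `linearize` and `delinearize` are assumptions.

```python
def linearize(tokens, tags, outside="O"):
    """Interleave tags and words into one flat sequence, dropping the outside tag."""
    sequence = []
    for tok, tag in zip(tokens, tags):
        if tag != outside:
            sequence.append(tag)
        sequence.append(tok)
    return sequence


def delinearize(sequence, tagset, outside="O"):
    """Invert linearize: recover (tokens, tags) from a flat sequence.

    Any item found in tagset is read as the tag of the next word.
    """
    tokens, tags = [], []
    pending = outside
    for item in sequence:
        if item in tagset:
            pending = item
        else:
            tokens.append(item)
            tags.append(pending)
            pending = outside
    return tokens, tags


if __name__ == "__main__":
    words = ["Jose", "Valentin", "lives", "in", "Lisbon"]
    labels = ["B-PER", "I-PER", "O", "O", "B-LOC"]
    flat = linearize(words, labels)
    print(" ".join(flat))
    print(delinearize(flat, {"B-PER", "I-PER", "B-LOC"}))
```

Sentences sampled from a language model trained on such sequences could be passed through `delinearize` to obtain synthetic labeled examples, which is the role the augmentation step plays in the pipeline described above.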

* Accepted by EMNLP 2020

Improving Zero and Few-Shot Abstractive Summarization with Intermediate Fine-tuning and Data Augmentation

Oct 24, 2020
Alexander R. Fabbri, Simeng Han, Haoyuan Li, Haoran Li, Marjan Ghazvininejad, Shafiq Joty, Dragomir Radev, Yashar Mehdad

Models pretrained with self-supervised objectives on large text corpora achieve state-of-the-art performance on text summarization tasks. However, these models are typically fine-tuned on hundreds of thousands of data points, an infeasible requirement when applying summarization to new, niche domains. In this work, we introduce a general method, called WikiTransfer, for fine-tuning pretrained models for summarization in an unsupervised, dataset-specific manner which makes use of characteristics of the target dataset such as the length and abstractiveness of the desired summaries. We achieve state-of-the-art, zero-shot abstractive summarization performance on the CNN-DailyMail dataset and demonstrate the effectiveness of our approach on three additional, diverse datasets. The models fine-tuned in this unsupervised manner are more robust to noisy data and also achieve better few-shot performance using 10 and 100 training examples. We perform ablation studies on the effect of the components of our unsupervised fine-tuning data and analyze the performance of these models in few-shot scenarios along with data augmentation techniques using both automatic and human evaluation.

Online Conversation Disentanglement with Pointer Networks

Oct 21, 2020
Tao Yu, Shafiq Joty

Huge amounts of textual conversations occur online every day, where multiple conversations take place concurrently. Interleaved conversations lead to difficulties in not only following the ongoing discussions but also extracting relevant information from simultaneous messages. Conversation disentanglement aims to separate intermingled messages into detached conversations. However, existing disentanglement methods rely mostly on handcrafted features that are dataset-specific, which hinders generalization and adaptability. In this work, we propose an end-to-end online framework for conversation disentanglement that avoids time-consuming domain-specific feature engineering. We design a novel way to embed the whole utterance, comprising timestamp, speaker, and message text, and propose a custom attention mechanism that models disentanglement as a pointing problem while effectively capturing inter-utterance interactions in an end-to-end fashion. We also introduce a joint-learning objective to better capture contextual information. Our experiments on the Ubuntu IRC dataset show that our method achieves state-of-the-art performance in both link and conversation prediction tasks.
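
Framing disentanglement as pointing means that each incoming utterance selects one earlier utterance as its parent, and conversations fall out of the resulting links. The Python sketch below shows only that bookkeeping step, not the attention model that predicts the pointers. The convention that an utterance pointing to itself starts a new conversation, and the name `threads_from_pointers`, are assumptions made for this illustration.

```python
def threads_from_pointers(parents):
    """Assign a conversation id to each utterance from its predicted parent index.

    parents[i] is the index of the utterance that utterance i replies to.
    parents[i] == i is taken to mean "starts a new conversation" (assumed convention).
    Processing is online: utterance i only needs the ids of utterances 0..i-1.
    """
    conversation_of = []
    next_id = 0
    for i, parent in enumerate(parents):
        if parent < 0 or parent > i:
            raise ValueError(f"utterance {i} points to invalid index {parent}")
        if parent == i:
            conversation_of.append(next_id)
            next_id += 1
        else:
            # Inherit the conversation of the utterance being replied to.
            conversation_of.append(conversation_of[parent])
    return conversation_of


if __name__ == "__main__":
    # Five interleaved messages: 0 and 2 open conversations, the rest are replies.
    print(threads_from_pointers([0, 0, 2, 1, 2]))
```

The pointer list corresponds to the link prediction task and the returned ids to the conversation prediction task mentioned in the abstract.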

Discern: Discourse-Aware Entailment Reasoning Network for Conversational Machine Reading

Oct 16, 2020
Yifan Gao, Chien-Sheng Wu, Jingjing Li, Shafiq Joty, Steven C. H. Hoi, Caiming Xiong, Irwin King, Michael R. Lyu

Document interpretation and dialog understanding are the two major challenges for conversational machine reading. In this work, we propose Discern, a discourse-aware entailment reasoning network to strengthen the connection and enhance the understanding of both document and dialog. Specifically, we split the document into clause-like elementary discourse units (EDUs) using a pre-trained discourse segmentation model, and we train our model in a weakly-supervised manner to predict whether each EDU is entailed by the user feedback in a conversation. Based on the learned EDU and entailment representations, we either reply to the user with our final decision ("yes/no/irrelevant") on the initial question, or generate a follow-up question to inquire for more information. Our experiments on the ShARC benchmark (blind, held-out test set) show that Discern achieves state-of-the-art results of 78.3% macro-averaged accuracy on decision making and 64.0 BLEU1 on follow-up question generation. Code and models are released at https://github.com/Yifan-Gao/Discern.

* EMNLP 2020 main conference, 11 pages, 3 Figures

Response Selection for Multi-Party Conversations with Dynamic Topic Tracking

Oct 15, 2020
Weishi Wang, Shafiq Joty, Steven C. H. Hoi

While participants in a multi-party multi-turn conversation simultaneously engage in multiple conversation topics, existing response selection methods are developed mainly for a two-party single-conversation scenario. Hence, the prolongation and transition of conversation topics are ignored by current methods. In this work, we frame response selection as a dynamic topic tracking task to match the topic between the response and relevant conversation context. With this new formulation, we propose a novel multi-task learning framework that supports efficient encoding through large pretrained models with only two utterances at once to perform dynamic topic disentanglement and response selection. We also propose Topic-BERT, an essential pretraining step to embed topic information into BERT with self-supervised learning. Experiments on the DSTC-8 Ubuntu IRC dataset show state-of-the-art results in response selection and topic disentanglement tasks, outperforming existing methods by a good margin.

* 9 pages, EMNLP 2020
