Hai Zhao

Department of Computer Science and Engineering, Shanghai Jiao Tong University, Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University

Nested Named Entity Recognition as Holistic Structure Parsing

Apr 17, 2022

Yifei Yang, Zuchao Li, Hai Zhao

Abstract: As a fundamental natural language processing task and one of the core knowledge extraction techniques, named entity recognition (NER) is widely used to extract information from texts for downstream tasks. Nested NER is a branch of NER in which named entities (NEs) are nested within each other. However, most previous studies on nested NER model the nested NEs with a linear structure, even though the NEs actually form a hierarchical structure. To address this mismatch, this work models all the nested NEs in a sentence as a holistic structure and proposes a holistic structure parsing algorithm that recovers all the NEs at once. Besides, there is currently no research applying corpus-level information to NER. To make up for this missing information, we introduce Point-wise Mutual Information (PMI) and other frequency features from corpus-aware statistics, extending holistic modeling from the sentence level to the corpus level for even better performance. Experiments show that our model yields promising results on widely used benchmarks, approaching or even achieving state-of-the-art performance. Further empirical studies show that the proposed corpus-aware features can substantially improve NER domain adaptation, which demonstrates the surprising advantage of our corpus-level holistic structure modeling.

Lite Unified Modeling for Discriminative Reading Comprehension

Mar 26, 2022

Yilin Zhao, Hai Zhao, Libin Shen, Yinggong Zhao

Abstract: As a broad and major category of machine reading comprehension (MRC), discriminative MRC has the general goal of predicting answers from the given materials. However, the focuses of different discriminative MRC tasks can be quite diverse: multi-choice MRC requires the model to highlight and integrate all potentially critical evidence globally, while extractive MRC requires precise local boundaries for answer extraction. Previous works lack a unified design targeted at the full range of discriminative MRC tasks. To fill this gap, we propose a lightweight POS-Enhanced Iterative Co-Attention Network (POI-Net) as the first attempt at targeted unified modeling, handling diverse discriminative MRC tasks synchronously. While introducing almost no additional parameters, our lite unified design brings significant improvements through both its encoder and decoder components. The evaluation results on four discriminative MRC benchmarks consistently indicate the general effectiveness and applicability of our model, and the code is available at https://github.com/Yilin1111/poi-net.

* Accepted by ACL 2022 Main Conference, Long Paper

Distinguishing Non-natural from Natural Adversarial Samples for More Robust Pre-trained Language Model

Mar 19, 2022

Jiayi Wang, Rongzhou Bao, Zhuosheng Zhang, Hai Zhao

Abstract: Recently, the robustness of pre-trained language models (PrLMs) has received increasing research interest. The latest studies on adversarial attacks achieve high attack success rates against PrLMs, claiming that PrLMs are not robust. However, we find that the adversarial samples on which PrLMs fail are mostly non-natural and do not appear in reality. We question the validity of current robustness evaluations of PrLMs based on these non-natural adversarial samples, and propose an anomaly detector to evaluate the robustness of PrLMs with more natural adversarial samples. We also investigate two applications of the anomaly detector: (1) in data augmentation, we employ the anomaly detector to force the generation of augmented data that are distinguished as non-natural, which brings larger gains to the accuracy of PrLMs; (2) we apply the anomaly detector to a defense framework to enhance the robustness of PrLMs. It can be used to defend against all types of attacks and achieves higher accuracy on both adversarial samples and compliant samples than other defense frameworks.

* Accepted by Findings of ACL 2022

Semantics-Preserved Distortion for Personal Privacy Protection

Jan 04, 2022

Letian Peng, Zuchao Li, Hai Zhao

Abstract: Privacy protection is an important concern in Federated Learning, especially for Natural Language Processing. On client devices, users produce large amounts of text containing personal information every day. Since directly using information from users is likely to invade their privacy, many methods have been proposed in Federated Learning to shield the central model from the raw information on client devices. In this paper, we approach this more linguistically by distorting the text while preserving its semantics. In practice, we leverage a recently proposed metric, Neighboring Distribution Divergence, to evaluate semantic preservation during distortion. Based on this metric, we propose two frameworks for semantics-preserved distortion: a generative one and a substitutive one. Due to the lack of privacy-related tasks in the current Natural Language Processing field, we conduct experiments on named entity recognition and constituency parsing. Results from our experiments show the plausibility and efficiency of our distortion as a method for personal privacy protection.

ArT: All-round Thinker for Unsupervised Commonsense Question-Answering

Dec 26, 2021

Jiawei Wang, Hai Zhao

Abstract: Without labeled question-answer pairs for training, unsupervised commonsense question answering (QA) is extremely challenging because it depends on commonsense sources such as knowledge bases (KBs), which are usually highly resource-consuming to construct. Recently, pre-trained language models (PrLMs) have proven effective as an alternative source of commonsense clues when they act as knowledge generators. However, existing work either simply generates hundreds of pseudo-answers or roughly performs knowledge generation from templates all at once, which may introduce much noise and thus hinder the quality of the generated knowledge. Motivated by human thinking, we propose an All-round Thinker (ArT) approach that makes full use of association during knowledge generation. In detail, our model first focuses on key parts of the given context, and then generates highly related knowledge on that basis through association, as humans do. Besides, for causal reasoning, a reverse thinking mechanism is proposed to conduct bidirectional inference between cause and effect. ArT is fully unsupervised and KB-free. We evaluate it on three commonsense QA benchmarks: COPA, SocialIQA and SCT. On all scales of PrLM backbones, ArT shows strong performance and outperforms previous advanced unsupervised models.

Seeking Common but Distinguishing Difference, A Joint Aspect-based Sentiment Analysis Model

Nov 18, 2021

Hongjiang Jing, Zuchao Li, Hai Zhao, Shu Jiang

Abstract: The aspect-based sentiment analysis (ABSA) task consists of three typical subtasks: aspect term extraction, opinion term extraction, and sentiment polarity classification. These three subtasks are usually performed jointly to save resources and reduce error propagation in the pipeline. However, most existing joint models only focus on the benefits of sharing an encoder between subtasks and ignore the differences between them. Therefore, we propose a joint ABSA model that not only enjoys the benefits of encoder sharing but also focuses on these differences to improve effectiveness. In detail, we introduce a dual-encoder design, in which a pair encoder focuses specifically on classifying candidate aspect-opinion pairs, while the original encoder keeps its attention on sequence labeling. Empirical results show that our proposed model is robust and significantly outperforms the previous state of the art on four benchmark datasets.

* EMNLP2021 camera-ready
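
The abstract does not say how the candidate aspect-opinion pairs are formed. One common setup, assumed here rather than taken from the paper, decodes aspect and opinion spans from the BIO tags produced by sequence labeling and takes their Cartesian product as the candidates a pair classifier would score. The tag names `ASP`/`OPN`, the helper names and the example sentence are hypothetical.

```python
from itertools import product

def bio_spans(tags, label):
    """Return (start, end) spans, end exclusive, for one label in a BIO tag sequence."""
    spans = []
    start = None
    for i, tag in enumerate(tags):
        if tag == f"B-{label}":
            if start is not None:  # a new span closes the previous one
                spans.append((start, i))
            start = i
        elif tag == f"I-{label}" and start is not None:
            continue  # still inside the current span
        else:
            # O, another label, or a stray I- tag: close any open span.
            if start is not None:
                spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans

def candidate_pairs(tags):
    """Pair every aspect span with every opinion span (hypothetical ASP/OPN labels)."""
    return list(product(bio_spans(tags, "ASP"), bio_spans(tags, "OPN")))

tokens = ["the", "battery", "life", "is", "great", "but", "screen", "is", "dim"]
tags = ["O", "B-ASP", "I-ASP", "O", "B-OPN", "O", "B-ASP", "O", "B-OPN"]
pairs = candidate_pairs(tags)
for (a0, a1), (o0, o1) in pairs:
    print(" ".join(tokens[a0:a1]), "->", " ".join(tokens[o0:o1]))
```

Only two of the four candidates (battery life -> great, screen -> dim) are true pairs; separating them from the rest is the job a dedicated pair classifier would do, independently of the token-level tagging.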

Unsupervised Full Constituency Parsing with Neighboring Distribution Divergence

Oct 29, 2021

Letian Peng, Zuchao Li, Hai Zhao

Abstract: Unsupervised constituency parsing has been widely explored but is still far from solved. Conventional unsupervised constituency parsers can only capture the unlabeled structure of sentences. Towards unsupervised full constituency parsing, we propose an unsupervised and training-free labeling procedure that exploits the properties of a recently introduced metric, Neighboring Distribution Divergence (NDD), which evaluates the semantic similarity between a sentence before and after an edit. For implementation, we develop NDD into Dual POS-NDD (DP-NDD) and build "molds" to detect constituents and their labels in sentences. We show that DP-NDD not only labels constituents precisely but also induces more accurate unlabeled constituency trees than all previous unsupervised methods, using simpler rules. With two frameworks for labeled constituency tree inference, we set a new state of the art for unlabeled F1 and strong baselines for labeled F1. In contrast with the conventional predict-then-evaluate scenario, our method serves as a plausible example of inversely applying evaluation metrics for prediction.

Structural Modeling for Dialogue Disentanglement

Oct 15, 2021

Xinbei Ma, Zhuosheng Zhang, Hai Zhao

Abstract: Tangled multi-party dialogue contexts pose challenges for dialogue reading comprehension: multiple dialogue threads flow simultaneously within the same dialogue history, making it harder for both humans and machines to understand. Dialogue disentanglement aims to separate the conversation threads in a multi-party dialogue history, reducing the difficulty of comprehending a long, disordered dialogue passage. Existing studies commonly focus on utterance encoding with carefully designed feature-engineering-based methods but pay inadequate attention to dialogue structure. This work designs a novel model that disentangles a multi-party history into threads by taking dialogue structure features into account. Specifically, based on the fact that dialogues are constructed through the successive participation of speakers and interactions between users of interest, we extract clues about speaker properties and references to users to model the structure of a long dialogue record. The method is evaluated on the Ubuntu IRC dataset and achieves state-of-the-art results in dialogue disentanglement.
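
The abstract does not spell out the model's output format. A common formulation of disentanglement on Ubuntu IRC, assumed here, has the model predict for each utterance the earlier utterance it replies to (or itself, if it starts a new thread); threads are then the connected components of these reply links. The sketch below recovers threads from such predicted parent indices with a small union-find; the function name and the parent list are hypothetical.

```python
def threads_from_links(parents):
    """Group utterances into threads given predicted reply-to links.

    parents[i] is the index of the utterance that utterance i replies to;
    parents[i] == i means utterance i starts a new thread. Returns threads
    as sorted lists of utterance indices, ordered by their first utterance.
    """
    root = list(range(len(parents)))

    def find(x):
        # Follow root pointers to the representative, halving paths on the way.
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    # Merge each utterance with the one it replies to.
    for i, p in enumerate(parents):
        ri, rp = find(i), find(p)
        if ri != rp:
            root[ri] = rp

    # Indices are visited in increasing order, so each thread list is sorted.
    groups = {}
    for i in range(len(parents)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values(), key=lambda g: g[0])

# Hypothetical predictions for six interleaved utterances.
parents = [0, 1, 0, 1, 2, 4]
print(threads_from_links(parents))  # [[0, 2, 4, 5], [1, 3]]
```

Union-find keeps the grouping step near-linear in the number of utterances, so long IRC logs are cheap to split once the reply links have been predicted.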

Tracing Origins: Coref-aware Machine Reading Comprehension

Oct 15, 2021

Baorong Huang, Zhuosheng Zhang, Hai Zhao

Abstract: Machine reading comprehension is a heavily studied research and test field for evaluating new pre-trained models and fine-tuning strategies, and recent studies have enriched pre-trained models with syntactic, semantic and other linguistic information to improve performance. In this paper, we imitate the human reading process of connecting anaphoric expressions, and explicitly leverage coreference information to enhance the word embeddings from the pre-trained model. This highlights the coreference mentions that must be identified for coreference-intensive question answering in QUOREF, a relatively new dataset specifically designed to evaluate a model's coreference-related performance. We use an additional BERT layer to focus on the coreference mentions, and a Relational Graph Convolutional Network to model the coreference relations. We demonstrate that explicitly incorporating coreference information in the fine-tuning stage performs better than incorporating it when training a pre-trained language model.

Representation Decoupling for Open-Domain Passage Retrieval

Oct 14, 2021

Bohong Wu, Zhuosheng Zhang, Jinyuan Wang, Hai Zhao

Abstract: Training dense passage representations via contrastive learning (CL) has been shown to be effective for Open-Domain Passage Retrieval (ODPR). Recent studies mainly focus on optimizing this CL framework by improving the sampling strategy or by extra pretraining. Different from previous studies, this work investigates the influence of conflicts in the widely used CL strategy in ODPR, motivated by our observation that a passage can be composed of multiple semantically different sentences, so modeling such a passage as a single dense vector is not optimal. We call such conflicts Contrastive Conflicts. In this work, we propose to solve them with a representation decoupling method that decouples passage representations into contextual sentence-level ones, and we design specific CL strategies to mediate these conflicts. Experiments on widely used datasets including Natural Questions, TriviaQA, and SQuAD verify the effectiveness of our method, especially on the dataset where the conflict problem is severe. Our method also shows good transferability across datasets, which further supports our idea of mediating Contrastive Conflicts.
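
For context, the passage-level CL objective that this work revisits is commonly an InfoNCE loss with in-batch negatives over question-passage dot products. The sketch below shows only that standard baseline under this assumption, not the paper's decoupled sentence-level strategy; the function names and toy vectors are hypothetical, and plain lists stand in for encoder outputs.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def in_batch_contrastive_loss(questions, passages):
    """Average InfoNCE loss with in-batch negatives.

    questions[i] is paired with its positive passage passages[i]; every other
    passage in the batch acts as a negative for question i.
    """
    total = 0.0
    for i, q in enumerate(questions):
        scores = [dot(q, p) for p in passages]
        m = max(scores)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_sum_exp - scores[i]  # equals -log softmax(scores)[i]
    return total / len(questions)

q = [[1.0, 0.0], [0.0, 1.0]]
p_aligned = [[2.0, 0.0], [0.0, 2.0]]   # each question matches its own passage
p_swapped = [[0.0, 2.0], [2.0, 0.0]]   # each question matches the other passage
print(in_batch_contrastive_loss(q, p_aligned) < in_batch_contrastive_loss(q, p_swapped))  # True
```

Because a single vector per passage must score high against every question that passage answers, semantically different sentences inside one passage pull that vector in different directions; this is the conflict the abstract attributes to passage-level CL.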
