Jinsong Su

Getting the Most out of Simile Recognition

Nov 11, 2022

Xiaoyue Wang, Linfeng Song, Xin Liu, Chulun Zhou, Jinsong Su

Abstract: Simile recognition involves two subtasks: simile sentence classification, which determines whether a sentence contains a simile, and simile component extraction, which locates the corresponding objects (i.e., tenors and vehicles). Recent work ignores features other than surface strings. In this paper, we explore expressive features for this task to achieve more effective data utilization. Particularly, we study two types of features: 1) input-side features that include POS tags, dependency trees and word definitions, and 2) decoding features that capture the interdependence among various decoding decisions. We further construct a model named HGSR, which merges the input-side features as a heterogeneous graph and leverages decoding features via distillation. Experiments show that HGSR significantly outperforms the current state-of-the-art systems and carefully designed baselines, verifying the effectiveness of the introduced features. Our code is available at https://github.com/DeepLearnXMU/HGSR.

* Findings of EMNLP 2022

Sentiment-Aware Word and Sentence Level Pre-training for Sentiment Analysis

Oct 19, 2022

Shuai Fan, Chen Lin, Haonan Li, Zhenghao Lin, Jinsong Su, Hang Zhang, Yeyun Gong, Jian Guo, Nan Duan

Abstract: Most existing pre-trained language representation models (PLMs) are sub-optimal in sentiment analysis tasks, as they capture sentiment information at the word level while under-considering sentence-level information. In this paper, we propose SentiWSP, a novel Sentiment-aware pre-trained language model with combined Word-level and Sentence-level Pre-training tasks. The word-level pre-training task detects replaced sentiment words via a generator-discriminator framework, to enhance the PLM's knowledge about sentiment words. The sentence-level pre-training task further strengthens the discriminator via a contrastive learning framework, with similar sentences as negative samples, to encode sentiments in a sentence. Extensive experimental results show that SentiWSP achieves new state-of-the-art performance on various sentence-level and aspect-level sentiment classification benchmarks. We have made our code and model publicly available at https://github.com/XMUDM/SentiWSP.

* Accepted to EMNLP 2022
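
The sentence-level task is described as contrastive. As a rough illustration only, the plain-Python sketch below computes a generic InfoNCE loss with temperature-scaled cosine similarity. How SentiWSP builds its anchors, positives and negatives, which encoder it uses, and its temperature are not given here, so `cosine`, `info_nce` and `temperature=0.1` are illustrative assumptions rather than the paper's implementation.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor: Sequence[float], positive: Sequence[float],
             negatives: List[Sequence[float]], temperature: float = 0.1) -> float:
    """Generic InfoNCE loss (an assumption, not SentiWSP's exact objective):
    negative log-softmax score of the positive among {positive} + negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]


# A negative as close to the anchor as the positive yields log(2) ≈ 0.693.
print(info_nce([1.0, 0.0], [1.0, 0.0], [[1.0, 0.0]]))
```

Hard negatives (sentences similar to the anchor) raise the loss, which is the pressure a contrastive objective uses to separate sentences that differ mainly in sentiment.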

Towards Robust k-Nearest-Neighbor Machine Translation

Oct 17, 2022

Hui Jiang, Ziyao Lu, Fandong Meng, Chulun Zhou, Jie Zhou, Degen Huang, Jinsong Su

Abstract: k-Nearest-Neighbor Machine Translation (kNN-MT) has become an important research direction in NMT in recent years. Its main idea is to retrieve useful key-value pairs from an additional datastore to modify translations without updating the NMT model. However, noisy pairs among the retrieved results dramatically degrade model performance. In this paper, we conduct a preliminary study and find that this problem results from not fully exploiting the prediction of the NMT model. To alleviate the impact of noise, we propose a confidence-enhanced kNN-MT model with robust training. Concretely, we introduce the NMT confidence to refine the modeling of two important components of kNN-MT: the kNN distribution and the interpolation weight. Meanwhile, we inject two types of perturbations into the retrieved pairs for robust training. Experimental results on four benchmark datasets demonstrate that our model not only achieves significant improvements over current kNN-MT models, but also exhibits better robustness. Our code is available at https://github.com/DeepLearnXMU/Robust-knn-mt.

* Accepted to EMNLP 2022
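
For orientation, the two components named above (the kNN distribution and the interpolation weight) can be seen in the vanilla kNN-MT formulation that this work refines. The plain-Python sketch below assumes that standard formulation: squared-L2 retrieval, a softmax over negative distances with temperature T, and a fixed weight λ. It does not implement the confidence-enhanced distribution, the confidence-aware weight, or the perturbation-based training proposed here; all names and default values are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical datastore entries: (decoder hidden state as key, target token as value).
Datastore = List[Tuple[Sequence[float], str]]


def sq_l2(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def knn_distribution(query: Sequence[float], datastore: Datastore,
                     k: int = 2, temperature: float = 10.0) -> Dict[str, float]:
    """p_kNN(y): softmax over -distance/T for the k nearest keys, summed per value token."""
    neighbors = sorted(datastore, key=lambda kv: sq_l2(query, kv[0]))[:k]
    scores = [-sq_l2(query, key) / temperature for key, _ in neighbors]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dist: Dict[str, float] = {}
    for (_, token), w in zip(neighbors, weights):
        dist[token] = dist.get(token, 0.0) + w / total
    return dist


def interpolate(p_nmt: Dict[str, float], p_knn: Dict[str, float],
                lam: float = 0.5) -> Dict[str, float]:
    """Vanilla kNN-MT mixing with a FIXED weight: lam * p_kNN + (1 - lam) * p_NMT."""
    vocab = set(p_nmt) | set(p_knn)
    return {y: lam * p_knn.get(y, 0.0) + (1 - lam) * p_nmt.get(y, 0.0) for y in vocab}


store = [([0.0, 0.0], "cat"), ([0.1, 0.0], "cat"), ([5.0, 5.0], "dog")]
p_knn = knn_distribution([0.0, 0.0], store, k=2)
p_final = interpolate({"cat": 0.2, "dog": 0.8}, p_knn, lam=0.5)
```

With a fixed λ, a single noisy neighbor shifts probability mass regardless of how confident the NMT model already is; making the kNN distribution and λ depend on NMT confidence targets exactly that weakness.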

A Variational Hierarchical Model for Neural Cross-Lingual Summarization

Mar 25, 2022

Yunlong Liang, Fandong Meng, Chulun Zhou, Jinan Xu, Yufeng Chen, Jinsong Su, Jie Zhou

Abstract: The goal of cross-lingual summarization (CLS) is to convert a document in one language (e.g., English) into a summary in another (e.g., Chinese). Essentially, the CLS task is the combination of machine translation (MT) and monolingual summarization (MS), and thus there exists a hierarchical relationship between MT & MS and CLS. Existing studies on CLS mainly focus on utilizing pipeline methods or jointly training an end-to-end model through an auxiliary MT or MS objective. However, it is very challenging for the model to directly conduct CLS, as it requires both the ability to translate and the ability to summarize. To address this issue, we propose a hierarchical model for the CLS task, based on the conditional variational auto-encoder. The hierarchical model contains two kinds of latent variables, at the local and global levels respectively. At the local level, there are two latent variables, one for translation and the other for summarization. At the global level, there is another latent variable for cross-lingual summarization, conditioned on the two local-level variables. Experiments on two language directions (English-to-Chinese and Chinese-to-English) verify the effectiveness and superiority of the proposed approach. In addition, we show that our model is able to generate better cross-lingual summaries than comparison models in the few-shot setting.

* Accepted at ACL 2022 as a long paper in the main conference. Code: https://github.com/XL2248/VHM

Confidence Based Bidirectional Global Context Aware Training Framework for Neural Machine Translation

Mar 17, 2022

Chulun Zhou, Fandong Meng, Jie Zhou, Min Zhang, Hongji Wang, Jinsong Su

Abstract: Most dominant neural machine translation (NMT) models are restricted to making predictions only according to the local context of preceding words in a left-to-right manner. Although many previous studies try to incorporate global information into NMT models, there are still limitations on how to effectively exploit bidirectional global context. In this paper, we propose a Confidence Based Bidirectional Global Context Aware (CBBGCA) training framework for NMT, where the NMT model is jointly trained with an auxiliary conditional masked language model (CMLM). The training consists of two stages: (1) multi-task joint training; (2) confidence based knowledge distillation. At the first stage, by sharing encoder parameters, the NMT model is additionally supervised by the signal from the CMLM decoder that contains bidirectional global contexts. Moreover, at the second stage, using the CMLM as teacher, we further incorporate bidirectional global context into the NMT model, specifically on the target words it predicts with low confidence, via knowledge distillation. Experimental results show that our proposed CBBGCA training framework significantly improves the NMT model by +1.02, +1.30 and +0.57 BLEU on three large-scale translation datasets, namely WMT'14 English-to-German, WMT'19 Chinese-to-English and WMT'14 English-to-French, respectively.

* Pre-print version; accepted at ACL 2022 as a long paper in the main conference
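
The second stage distills from the CMLM teacher only where the NMT student is unconfident. A minimal plain-Python sketch of that selection idea follows, assuming (hypothetically) that confidence means the student's probability for the gold token, that "unconfident" means below a fixed `threshold`, and that the distillation term is KL(teacher || student). The paper's actual confidence measure, threshold and loss weighting may differ; `confidence_masked_kd` is an illustrative name.

```python
import math
from typing import Dict, List, Tuple

Dist = Dict[str, float]  # token -> probability at one target position


def kl_divergence(teacher: Dist, student: Dist, eps: float = 1e-12) -> float:
    """KL(teacher || student), summed over the teacher's support."""
    return sum(p * math.log(p / max(student.get(y, 0.0), eps))
               for y, p in teacher.items() if p > 0)


def confidence_masked_kd(student_dists: List[Dist], teacher_dists: List[Dist],
                         gold_tokens: List[str],
                         threshold: float = 0.5) -> Tuple[float, List[int]]:
    """Distill only at positions where the student's probability for the gold
    token is below `threshold` (assumed confidence criterion).
    Returns the mean KL over selected positions and the selected indices."""
    selected = [t for t, (dist, gold) in enumerate(zip(student_dists, gold_tokens))
                if dist.get(gold, 0.0) < threshold]
    if not selected:
        return 0.0, []
    loss = sum(kl_divergence(teacher_dists[t], student_dists[t]) for t in selected)
    return loss / len(selected), selected


student = [{"a": 0.9, "b": 0.1}, {"a": 0.3, "b": 0.7}]
teacher = [{"a": 0.5, "b": 0.5}, {"a": 0.8, "b": 0.2}]
loss, positions = confidence_masked_kd(student, teacher, ["a", "a"])
# Only position 1 (student p("a") = 0.3) receives the teacher signal.
```

Masking by confidence leaves positions the student already predicts well to the ordinary NMT loss, so the teacher's bidirectional signal is spent where the left-to-right model is least certain.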

Type-Driven Multi-Turn Corrections for Grammatical Error Correction

Mar 17, 2022

Shaopeng Lai, Qingyu Zhou, Jiali Zeng, Zhongli Li, Chao Li, Yunbo Cao, Jinsong Su

Abstract: Grammatical Error Correction (GEC) aims to automatically detect and correct grammatical errors. In this area, dominant models are trained with one-iteration learning while performing multiple iterations of corrections during inference. Previous studies mainly rely on data augmentation to combat the resulting exposure bias, an approach that suffers from two drawbacks. First, they simply mix additionally-constructed training instances with the original ones to train models, which fails to make models explicitly aware of the procedure of gradual corrections. Second, they ignore the interdependence between different types of corrections. In this paper, we propose a Type-Driven Multi-Turn Corrections approach for GEC. Using this approach, from each training instance, we additionally construct multiple training instances, each of which involves the correction of a specific type of error. Then, we use these additionally-constructed training instances and the original one to train the model in turn. Experimental results and in-depth analysis show that our approach significantly benefits model training. Particularly, our enhanced model achieves state-of-the-art single-model performance on English GEC benchmarks. We release our code on GitHub.

* Findings of ACL 2022
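
The instance-construction step can be pictured as correcting one error type per turn, each turn starting from the previous turn's output. The sketch below is a hypothetical illustration: it assumes each gold correction is available as a typed, non-overlapping token-span edit on the original source, and that the type order is given; the edit format, type labels (`"SPELL"`, `"VERB"`) and ordering are assumptions, not the paper's specification.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Edit(NamedTuple):
    start: int               # token span [start, end) in the ORIGINAL erroneous source
    end: int
    replacement: List[str]
    err_type: str            # hypothetical error-type label, e.g. "SPELL", "VERB"


def apply_edits(source: Sequence[str], edits: Sequence[Edit]) -> List[str]:
    """Apply non-overlapping edits right-to-left so earlier indices stay valid."""
    tokens = list(source)
    for e in sorted(edits, key=lambda e: e.start, reverse=True):
        tokens[e.start:e.end] = e.replacement
    return tokens


def multi_turn_instances(source: Sequence[str], edits: Sequence[Edit],
                         type_order: Sequence[str]) -> List[Tuple[List[str], List[str]]]:
    """One (input, target) pair per error type: each turn fixes only that type's
    errors, starting from the output of the previous turn."""
    instances: List[Tuple[List[str], List[str]]] = []
    applied: List[Edit] = []
    current = list(source)
    for err_type in type_order:
        turn_edits = [e for e in edits if e.err_type == err_type]
        if not turn_edits:  # no errors of this type in the sentence
            continue
        applied.extend(turn_edits)
        target = apply_edits(source, applied)
        instances.append((current, target))
        current = target
    return instances


src = ["he", "go", "to", "scool", "yesterday"]
edits = [Edit(1, 2, ["went"], "VERB"), Edit(3, 4, ["school"], "SPELL")]
turns = multi_turn_instances(src, edits, ["SPELL", "VERB"])
# turns[0]: spelling fixed only; turns[1]: verb fixed on top of the spelling fix.
```

Because every turn's target is rebuilt from the original source plus all edits applied so far, the final turn's target always equals the fully corrected sentence, regardless of the type order chosen.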

A Label Dependence-aware Sequence Generation Model for Multi-level Implicit Discourse Relation Recognition

Dec 22, 2021

Changxing Wu, Liuwen Cao, Yubin Ge, Yang Liu, Min Zhang, Jinsong Su

Abstract: Implicit discourse relation recognition (IDRR) is a challenging but crucial task in discourse analysis. Most existing methods train multiple models to predict multi-level labels independently, while ignoring the dependence between hierarchically structured labels. In this paper, we consider multi-level IDRR as a conditional label sequence generation task and propose a Label Dependence-aware Sequence Generation Model (LDSGM) for it. Specifically, we first design a label attentive encoder to learn the global representation of an input instance and its level-specific contexts, where the label dependence is integrated to obtain better label embeddings. Then, we employ a label sequence decoder to output the predicted labels in a top-down manner, where the predicted higher-level labels are directly used to guide the label prediction at the current level. We further develop a mutual learning enhanced training method to exploit the label dependence in a bottom-up direction, which is captured by an auxiliary decoder introduced during training. Experimental results on the PDTB dataset show that our model achieves state-of-the-art performance on multi-level IDRR. We will release our code at https://github.com/nlpersECJTU/LDSGM.

* Accepted at AAAI 2022
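
To make "top-down" concrete, here is a deliberately simplified sketch of hierarchy-aware decoding. LDSGM feeds the predicted higher-level label into its decoder to guide the next level; this sketch substitutes a simpler hard constraint (only children of the previously predicted label are eligible), which is an assumption, not the paper's decoder. The per-level scores stand in for a hypothetical classifier, and the sense hierarchy is a small PDTB-style fragment chosen for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical fragment of a two-level PDTB-style sense hierarchy (parent -> children);
# None is the root. Only a few second-level senses are listed.
HIERARCHY: Dict[Optional[str], List[str]] = {
    None: ["Comparison", "Contingency", "Expansion", "Temporal"],
    "Comparison": ["Comparison.Concession", "Comparison.Contrast"],
    "Contingency": ["Contingency.Cause", "Contingency.Pragmatic Cause"],
    "Expansion": ["Expansion.Conjunction", "Expansion.Instantiation", "Expansion.Restatement"],
    "Temporal": ["Temporal.Asynchronous", "Temporal.Synchrony"],
}


def top_down_decode(level_scores: List[Dict[str, float]],
                    hierarchy: Dict[Optional[str], List[str]] = HIERARCHY) -> List[str]:
    """Greedy top-down decoding: at each level, pick the highest-scoring label
    among the children of the label chosen one level up."""
    path: List[str] = []
    parent: Optional[str] = None
    for scores in level_scores:
        candidates = hierarchy.get(parent, [])
        if not candidates:  # reached a leaf of the hierarchy
            break
        parent = max(candidates, key=lambda label: scores.get(label, float("-inf")))
        path.append(parent)
    return path


scores = [
    {"Comparison": 0.1, "Contingency": 0.6, "Expansion": 0.3, "Temporal": 0.0},
    {"Expansion.Conjunction": 0.9, "Contingency.Cause": 0.7, "Contingency.Pragmatic Cause": 0.1},
]
labels = top_down_decode(scores)
```

An independent second-level classifier would pick `Expansion.Conjunction` here (score 0.9), contradicting the top-level `Contingency`; conditioning on the higher level yields the consistent path `Contingency` → `Contingency.Cause`.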

KGR^4: Retrieval, Retrospect, Refine and Rethink for Commonsense Generation

Dec 15, 2021

Xin Liu, Dayiheng Liu, Baosong Yang, Haibo Zhang, Junwei Ding, Wenqing Yao, Weihua Luo, Haiying Zhang, Jinsong Su

Abstract: Generative commonsense reasoning, which has attracted much attention recently, requires machines to generate sentences describing an everyday scenario given several concepts. However, existing models cannot perform as well as humans, since the sentences they produce are often implausible and grammatically incorrect. In this paper, inspired by the process by which humans create sentences, we propose a novel Knowledge-enhanced Commonsense Generation framework, termed KGR^4, consisting of four stages: Retrieval, Retrospect, Refine, Rethink. Under this framework, we first perform retrieval to search for relevant sentences from an external corpus as the prototypes. Then, we train the generator that either edits or copies these prototypes to generate candidate sentences, whose potential errors are then fixed by an autoencoder-based refiner. Finally, we select the output sentence from candidate sentences produced by generators with different hyper-parameters. Experimental results and in-depth analysis on the CommonGen benchmark strongly demonstrate the effectiveness of our framework. Particularly, KGR^4 obtains 33.56 SPICE points on the official leaderboard, outperforming the previously-reported best result by 2.49 SPICE points and achieving state-of-the-art performance.

* AAAI 2022

Improving Graph-based Sentence Ordering with Iteratively Predicted Pairwise Orderings

Oct 13, 2021

Shaopeng Lai, Ante Wang, Fandong Meng, Jie Zhou, Yubin Ge, Jiali Zeng, Junfeng Yao, Degen Huang, Jinsong Su

Abstract: Dominant sentence ordering models can be classified into pairwise ordering models and set-to-sequence models. However, there have been few attempts to combine these two types of models, which intuitively possess complementary advantages. In this paper, we propose a novel sentence ordering framework which introduces two classifiers to make better use of pairwise orderings for graph-based sentence ordering. Specifically, given an initial sentence-entity graph, we first introduce a graph-based classifier to predict pairwise orderings between linked sentences. Then, in an iterative manner, based on the graph updated by previously predicted high-confidence pairwise orderings, another classifier is used to predict the remaining uncertain pairwise orderings. Finally, we adapt a GRN-based sentence ordering model on the basis of the final graph. Experiments on five commonly-used datasets demonstrate the effectiveness and generality of our model. Particularly, when equipped with BERT and FHDecoder, our model achieves state-of-the-art performance.

* EMNLP 2021
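
The iterative step, committing high-confidence pairwise orderings and re-predicting the uncertain ones against the updated graph, can be sketched as a simple loop. In the paper both classifiers are neural graph-based models and the final order comes from an adapted GRN-based model; here the classifier is a hypothetical callable, and `threshold` and `max_rounds` are assumed values chosen for illustration.

```python
from typing import Callable, List, Set, Tuple

Pair = Tuple[int, int]  # (i, j) means "sentence i precedes sentence j"
# Hypothetical classifier: P(i precedes j) given the orderings committed so far.
Classifier = Callable[[Pair, Set[Pair]], float]


def iterative_pairwise_ordering(pairs: List[Pair], classifier: Classifier,
                                threshold: float = 0.9,
                                max_rounds: int = 10) -> Tuple[Set[Pair], List[Pair]]:
    """Repeatedly score the unresolved pairs, commit the confident ones (in either
    direction), and re-score the rest with the enlarged set of committed orderings.
    Returns (committed orderings, still-uncertain pairs)."""
    fixed: Set[Pair] = set()
    pending = list(pairs)
    for _ in range(max_rounds):
        newly_fixed: Set[Pair] = set()
        uncertain: List[Pair] = []
        for i, j in pending:
            p = classifier((i, j), fixed)  # every pair in a round sees the same `fixed`
            if p >= threshold:
                newly_fixed.add((i, j))
            elif p <= 1 - threshold:
                newly_fixed.add((j, i))
            else:
                uncertain.append((i, j))
        if not newly_fixed:  # no progress: stop and leave the rest uncertain
            break
        fixed |= newly_fixed
        pending = uncertain
    return fixed, pending
```

A toy classifier that becomes confident once a pair is implied by transitivity (for example, 0 before 1 and 1 before 2 imply 0 before 2) resolves a pair left uncertain in the first round on the second round; any pairs still pending after the loop would be left to the downstream ordering model.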

Controllable Dialogue Generation with Disentangled Multi-grained Style Specification and Attribute Consistency Reward

Sep 14, 2021

Zhe Hu, Zhiwei Cao, Hou Pong Chan, Jiachen Liu, Xinyan Xiao, Jinsong Su, Hua Wu

Abstract: Controllable text generation is an appealing but challenging task, which allows users to specify particular attributes of the generated outputs. In this paper, we propose a controllable dialogue generation model to steer response generation under multi-attribute constraints. Specifically, we define and categorize the commonly used control attributes into global and local ones, which affect response generation at different granularities. Then, we significantly extend the conventional seq2seq framework by introducing a novel two-stage decoder, which first uses a multi-grained style specification layer to impose the stylistic constraints and determine word-level control states of responses based on the attributes, and then employs a response generation layer to generate final responses that maintain both semantic relevance to the contexts and fidelity to the attributes. Furthermore, we train our model with an attribute consistency reward to promote response control with explicit supervision signals. Extensive experiments and in-depth analyses on two datasets indicate that our model can significantly outperform competitive baselines in terms of response quality, content diversity and controllability.