Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jinsong Su

RC3: Regularized Contrastive Cross-lingual Cross-modal Pre-training

May 13, 2023

Chulun Zhou, Yunlong Liang, Fandong Meng, Jinan Xu, Jinsong Su, Jie Zhou

Figure 1 for RC3: Regularized Contrastive Cross-lingual Cross-modal Pre-training

Figure 2 for RC3: Regularized Contrastive Cross-lingual Cross-modal Pre-training

Figure 3 for RC3: Regularized Contrastive Cross-lingual Cross-modal Pre-training

Figure 4 for RC3: Regularized Contrastive Cross-lingual Cross-modal Pre-training

Abstract:Multilingual vision-language (V&L) pre-training has achieved remarkable progress in learning universal representations across different modalities and languages. In spite of recent success, there still remain challenges limiting further improvements of V&L pre-trained models in multilingual settings. Particularly, current V&L pre-training methods rely heavily on strictly-aligned multilingual image-text pairs generated from English-centric datasets through machine translation. However, the cost of collecting and translating such strictly-aligned datasets is usually unbearable. In this paper, we propose Regularized Contrastive Cross-lingual Cross-modal (RC^3) pre-training, which further exploits more abundant weakly-aligned multilingual image-text pairs. Specifically, we design a regularized cross-lingual visio-textual contrastive learning objective that constrains the representation proximity of weakly-aligned visio-textual inputs according to textual relevance. Besides, existing V&L pre-training approaches mainly deal with visual inputs by either region-of-interest (ROI) features or patch embeddings. We flexibly integrate the two forms of visual features into our model for pre-training and downstream multi-modal tasks. Extensive experiments on 5 downstream multi-modal tasks across 6 languages demonstrate the effectiveness of our proposed method over competitive contrast models with stronger zero-shot capability.

* accepted to ACL 2023 Findings

Via

Access Paper or Ask Questions

From Statistical Methods to Deep Learning, Automatic Keyphrase Prediction: A Survey

May 04, 2023

Binbin Xie, Jia Song, Liangying Shao, Suhang Wu, Xiangpeng Wei, Baosong Yang, Huan Lin, Jun Xie, Jinsong Su

Figure 1 for From Statistical Methods to Deep Learning, Automatic Keyphrase Prediction: A Survey

Figure 2 for From Statistical Methods to Deep Learning, Automatic Keyphrase Prediction: A Survey

Figure 3 for From Statistical Methods to Deep Learning, Automatic Keyphrase Prediction: A Survey

Figure 4 for From Statistical Methods to Deep Learning, Automatic Keyphrase Prediction: A Survey

Abstract:Keyphrase prediction aims to generate phrases (keyphrases) that highly summarizes a given document. Recently, researchers have conducted in-depth studies on this task from various perspectives. In this paper, we comprehensively summarize representative studies from the perspectives of dominant models, datasets and evaluation metrics. Our work analyzes up to 167 previous works, achieving greater coverage of this task than previous surveys. Particularly, we focus highly on deep learning-based keyphrase prediction, which attracts increasing attention of this task in recent years. Afterwards, we conduct several groups of experiments to carefully compare representative models. To the best of our knowledge, our work is the first attempt to compare these models using the identical commonly-used datasets and evaluation metric, facilitating in-depth analyses of their disadvantages and advantages. Finally, we discuss the possible research directions of this task in the future.

* Information Processing & Management

Via

Access Paper or Ask Questions

Search-Engine-augmented Dialogue Response Generation with Cheaply Supervised Query Production

Feb 16, 2023

Ante Wang, Linfeng Song, Qi Liu, Haitao Mi, Longyue Wang, Zhaopeng Tu, Jinsong Su, Dong Yu

Abstract:Knowledge-aided dialogue response generation aims at augmenting chatbots with relevant external knowledge in the hope of generating more informative responses. The majority of previous work assumes that the relevant knowledge is given as input or retrieved from a static pool of knowledge. However, this assumption violates the real-world situation, where knowledge is continually updated and a chatbot has to dynamically retrieve useful knowledge. We propose a dialogue model that can access the vast and dynamic information from any search engine for response generation. As the core module, a query producer is used to generate queries from a dialogue context to interact with a search engine. We design a training algorithm using cheap noisy supervision for the query producer, where the signals are obtained by comparing retrieved articles with the next dialogue response. As the result, the query producer is adjusted without any human annotation of gold queries, making it easily transferable to other domains and search engines. Experiments show that our query producer can achieve R@1 and R@5 rates of 62.4% and 74.8% for retrieving gold knowledge, and the overall model generates better responses over strong knowledge-aided baselines using BART and other typical systems.

* Artificial Intelligence 2023

Via

Access Paper or Ask Questions

A Multi-task Multi-stage Transitional Training Framework for Neural Chat Translation

Jan 27, 2023

Chulun Zhou, Yunlong Liang, Fandong Meng, Jie Zhou, Jinan Xu, Hongji Wang, Min Zhang, Jinsong Su

Abstract:Neural chat translation (NCT) aims to translate a cross-lingual chat between speakers of different languages. Existing context-aware NMT models cannot achieve satisfactory performances due to the following inherent problems: 1) limited resources of annotated bilingual dialogues; 2) the neglect of modelling conversational properties; 3) training discrepancy between different stages. To address these issues, in this paper, we propose a multi-task multi-stage transitional (MMT) training framework, where an NCT model is trained using the bilingual chat translation dataset and additional monolingual dialogues. We elaborately design two auxiliary tasks, namely utterance discrimination and speaker discrimination, to introduce the modelling of dialogue coherence and speaker characteristic into the NCT model. The training process consists of three stages: 1) sentence-level pre-training on large-scale parallel corpus; 2) intermediate training with auxiliary tasks using additional monolingual dialogues; 3) context-aware fine-tuning with gradual transition. Particularly, the second stage serves as an intermediate phase that alleviates the training discrepancy between the pre-training and fine-tuning stages. Moreover, to make the stage transition smoother, we train the NCT model using a gradual transition strategy, i.e., gradually transiting from using monolingual to bilingual dialogues. Extensive experiments on two language pairs demonstrate the effectiveness and superiority of our proposed training framework.

* 16 pages, IEEE Transactions on Pattern Analysis and Machine Intelligence

Via

Access Paper or Ask Questions

Towards Better Document-level Relation Extraction via Iterative Inference

Nov 26, 2022

Liang Zhang, Jinsong Su, Yidong Chen, Zhongjian Miao, Zijun Min, Qingguo Hu, Xiaodong Shi

Abstract:Document-level relation extraction (RE) aims to extract the relations between entities from the input document that usually containing many difficultly-predicted entity pairs whose relations can only be predicted through relational inference. Existing methods usually directly predict the relations of all entity pairs of input document in a one-pass manner, ignoring the fact that predictions of some entity pairs heavily depend on the predicted results of other pairs. To deal with this issue, in this paper, we propose a novel document-level RE model with iterative inference. Our model is mainly composed of two modules: 1) a base module expected to provide preliminary relation predictions on entity pairs; 2) an inference module introduced to refine these preliminary predictions by iteratively dealing with difficultly-predicted entity pairs depending on other pairs in an easy-to-hard manner. Unlike previous methods which only consider feature information of entity pairs, our inference module is equipped with two Extended Cross Attention units, allowing it to exploit both feature information and previous predictions of entity pairs during relational inference. Furthermore, we adopt a two-stage strategy to train our model. At the first stage, we only train our base module. During the second stage, we train the whole model, where contrastive learning is introduced to enhance the training of inference module. Experimental results on three commonly-used datasets show that our model consistently outperforms other competitive baselines.

* Accepted by EMNLP 2022 (long paper)

Via

Access Paper or Ask Questions

WR-ONE2SET: Towards Well-Calibrated Keyphrase Generation

Nov 13, 2022

Binbin Xie, Xiangpeng Wei, Baosong Yang, Huan Lin, Jun Xie, Xiaoli Wang, Min Zhang, Jinsong Su

Abstract:Keyphrase generation aims to automatically generate short phrases summarizing an input document. The recently emerged ONE2SET paradigm (Ye et al., 2021) generates keyphrases as a set and has achieved competitive performance. Nevertheless, we observe serious calibration errors outputted by ONE2SET, especially in the over-estimation of $\varnothing$ token (means "no corresponding keyphrase"). In this paper, we deeply analyze this limitation and identify two main reasons behind: 1) the parallel generation has to introduce excessive $\varnothing$ as padding tokens into training instances; and 2) the training mechanism assigning target to each slot is unstable and further aggravates the $\varnothing$ token over-estimation. To make the model well-calibrated, we propose WR-ONE2SET which extends ONE2SET with an adaptive instance-level cost Weighting strategy and a target Re-assignment mechanism. The former dynamically penalizes the over-estimated slots for different instances thus smoothing the uneven training distribution. The latter refines the original inappropriate assignment and reduces the supervisory signals of over-estimated slots. Experimental results on commonly-used datasets demonstrate the effectiveness and generality of our proposed paradigm.

* EMNLP2022

Via

Access Paper or Ask Questions

Getting the Most out of Simile Recognition

Nov 11, 2022

Xiaoyue Wang, Linfeng Song, Xin Liu, Chulun Zhou, Jinsong Su

Abstract:Simile recognition involves two subtasks: simile sentence classification that discriminates whether a sentence contains simile, and simile component extraction that locates the corresponding objects (i.e., tenors and vehicles). Recent work ignores features other than surface strings. In this paper, we explore expressive features for this task to achieve more effective data utilization. Particularly, we study two types of features: 1) input-side features that include POS tags, dependency trees and word definitions, and 2) decoding features that capture the interdependence among various decoding decisions. We further construct a model named HGSR, which merges the input-side features as a heterogeneous graph and leverages decoding features via distillation. Experiments show that HGSR significantly outperforms the current state-of-the-art systems and carefully designed baselines, verifying the effectiveness of introduced features. Our code is available at https://github.com/DeepLearnXMU/HGSR.

* Findings of EMNLP2022

Via

Access Paper or Ask Questions

Sentiment-Aware Word and Sentence Level Pre-training for Sentiment Analysis

Oct 19, 2022

Shuai Fan, Chen Lin, Haonan Li, Zhenghao Lin, Jinsong Su, Hang Zhang, Yeyun Gong, Jian Guo, Nan Duan

Figure 1 for Sentiment-Aware Word and Sentence Level Pre-training for Sentiment Analysis

Figure 2 for Sentiment-Aware Word and Sentence Level Pre-training for Sentiment Analysis

Figure 3 for Sentiment-Aware Word and Sentence Level Pre-training for Sentiment Analysis

Figure 4 for Sentiment-Aware Word and Sentence Level Pre-training for Sentiment Analysis

Abstract:Most existing pre-trained language representation models (PLMs) are sub-optimal in sentiment analysis tasks, as they capture the sentiment information from word-level while under-considering sentence-level information. In this paper, we propose SentiWSP, a novel Sentiment-aware pre-trained language model with combined Word-level and Sentence-level Pre-training tasks. The word level pre-training task detects replaced sentiment words, via a generator-discriminator framework, to enhance the PLM's knowledge about sentiment words. The sentence level pre-training task further strengthens the discriminator via a contrastive learning framework, with similar sentences as negative samples, to encode sentiments in a sentence. Extensive experimental results show that SentiWSP achieves new state-of-the-art performance on various sentence-level and aspect-level sentiment classification benchmarks. We have made our code and model publicly available at https://github.com/XMUDM/SentiWSP.

* Accepted to EMNLP 2022

Via

Access Paper or Ask Questions

Towards Robust k-Nearest-Neighbor Machine Translation

Oct 17, 2022

Hui Jiang, Ziyao Lu, Fandong Meng, Chulun Zhou, Jie Zhou, Degen Huang, Jinsong Su

Figure 1 for Towards Robust k-Nearest-Neighbor Machine Translation

Figure 2 for Towards Robust k-Nearest-Neighbor Machine Translation

Figure 3 for Towards Robust k-Nearest-Neighbor Machine Translation

Figure 4 for Towards Robust k-Nearest-Neighbor Machine Translation

Abstract:k-Nearest-Neighbor Machine Translation (kNN-MT) becomes an important research direction of NMT in recent years. Its main idea is to retrieve useful key-value pairs from an additional datastore to modify translations without updating the NMT model. However, the underlying retrieved noisy pairs will dramatically deteriorate the model performance. In this paper, we conduct a preliminary study and find that this problem results from not fully exploiting the prediction of the NMT model. To alleviate the impact of noise, we propose a confidence-enhanced kNN-MT model with robust training. Concretely, we introduce the NMT confidence to refine the modeling of two important components of kNN-MT: kNN distribution and the interpolation weight. Meanwhile we inject two types of perturbations into the retrieved pairs for robust training. Experimental results on four benchmark datasets demonstrate that our model not only achieves significant improvements over current kNN-MT models, but also exhibits better robustness. Our code is available at https://github.com/DeepLearnXMU/Robust-knn-mt.

* Accepted to EMNLP 2022

Via

Access Paper or Ask Questions

A Variational Hierarchical Model for Neural Cross-Lingual Summarization

Mar 25, 2022

Yunlong Liang, Fandong Meng, Chulun Zhou, Jinan Xu, Yufeng Chen, Jinsong Su, Jie Zhou

Figure 1 for A Variational Hierarchical Model for Neural Cross-Lingual Summarization

Figure 2 for A Variational Hierarchical Model for Neural Cross-Lingual Summarization

Figure 3 for A Variational Hierarchical Model for Neural Cross-Lingual Summarization

Figure 4 for A Variational Hierarchical Model for Neural Cross-Lingual Summarization

Abstract:The goal of the cross-lingual summarization (CLS) is to convert a document in one language (e.g., English) to a summary in another one (e.g., Chinese). Essentially, the CLS task is the combination of machine translation (MT) and monolingual summarization (MS), and thus there exists the hierarchical relationship between MT\&MS and CLS. Existing studies on CLS mainly focus on utilizing pipeline methods or jointly training an end-to-end model through an auxiliary MT or MS objective. However, it is very challenging for the model to directly conduct CLS as it requires both the abilities to translate and summarize. To address this issue, we propose a hierarchical model for the CLS task, based on the conditional variational auto-encoder. The hierarchical model contains two kinds of latent variables at the local and global levels, respectively. At the local level, there are two latent variables, one for translation and the other for summarization. As for the global level, there is another latent variable for cross-lingual summarization conditioned on the two local-level variables. Experiments on two language directions (English-Chinese) verify the effectiveness and superiority of the proposed approach. In addition, we show that our model is able to generate better cross-lingual summaries than comparison models in the few-shot setting.

* Accepted at ACL 2022 as a long paper of main conference. Code: https://github.com/XL2248/VHM

Via

Access Paper or Ask Questions