Baosong Yang

Attention Mechanism with Energy-Friendly Operations

Apr 28, 2022

Yu Wan, Baosong Yang, Dayiheng Liu, Rong Xiao, Derek F. Wong, Haibo Zhang, Boxing Chen, Lidia S. Chao

Abstract: The attention mechanism has become the dominant module in natural language processing models. It is computationally intensive and depends on massive power-hungry multiplications. In this paper, we rethink variants of the attention mechanism from the perspective of energy consumption. After concluding that the energy costs of several energy-friendly operations are far lower than those of their multiplication counterparts, we build a novel attention model by replacing multiplications with either selective operations or additions. Empirical results on three machine translation tasks demonstrate that the proposed model, compared with the vanilla one, achieves competitive accuracy while saving 99% and 66% of the energy during alignment calculation and the whole attention procedure, respectively. Code is available at: https://github.com/NLP2CT/E-Att.
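
To make the idea of a multiplication-free alignment step concrete, here is a minimal sketch in plain Python. It scores each key by its negative L1 distance to the query, which needs only subtractions, absolute values, and additions. This scoring rule is an illustrative assumption and not necessarily the formulation used in the paper; the function names (l1_alignment, softmax, l1_attention) are hypothetical, and the softmax and the weighted sum over values still use ordinary arithmetic.

```python
import math


def l1_alignment(query, keys):
    """Score each key by negative L1 distance to the query.

    Assumption for illustration: this replaces the usual dot product.
    Only subtraction, absolute value, and addition are used here.
    """
    return [-sum(abs(q - k) for q, k in zip(query, key)) for key in keys]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def l1_attention(query, keys, values):
    """Attend over values using the L1-based alignment scores.

    Note: the weighted sum below still multiplies weights by values;
    only the alignment calculation avoids multiplications.
    """
    weights = softmax(l1_alignment(query, keys))
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0]]
    values = [[10.0], [20.0]]
    print(l1_attention([1.0, 0.0], keys, values))
```

In this sketch the key identical to the query receives the highest score (zero), so the output is pulled towards that key's value.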

* Findings@ACL2022

RoBLEURT Submission for the WMT2021 Metrics Task

Apr 28, 2022

Yu Wan, Dayiheng Liu, Baosong Yang, Tianchi Bi, Haibo Zhang, Boxing Chen, Weihua Luo, Derek F. Wong, Lidia S. Chao

Abstract: In this paper, we present our submission to the Shared Metrics Task: RoBLEURT (Robustly Optimizing the training of BLEURT). After investigating recent advances in trainable metrics, we identify several aspects of vital importance for obtaining a well-performing metric model: 1) jointly leveraging the advantages of the source-included model and the reference-only model, 2) continuously pre-training the model with massive synthetic data pairs, and 3) fine-tuning the model with a data denoising strategy. Experimental results show that our model reaches state-of-the-art correlations with the WMT2020 human annotations on 8 out of 10 to-English language pairs.

* WMT2021 Metrics Shared Task

UniTE: Unified Translation Evaluation

Apr 28, 2022

Yu Wan, Dayiheng Liu, Baosong Yang, Haibo Zhang, Boxing Chen, Derek F. Wong, Lidia S. Chao

Abstract: Translation quality evaluation plays a crucial role in machine translation. According to the input format, it is mainly separated into three tasks, i.e., reference-only, source-only and source-reference-combined. Recent methods, despite their promising results, are specifically designed and optimized for one of them. This limits the convenience of these methods and overlooks the commonalities among tasks. In this paper, we propose UniTE, the first unified framework able to handle all three evaluation tasks. Concretely, we propose monotonic regional attention to control the interaction among input segments, and unified pretraining to better adapt to multi-task learning. We test our framework on the WMT 2019 Metrics and WMT 2020 Quality Estimation benchmarks. Extensive analyses show that our single model can universally surpass various state-of-the-art or winning methods across tasks. Both source code and associated models are available at https://github.com/NLP2CT/UniTE.

* ACL2022

RMBR: A Regularized Minimum Bayes Risk Reranking Framework for Machine Translation

Mar 01, 2022

Yidan Zhang, Yu Wan, Dayiheng Liu, Baosong Yang, Zhenan He

Abstract: Beam search is the most widely used decoding method for neural machine translation (NMT). In practice, the top-1 candidate with the highest log-probability among the n candidates is selected as the preferred one. However, this top-1 candidate may not be the best overall translation in the n-best list. Recently, Minimum Bayes Risk (MBR) decoding has been proposed to improve the quality of NMT; it seeks a consensus translation that is closest on average to the other candidates in the n-best list. We argue that MBR still suffers from the following problems: 1) the utility function only considers lexical-level similarity between candidates; 2) the expected utility considers the entire n-best list, which is time-consuming, and inadequate candidates in the tail of the list may hurt performance; 3) only the relationship between candidates is considered. To solve these issues, we design a regularized MBR reranking framework (RMBR), which considers semantic-based similarity and computes the expected utility for each candidate by truncating the list. We also expect the framework to consider the translation quality and model uncertainty of each candidate, so a quality regularizer and an uncertainty regularizer are incorporated into it. Extensive experiments on multiple translation tasks demonstrate the effectiveness of our method.
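
The baseline MBR reranking step described in this abstract can be sketched as follows, in plain Python. Each candidate is scored by its average utility against the other candidates, and the highest-scoring one is returned. The unigram F1 utility is an assumed stand-in for a lexical-level similarity, top_k is a hypothetical parameter illustrating truncation of the n-best list, and the function names are made up for this example. The semantic utility and the quality and uncertainty regularizers of RMBR are not shown.

```python
from collections import Counter


def unigram_f1(hyp, ref):
    """Toy lexical utility: F1 over whitespace-separated unigrams."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(h.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def mbr_rerank(candidates, utility=unigram_f1, top_k=None):
    """Return the candidate with the highest average utility vs. the others.

    candidates: n-best list, assumed sorted by model score (best first).
    top_k: if given, only the first top_k candidates are considered.
    """
    if not candidates:
        raise ValueError("candidates must not be empty")
    pool = candidates[:top_k] if top_k else candidates
    if len(pool) == 1:
        return pool[0]
    best, best_score = None, float("-inf")
    for i, cand in enumerate(pool):
        others = [o for j, o in enumerate(pool) if j != i]
        score = sum(utility(cand, o) for o in others) / len(others)
        if score > best_score:  # ties keep the earlier (higher-ranked) one
            best, best_score = cand, score
    return best


if __name__ == "__main__":
    n_best = [
        "a cat sat on the mat",
        "the cat sat on the mat",
        "the cat is on the mat",
        "dogs bark loudly",
    ]
    print(mbr_rerank(n_best))
```

With this toy list, the consensus pick differs from the model's top-1 candidate, which is the situation MBR decoding is meant to handle.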

* 10 pages

Frequency-Aware Contrastive Learning for Neural Machine Translation

Dec 29, 2021

Tong Zhang, Wei Ye, Baosong Yang, Long Zhang, Xingzhang Ren, Dayiheng Liu, Jinan Sun, Shikun Zhang, Haibo Zhang, Wen Zhao

Abstract: Low-frequency word prediction remains a challenge in modern neural machine translation (NMT) systems. Recent adaptive training methods promote the output of infrequent words by emphasizing their weights in the overall training objectives. Despite the improved recall of low-frequency words, their prediction precision is unexpectedly hindered by the adaptive objectives. Inspired by the observation that low-frequency words form a more compact embedding space, we tackle this challenge from a representation learning perspective. Specifically, we propose a frequency-aware token-level contrastive learning method, in which the hidden state of each decoding step is pushed away from the counterparts of other target words, in a soft contrastive way based on the corresponding word frequencies. We conduct experiments on the widely used NIST Chinese-English and WMT14 English-German translation tasks. Empirical results show that our proposed methods not only significantly improve translation quality but also enhance lexical diversity and optimize the word representation space. Further investigation reveals that, compared with related adaptive training strategies, the superiority of our method on low-frequency word prediction lies in the robustness of token-level recall across different frequencies without sacrificing precision.

* Published at AAAI 2022

KGR^4: Retrieval, Retrospect, Refine and Rethink for Commonsense Generation

Dec 15, 2021

Xin Liu, Dayiheng Liu, Baosong Yang, Haibo Zhang, Junwei Ding, Wenqing Yao, Weihua Luo, Haiying Zhang, Jinsong Su

Abstract: Generative commonsense reasoning requires machines to generate sentences describing an everyday scenario given several concepts, and it has attracted much attention recently. However, existing models cannot perform as well as humans, since the sentences they produce are often implausible and grammatically incorrect. In this paper, inspired by the process by which humans create sentences, we propose a novel Knowledge-enhanced Commonsense Generation framework, termed KGR^4, consisting of four stages: Retrieval, Retrospect, Refine, Rethink. Under this framework, we first perform retrieval to search for relevant sentences from an external corpus as prototypes. Then, we train a generator that either edits or copies these prototypes to generate candidate sentences, whose potential errors are fixed by an autoencoder-based refiner. Finally, we select the output sentence from the candidate sentences produced by generators with different hyper-parameters. Experimental results and in-depth analysis on the CommonGen benchmark strongly demonstrate the effectiveness of our framework. In particular, KGR^4 obtains 33.56 SPICE points on the official leaderboard, outperforming the previously reported best result by 2.49 SPICE points and achieving state-of-the-art performance.

* AAAI2022

Leveraging Advantages of Interactive and Non-Interactive Models for Vector-Based Cross-Lingual Information Retrieval

Nov 03, 2021

Linlong Xu, Baosong Yang, Xiaoyu Lv, Tianchi Bi, Dayiheng Liu, Haibo Zhang

Abstract: Interactive and non-interactive models are the two de facto standard frameworks in vector-based cross-lingual information retrieval (V-CLIR), which embed queries and documents in synchronous and asynchronous fashions, respectively. In terms of retrieval accuracy and computational efficiency, each model has its own strengths and shortcomings. In this paper, we propose a novel framework to leverage the advantages of these two paradigms. Concretely, we introduce a semi-interactive mechanism, which builds our model upon a non-interactive architecture but encodes each document together with its associated multilingual queries. Accordingly, cross-lingual features can be learned better, as in an interactive model. In addition, we transfer knowledge from a well-trained interactive model to ours by reusing its word embeddings and adopting knowledge distillation. Our model is initialized from the multilingual pre-trained language model M-BERT, and evaluated on two open-resource CLIR datasets derived from Wikipedia and an in-house dataset collected from a real-world search engine. Extensive analyses reveal that our methods significantly boost retrieval accuracy while maintaining computational efficiency.

Towards User-Driven Neural Machine Translation

Jun 11, 2021

Huan Lin, Liang Yao, Baosong Yang, Dayiheng Liu, Haibo Zhang, Weihua Luo, Degen Huang, Jinsong Su

Abstract: A good translation should not only convey the semantics of the original content, but also reflect the personal traits of the original text. For a real-world neural machine translation (NMT) system, these user traits (e.g., topic preference, stylistic characteristics and expression habits) can be preserved in user behavior (e.g., historical inputs). However, current NMT systems only marginally consider user behavior due to: 1) the difficulty of modeling user portraits in zero-shot scenarios, and 2) the lack of a parallel dataset annotated with user behavior. To fill this gap, we introduce a novel framework called user-driven NMT. Specifically, a cache-based module and a user-driven contrastive learning method are proposed to give NMT the ability to capture potential user traits from their historical inputs in a zero-shot learning fashion. Furthermore, we contribute the first Chinese-English parallel corpus annotated with user behavior, called UDT-Corpus. Experimental results confirm that the proposed user-driven NMT can generate user-specific translations.

Bridging Subword Gaps in Pretrain-Finetune Paradigm for Natural Language Generation

Jun 11, 2021

Xin Liu, Baosong Yang, Dayiheng Liu, Haibo Zhang, Weihua Luo, Min Zhang, Haiying Zhang, Jinsong Su

Abstract: A well-known limitation of the pretrain-finetune paradigm lies in its inflexibility caused by the one-size-fits-all vocabulary. This potentially weakens the effect of applying pretrained models to natural language generation (NLG) tasks, especially when the subword distributions of upstream and downstream tasks differ significantly. To address this problem, we extend the vanilla pretrain-finetune pipeline with an extra embedding transfer step. Specifically, a plug-and-play embedding generator is introduced to produce the representation of any input token, according to the pre-trained embeddings of its morphologically similar ones. Thus, embeddings of mismatched tokens in downstream tasks can also be efficiently initialized. We conduct experiments on a variety of NLG tasks under the pretrain-finetune fashion. Experimental results and extensive analyses show that the proposed strategy allows the vocabulary to be transferred freely, leading to more efficient and better-performing downstream NLG models.

* Accepted by ACL2021

Exploiting Neural Query Translation into Cross Lingual Information Retrieval

Oct 26, 2020

Liang Yao, Baosong Yang, Haibo Zhang, Weihua Luo, Boxing Chen

Abstract: Query translation plays a crucial role in cross-language information retrieval (CLIR) and faces three main challenges: 1) the adequacy of translation; 2) the lack of in-domain parallel training data; and 3) the requirement of low latency. Consequently, existing CLIR systems mainly exploit statistical machine translation (SMT) rather than the more advanced neural machine translation (NMT), limiting further improvements in both translation and retrieval quality. In this paper, we investigate how to integrate a neural query translation model into a CLIR system. Specifically, we propose a novel data augmentation method that extracts query translation pairs according to user clickthrough data, thereby alleviating the domain-adaptation problem in NMT. Then, we introduce an asynchronous strategy that is able to leverage the real-time speed of SMT and the accuracy of NMT. Experimental results reveal that the proposed approach yields better retrieval quality than strong baselines and can be readily applied to a real-world CLIR system, i.e., the Aliexpress e-Commerce search engine. Readers can examine and test their cases on our website: https://aliexpress.com.

* SIGIR eCom 2020
