Zaixiang Zheng

Structure-informed Language Models Are Protein Designers

Feb 09, 2023

Zaixiang Zheng, Yifan Deng, Dongyu Xue, Yi Zhou, Fei YE, Quanquan Gu

Abstract: This paper demonstrates that language models are strong structure-based protein designers. We present LM-Design, a generic approach to reprogramming sequence-based protein language models (pLMs), which have learned massive sequential evolutionary knowledge from the universe of natural protein sequences, so that they acquire an immediate capability to design preferable protein sequences for given folds. We conduct a structural surgery on pLMs, in which a lightweight structural adapter is implanted into a pLM and endows it with structural awareness. During inference, iterative refinement is performed to effectively optimize the generated protein sequences. Experiments show that LM-Design improves the state-of-the-art results by a large margin, yielding accuracy gains of 4% to 12% in sequence recovery (e.g., 55.65%/56.63% on CATH 4.2/4.3 single-chain benchmarks, and >60% when designing protein complexes). We provide extensive and in-depth analyses, which verify that LM-Design can (1) indeed leverage both structural and sequential knowledge to accurately handle structurally non-deterministic regions, (2) benefit from scaling data and model size, and (3) generalize to other proteins (e.g., antibodies and de novo proteins).

* 10 pages; ver.2 update: added image credit to RFdiffusion (Watson et al., 2022) in Fig. 1F, and fixed some small presentation errors
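
The abstract above reports results as sequence recovery. As a rough illustration only, and not the authors' evaluation code, the sketch below assumes the common reading of that metric: the fraction of positions at which a designed sequence matches the native one. The function name `sequence_recovery` and the toy sequences are hypothetical.

```python
def sequence_recovery(designed: str, native: str) -> float:
    """Fraction of positions where the designed residue equals the native one.

    Assumption: both sequences are aligned one-to-one (same length), as when
    a sequence is designed for a fixed backbone.
    """
    if len(designed) != len(native):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    # Count position-wise matches between the two sequences.
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)


if __name__ == "__main__":
    # Toy example: three of the four residues agree.
    print(sequence_recovery("ACDE", "ACDF"))
```

Benchmark figures such as those quoted in the abstract are typically aggregated over many test proteins; the aggregation rule is not stated here, so the sketch covers a single pair only.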

Helping the Weak Makes You Strong: Simple Multi-Task Learning Improves Non-Autoregressive Translators

Nov 11, 2022

Xinyou Wang, Zaixiang Zheng, Shujian Huang

Abstract: Recently, non-autoregressive (NAR) neural machine translation models have received increasing attention due to their efficient parallel decoding. However, the probabilistic framework of NAR models necessitates a conditional independence assumption on target sequences, falling short of characterizing human language data. This drawback results in less informative learning signals for NAR models under conventional MLE training, thereby yielding unsatisfactory accuracy compared to their autoregressive (AR) counterparts. In this paper, we propose a simple and model-agnostic multi-task learning framework to provide more informative learning signals. During training, we introduce a set of sufficiently weak AR decoders that rely solely on the information provided by the NAR decoder to make predictions, forcing the NAR decoder to become stronger or else it will be unable to support its weak AR partners. Experiments on WMT and IWSLT datasets show that our approach can consistently improve the accuracy of multiple NAR baselines without adding any decoding overhead.

* Accepted by EMNLP 2022
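
The conditional independence assumption mentioned in the abstract above can be written out as follows. This uses the standard textbook factorizations rather than notation taken from the paper; the symbols $x$ (source sentence), $y = (y_1, \dots, y_T)$ (target sentence), and $T$ (target length) are assumptions introduced here for illustration.

```latex
\begin{align}
  % Autoregressive: each target token conditions on all previous target tokens.
  p_{\mathrm{AR}}(y \mid x)  &= \prod_{t=1}^{T} p(y_t \mid y_{<t}, x) \\
  % Non-autoregressive: target tokens are conditionally independent given x.
  p_{\mathrm{NAR}}(y \mid x) &= \prod_{t=1}^{T} p(y_t \mid x)
\end{align}
```

Dropping the dependence on $y_{<t}$ is what allows all positions to be decoded in parallel, and it is also why the NAR model cannot directly capture dependencies among target tokens.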

The Volctrans GLAT System: Non-autoregressive Translation Meets WMT21

Sep 24, 2021

Lihua Qian, Yi Zhou, Zaixiang Zheng, Yaoming Zhu, Zehui Lin, Jiangtao Feng, Shanbo Cheng, Lei Li, Mingxuan Wang, Hao Zhou

Abstract: This paper describes the Volctrans submission to the WMT21 news translation shared task for German->English translation. We build a parallel (i.e., non-autoregressive) translation system using the Glancing Transformer, which enables fast and accurate parallel decoding in contrast to the currently prevailing autoregressive models. To the best of our knowledge, this is the first parallel translation system that can be scaled to a practical scenario such as the WMT competition. More importantly, our parallel translation system achieves the best BLEU score (35.0) on the German->English translation task, outperforming all strong autoregressive counterparts.

* 10 pages, 5 figures, WMT2021

DirectQE: Direct Pretraining for Machine Translation Quality Estimation

May 15, 2021

Qu Cui, Shujian Huang, Jiahuan Li, Xiang Geng, Zaixiang Zheng, Guoping Huang, Jiajun Chen

Abstract: Machine Translation Quality Estimation (QE) is the task of predicting the quality of machine translations without relying on any reference. Recently, the predictor-estimator framework trains the predictor as a feature extractor, which leverages extra parallel corpora without QE labels, achieving promising QE performance. However, we argue that there are gaps between the predictor and the estimator in both data quality and training objectives, which preclude QE models from benefiting more directly from a large number of parallel corpora. We propose a novel framework called DirectQE that provides direct pretraining for QE tasks. In DirectQE, a generator is trained to produce pseudo data that is closer to the real QE data, and a detector is pretrained on these data with novel objectives that are akin to the QE task. Experiments on widely used benchmarks show that DirectQE outperforms existing methods, without using any pretrained models such as BERT. We also give extensive analyses showing how fixing the two gaps contributes to our improvements.

Duplex Sequence-to-Sequence Learning for Reversible Machine Translation

May 07, 2021

Zaixiang Zheng, Hao Zhou, Shujian Huang, Jiajun Chen, Jingjing Xu, Lei Li

Abstract: Sequence-to-sequence (seq2seq) problems such as machine translation are bidirectional, which naturally gives rise to a pair of directional tasks and two directional learning signals. However, typical seq2seq neural networks are simplex: they model only one unidirectional task and so cannot fully exploit the potential of bidirectional learning signals from parallel data. To address this issue, we propose a duplex seq2seq neural network, REDER (Reversible Duplex Transformer), and apply it to machine translation. The architecture of REDER has two ends, each of which specializes in a language so as to read and yield sequences in that language. As a result, REDER can simultaneously learn from the bidirectional signals, and enables reversible machine translation by simply flipping the input and output ends. Experiments on widely used machine translation benchmarks verify that REDER achieves the first success of reversible machine translation, which helps obtain considerable gains over several strong baselines.

* Under review, 10 pages

VOLT: Improving Vocabularization via Optimal Transport for Machine Translation

Dec 31, 2020

Jingjing Xu, Hao Zhou, Chun Gan, Zaixiang Zheng, Lei Li

Abstract: It is well accepted that the choice of token vocabulary largely affects the performance of machine translation. However, due to expensive trial costs, most studies only conduct simple trials with dominant approaches (e.g., BPE) and commonly used vocabulary sizes. In this paper, we find an exciting relation between an information-theoretic feature and BLEU scores. With this observation, we formulate the quest of vocabularization -- finding the best token dictionary with a proper size -- as an optimal transport problem. We then propose VOLT, a simple and efficient vocabularization solution without the full and costly trial training. We evaluate our approach on multiple machine translation tasks, including WMT-14 English-German translation, TED bilingual translation, and TED multilingual translation. Empirical results show that VOLT beats widely used vocabularies in diverse scenarios. For example, VOLT achieves a 70% vocabulary size reduction and a 0.6 BLEU gain on English-German translation. Another advantage of VOLT is its low resource consumption: compared to naive BPE-search, VOLT reduces the search time from 288 GPU hours to 0.5 CPU hours.

RPD: A Distance Function Between Word Embeddings

May 16, 2020

Xuhui Zhou, Zaixiang Zheng, Shujian Huang

Abstract: It is well understood that different algorithms, training processes, and corpora produce different word embeddings. However, less is known about the relation between different embedding spaces, i.e., how far different sets of embeddings deviate from each other. In this paper, we propose a novel metric called Relative pairwise inner Product Distance (RPD) to quantify the distance between different sets of word embeddings. This metric has a unified scale for comparing different sets of word embeddings. Based on the properties of RPD, we systematically study the relations between word embeddings produced by different algorithms and investigate the influence of different training processes and corpora. The results shed light on the poorly understood word embeddings and justify RPD as a measure of the distance between embedding spaces.

* ACL Student Research Workshop 2020

Toward Making the Most of Context in Neural Machine Translation

Feb 19, 2020

Zaixiang Zheng, Xiang Yue, Shujian Huang, Jiajun Chen, Alexandra Birch

Abstract: Document-level machine translation models manage to outperform sentence-level models by a small margin, but have failed to be widely adopted. We argue that previous research did not make clear use of the global context, and propose a new document-level NMT framework that deliberately models the local context of each sentence with awareness of the global context of the document in both source and target languages. We specifically design the model to be able to deal with documents containing any number of sentences, including single sentences. This unified approach allows our model to be trained elegantly on standard datasets without needing to train on sentence-level and document-level data separately. Experimental results demonstrate that our model outperforms Transformer baselines and previous document-level NMT models by substantial margins of up to 2.1 BLEU over state-of-the-art baselines. We also provide analyses which show the benefit of context far beyond the neighboring two or three sentences, which previous studies have typically incorporated.

* Submitted to a conference

Multi-Perspective Inferrer: Reasoning Sentences Relationship from Holistic Perspective

Nov 09, 2019

Zhen Cheng, Zaixiang Zheng, Xin-Yu Dai, Shujian Huang, Jiajun Chen

Abstract: Natural Language Inference (NLI) aims to determine the logical relationship (i.e., entailment, neutral, or contradiction) between a premise and a hypothesis. Recently, the alignment mechanism has effectively helped NLI by capturing the aligned parts (i.e., the similar segments) in the sentence pairs, which imply the perspective of entailment and contradiction. However, these aligned parts will sometimes mislead the judgment of neutral relations. Intuitively, NLI should rely more on multiple perspectives to form a holistic view and eliminate bias. In this paper, we propose the Multi-Perspective Inferrer (MPI), a novel NLI model that reasons about relationships from multiple perspectives associated with the three relationships. The MPI determines the perspectives of different parts of the sentences via a routing-by-agreement policy and makes the final decision from a holistic view. Additionally, we introduce an auxiliary supervised signal to ensure that the MPI learns the expected perspectives. Experiments on SNLI and MultiNLI show that 1) the MPI achieves substantial improvements over the base model, which verifies the motivation of multi-perspective inference; 2) visualized evidence verifies that the MPI learns highly interpretable perspectives as expected; 3) more importantly, the MPI is architecture-free and compatible with the powerful BERT.

* In progress

Learning Representation Mapping for Relation Detection in Knowledge Base Question Answering

Jul 17, 2019

Peng Wu, Shujian Huang, Rongxiang Weng, Zaixiang Zheng, Jianbing Zhang, Xiaohui Yan, Jiajun Chen

Abstract: Relation detection is a core step in many natural language processing applications, including knowledge base question answering. Previous efforts show that single-fact questions can be answered with high accuracy. However, one critical problem is that current approaches only achieve high accuracy for questions whose relations have been seen in the training data. For unseen relations, performance drops rapidly. The main reason for this problem is that the representations for unseen relations are missing. In this paper, we propose a simple mapping method, named representation adapter, to learn the representation mapping for both seen and unseen relations based on previously learned relation embeddings. We employ an adversarial objective and a reconstruction objective to improve the mapping performance. We re-organize the popular SimpleQuestion dataset to reveal and evaluate the problem of detecting unseen relations. Experiments show that our method can greatly improve performance on unseen relations while keeping performance on seen relations comparable to the state of the art. Our code and data are available at https://github.com/wudapeng268/KBQA-Adapter.

* 10 pages, 5 figures, accepted by ACL 2019
