Yidong Chen

Layer-wise Representation Fusion for Compositional Generalization

Jul 20, 2023
Yafang Zheng, Lei Lin, Zhaohong Lai, Binling Wang, Shan Liu, Biao Fu, Wenhao Rao, Peigen Ye, Yidong Chen, Xiaodong Shi

Despite their success across a broad range of applications, sequence-to-sequence models are argued to construct solutions in a way that is less compositional than human-like generalization. There is mounting evidence that one of the reasons hindering compositional generalization is that the representations of the uppermost encoder and decoder layers are entangled; in other words, the syntactic and semantic representations of sequences are twisted together inappropriately. However, most previous studies mainly concentrate on enhancing token-level semantic information to alleviate this representation entanglement problem, rather than composing and using the syntactic and semantic representations of sequences appropriately, as humans do. In addition, we explain why the entanglement problem exists from the perspective of recent studies on training deeper Transformers: it mainly stems from the ``shallow'' residual connections and their simple, one-step operations, which fail to fuse previous layers' information effectively. Starting from this finding and inspired by humans' strategies, we propose \textsc{FuSion} (\textbf{Fu}sing \textbf{S}yntactic and Semant\textbf{i}c Representati\textbf{on}s), an extension to sequence-to-sequence models that learns to fuse previous layers' information back into the encoding and decoding process appropriately by introducing a \emph{fuse-attention module} at each encoder and decoder layer. \textsc{FuSion} achieves competitive and even \textbf{state-of-the-art} results on two realistic benchmarks, which empirically demonstrates the effectiveness of our proposal.
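The fuse-attention idea can be pictured as each layer attending over the outputs of all previous layers instead of relying only on a single one-step residual shortcut. Below is a minimal PyTorch-style sketch of such a module; the class name, shapes, and integration point are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
from typing import List

class FuseAttention(nn.Module):
    """Illustrative sketch: a layer fuses the outputs of all previous layers
    via attention rather than a single one-step residual connection."""
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, current: torch.Tensor, history: List[torch.Tensor]) -> torch.Tensor:
        # history: outputs of all previous layers, each of shape (batch, seq_len, d_model)
        b, t, d = current.shape
        mem = torch.stack(history, dim=2).reshape(b * t, len(history), d)  # per-position layer stack
        q = current.reshape(b * t, 1, d)
        fused, _ = self.attn(q, mem, mem)            # each position attends over its earlier-layer states
        return current + fused.reshape(b, t, d)      # fuse the result back into this layer's output
```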

* work in progress. arXiv admin note: substantial text overlap with arXiv:2305.12169 

Learn to Compose Syntactic and Semantic Representations Appropriately for Compositional Generalization

May 20, 2023
Lei Lin, Shuangtao Li, Biao Fu, Yafang Zheng, Shan Liu, Yidong Chen, Xiaodong Shi

Recent studies have shown that sequence-to-sequence (Seq2Seq) models are limited in solving compositional generalization (CG) tasks, failing to systematically generalize to unseen compositions of seen components. There is mounting evidence that one of the reasons hindering CG is that the representation of the encoder's uppermost layer is entangled; in other words, the syntactic and semantic representations of sequences are twisted together inappropriately. However, most previous studies mainly concentrate on enhancing token-level semantic information, rather than composing the syntactic and semantic representations of sequences appropriately, as humans do. In addition, we argue that the representation entanglement problem identified in prior work is not comprehensive, and further hypothesize that the source key and value representations passed into different decoder layers are also entangled. Starting from this intuition and inspired by humans' strategies for CG, we propose COMPSITION (Compose Syntactic and Semantic Representations), an extension to Seq2Seq models that learns to compose the representations of different encoder layers appropriately, generating different keys and values for different decoder layers by introducing a composed layer between the encoder and decoder. COMPSITION achieves competitive and even state-of-the-art results on two realistic benchmarks, which empirically demonstrates the effectiveness of our proposal.
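One way to picture the composed layer is as a learned, per-decoder-layer mixture of all encoder layers that supplies each decoder layer with its own source keys and values. The following is a hedged sketch under that reading; the class name `ComposedLayer` and the simple softmax mixture are illustrative, not the paper's exact design.

```python
import torch
import torch.nn as nn
from typing import List

class ComposedLayer(nn.Module):
    """Illustrative sketch: one softmax-normalized mixture of encoder layers
    per decoder layer, used as that decoder layer's source keys/values."""
    def __init__(self, n_enc_layers: int, n_dec_layers: int):
        super().__init__()
        # One learnable weight vector over encoder layers for each decoder layer.
        self.mix = nn.Parameter(torch.zeros(n_dec_layers, n_enc_layers))

    def forward(self, enc_outputs: List[torch.Tensor]) -> List[torch.Tensor]:
        # enc_outputs: per-layer encoder states, each of shape (batch, src_len, d_model)
        stacked = torch.stack(enc_outputs, dim=0)        # (n_enc, batch, src_len, d_model)
        weights = torch.softmax(self.mix, dim=-1)        # (n_dec, n_enc)
        # A distinct composition of encoder layers for each decoder layer.
        return [torch.einsum("e,ebsd->bsd", w, stacked) for w in weights]
```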

* Work in progress 

CVT-SLR: Contrastive Visual-Textual Transformation for Sign Language Recognition with Variational Alignment

Mar 23, 2023
Jiangbin Zheng, Yile Wang, Cheng Tan, Siyuan Li, Ge Wang, Jun Xia, Yidong Chen, Stan Z. Li

Sign language recognition (SLR) is a weakly supervised task that annotates sign videos as textual glosses. Recent studies show that insufficient training caused by the lack of large-scale sign datasets has become the main bottleneck for SLR. Most SLR works therefore adopt pretrained visual modules and develop two mainstream solutions. Multi-stream architectures extend multi-cue visual features, yielding the current SOTA performance but requiring complex designs and potentially introducing noise. Alternatively, advanced single-cue SLR frameworks that use explicit cross-modal alignment between the visual and textual modalities are simple and effective, and potentially competitive with multi-cue frameworks. In this work, we propose a novel contrastive visual-textual transformation for SLR, CVT-SLR, to fully explore the pretrained knowledge of both the visual and language modalities. Building on the single-cue cross-modal alignment framework, we propose a variational autoencoder (VAE) for pretrained contextual knowledge while introducing a complete pretrained language module. The VAE implicitly aligns the visual and textual modalities while benefiting from pretrained contextual knowledge, like a traditional contextual module. Meanwhile, a contrastive cross-modal alignment algorithm is designed to explicitly enforce consistency constraints. Extensive experiments on public datasets (PHOENIX-2014 and PHOENIX-2014T) demonstrate that the proposed CVT-SLR consistently outperforms existing single-cue methods and even outperforms SOTA multi-cue methods.
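The explicit consistency constraint can be illustrated with a standard InfoNCE-style contrastive loss between paired visual and textual embeddings. This is a generic sketch of such an alignment objective, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(visual: torch.Tensor, textual: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """Generic InfoNCE-style alignment (sketch): matched visual/textual pairs
    in a batch are pulled together, mismatched pairs pushed apart."""
    v = F.normalize(visual, dim=-1)    # (batch, d)
    t = F.normalize(textual, dim=-1)   # (batch, d)
    logits = v @ t.T / temperature     # cosine similarities scaled by temperature
    targets = torch.arange(v.size(0), device=v.device)
    # Symmetric cross-entropy over both matching directions.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets))
```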

* Accepted to CVPR 2023 (Highlight Paper Top 2.5%) 

Adapting Offline Speech Translation Models for Streaming with Future-Aware Distillation and Inference

Mar 14, 2023
Biao Fu, Kai Fan, Minpeng Liao, Zhongqiang Huang, Boxing Chen, Yidong Chen, Xiaodong Shi

A popular approach to streaming speech translation is to employ a single offline model with a \textit{wait-$k$} policy to support different latency requirements, which is simpler than training multiple online models with different latency constraints. However, there is a mismatch problem in using a model trained with complete utterances for streaming inference with partial input. We demonstrate that speech representations extracted at the end of a streaming input are significantly different from those extracted from a complete utterance. To address this issue, we propose a new approach called Future-Aware Streaming Translation (FAST) that adapts an offline ST model for streaming input. FAST includes a Future-Aware Inference (FAI) strategy that incorporates future context through a trainable masked embedding, and a Future-Aware Distillation (FAD) framework that transfers future context from an approximation of full speech to streaming input. Our experiments on the MuST-C En-De, En-Es, and En-Fr benchmarks show that FAST achieves better trade-offs between translation quality and latency than strong baselines. Extensive analyses suggest that our methods effectively alleviate the aforementioned mismatch problem between offline training and online inference.
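The future-aware part of inference can be sketched as padding the partial streaming input with a trainable mask embedding that stands in for the unseen future frames before feeding it to the offline encoder. A minimal sketch, with the function and parameter names (`future_aware_input`, `mask_embedding`, `n_future`) chosen here for illustration:

```python
import torch

def future_aware_input(partial_frames: torch.Tensor, mask_embedding: torch.Tensor,
                       n_future: int = 8) -> torch.Tensor:
    """Sketch of future-aware inference: append a trainable mask embedding
    in place of the unseen future frames of a streaming input."""
    # partial_frames: (1, t, d) speech features observed so far
    # mask_embedding: (d,) learned stand-in for future context
    future = mask_embedding.expand(1, n_future, -1)       # (1, n_future, d)
    return torch.cat([partial_frames, future], dim=1)     # (1, t + n_future, d)
```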

* work in progress 

MOPRD: A multidisciplinary open peer review dataset

Dec 09, 2022
Jialiang Lin, Jiaxin Song, Zhangping Zhou, Yidong Chen, Xiaodong Shi

Open peer review is a growing trend in academic publications. Public access to peer review data can benefit both the academic and publishing communities. It also serves as a great support to studies on review comment generation and, further, to the realization of automated scholarly paper review. However, most of the existing peer review datasets do not provide data that cover the whole peer review process. Apart from this, their data are not diversified enough, as they are mainly collected from the field of computer science. These two drawbacks of the currently available peer review datasets need to be addressed to unlock more opportunities for related studies. In response to this problem, we construct MOPRD, a multidisciplinary open peer review dataset. This dataset consists of paper metadata, manuscripts in multiple versions, review comments, meta-reviews, authors' rebuttal letters, and editorial decisions. Moreover, we design a modular guided review comment generation method based on MOPRD. Experiments show that our method delivers better performance, as indicated by both automatic metrics and human evaluation. We also explore other potential applications of MOPRD, including meta-review generation, editorial decision prediction, author rebuttal generation, and scientometric analysis. MOPRD provides strong support for further studies in peer review-related research and other applications.
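Based only on the components listed above, one MOPRD entry can be pictured as a record like the following; the field names are assumptions for illustration, not the released schema.

```python
# Illustrative shape of one MOPRD entry, based only on the components named in
# the abstract; field names and values are placeholders, not the actual schema.
moprd_entry = {
    "metadata": {"title": "...", "discipline": "...", "submission_date": "..."},
    "manuscript_versions": ["..."],        # multiple versions of the manuscript
    "review_comments": [{"round": 1, "reviewer": "R1", "text": "..."}],
    "meta_reviews": ["..."],
    "rebuttal_letters": ["..."],
    "editorial_decision": "...",
}
```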

Towards Better Document-level Relation Extraction via Iterative Inference

Nov 26, 2022
Liang Zhang, Jinsong Su, Yidong Chen, Zhongjian Miao, Zijun Min, Qingguo Hu, Xiaodong Shi

Document-level relation extraction (RE) aims to extract the relations between entities from an input document, which usually contains many difficult entity pairs whose relations can only be predicted through relational inference. Existing methods usually predict the relations of all entity pairs in a document directly in a one-pass manner, ignoring the fact that the predictions for some entity pairs heavily depend on the predicted results of other pairs. To deal with this issue, we propose a novel document-level RE model with iterative inference. Our model is mainly composed of two modules: 1) a base module that provides preliminary relation predictions for entity pairs; 2) an inference module that refines these preliminary predictions by iteratively handling difficult entity pairs, which depend on other pairs, in an easy-to-hard manner. Unlike previous methods that only consider the feature information of entity pairs, our inference module is equipped with two Extended Cross Attention units, allowing it to exploit both the feature information and the previous predictions of entity pairs during relational inference. Furthermore, we adopt a two-stage strategy to train our model: in the first stage, we only train the base module; in the second stage, we train the whole model, where contrastive learning is introduced to enhance the training of the inference module. Experimental results on three commonly used datasets show that our model consistently outperforms other competitive baselines.
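The easy-to-hard iterative refinement can be pictured as repeatedly re-scoring all entity pairs while feeding the previous round's predictions back in as additional input. A schematic sketch follows; `base_module`, `inference_module`, and the tensor shapes are placeholders, not the paper's interfaces.

```python
import torch

def iterative_relation_inference(pair_features: torch.Tensor,
                                 base_module, inference_module,
                                 n_rounds: int = 3) -> torch.Tensor:
    """Sketch of iterative inference for document-level RE: a base module gives
    preliminary predictions, then an inference module refines them round by
    round using both pair features and the previous round's predictions."""
    preds = base_module(pair_features)                 # (n_pairs, n_relations) logits
    for _ in range(n_rounds):
        # Each round conditions on the current predictions of all other pairs,
        # letting hard pairs benefit from pairs already resolved (easy-to-hard).
        preds = inference_module(pair_features, preds.detach().softmax(dim=-1))
    return preds
```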

* Accepted by EMNLP 2022 (long paper) 

Leveraging Graph-based Cross-modal Information Fusion for Neural Sign Language Translation

Nov 01, 2022
Jiangbin Zheng, Siyuan Li, Cheng Tan, Chong Wu, Yidong Chen, Stan Z. Li

Sign Language (SL), the mother tongue of the deaf community, is a special visual language that most hearing people cannot understand. In recent years, neural Sign Language Translation (SLT), a possible way to bridge the communication gap between deaf and hearing people, has attracted widespread academic attention. We found that the current mainstream end-to-end neural SLT models, which try to learn language knowledge in a weakly supervised manner, cannot mine enough semantic information under low-resource conditions. Therefore, we propose to introduce additional word-level semantic knowledge from sign language linguistics to improve current end-to-end neural SLT models. Concretely, we propose a novel neural SLT model with multi-modal feature fusion based on a dynamic graph, in which the cross-modal information, i.e., text and video, is first assembled into a dynamic graph according to its correlations, and the graph is then processed by a multi-modal graph encoder to generate multi-modal embeddings for use in the subsequent neural translation model. To the best of our knowledge, we are the first to introduce graph neural networks for fusing multi-modal information into neural sign language translation models. Moreover, we conducted experiments on the publicly available SLT dataset RWTH-PHOENIX-Weather-2014T, and the quantitative results show that our method improves the model.
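The dynamic cross-modal graph construction can be sketched as connecting each text/gloss node to the video-frame nodes it correlates with most, after which a graph encoder processes the joint graph. A toy sketch using top-k cosine similarity; the function name and edge-selection scheme are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def build_cross_modal_edges(text_emb: torch.Tensor, video_emb: torch.Tensor,
                            top_k: int = 4):
    """Sketch: connect each text node to its top-k most correlated video frames,
    yielding the edge list of a dynamic cross-modal graph."""
    sim = F.normalize(text_emb, dim=-1) @ F.normalize(video_emb, dim=-1).T  # (n_text, n_frames)
    frame_idx = sim.topk(top_k, dim=-1).indices                             # (n_text, top_k)
    edges = [(t, int(f)) for t in range(sim.size(0)) for f in frame_idx[t]]
    return edges  # (text_node, video_node) pairs for a graph encoder to consume
```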

Automatic Analysis of Available Source Code of Top Artificial Intelligence Conference Papers

Sep 28, 2022
Jialiang Lin, Yingmin Wang, Yao Yu, Yu Zhou, Yidong Chen, Xiaodong Shi

Source code is essential for researchers to reproduce the methods and replicate the results of artificial intelligence (AI) papers. Some organizations and researchers manually collect AI papers with available source code to contribute to the AI community. However, manual collection is a labor-intensive and time-consuming task. To address this issue, we propose a method to automatically identify papers with available source code and extract their source code repository URLs. With this method, we find that 20.5% of the regular papers of 10 top AI conferences published from 2010 to 2019 are identified as papers with available source code and that 8.1% of these source code repositories are no longer accessible. We also create the XMU NLP Lab README Dataset, the largest dataset of labeled README files for source code document research. Through this dataset, we have discovered that quite a few README files provide no installation instructions or usage tutorials. Further, we conduct a large-scale, comprehensive statistical analysis to give a general picture of the source code of AI conference papers. The proposed solution can also go beyond AI conference papers to analyze other scientific papers from both journals and conferences, shedding light on more domains.
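Automatic identification of papers with available source code can be approximated by scanning a paper's text for links to code-hosting sites. A simplified sketch of such URL extraction; the regex and host list are illustrative, not the paper's actual pipeline:

```python
import re

CODE_HOSTS = ("github.com", "gitlab.com", "bitbucket.org")
URL_PATTERN = re.compile(r"https?://[^\s\)\]\}>,;\"']+")

def extract_code_repos(paper_text: str):
    """Sketch: return the code-repository URLs mentioned in a paper's text."""
    urls = URL_PATTERN.findall(paper_text)
    return [u.rstrip(".") for u in urls if any(h in u for h in CODE_HOSTS)]

# Example: extract_code_repos("Our code is at https://github.com/example/repo.")
# -> ["https://github.com/example/repo"]
```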

* International Journal of Software Engineering and Knowledge Engineering, Vol. 32, No. 07, pp. 947-970 (2022)  
* Please cite the IJSEKE version 