Aoying Zhou

TransPrompt v2: A Transferable Prompting Framework for Cross-task Text Classification

Aug 29, 2023
Jianing Wang, Chengyu Wang, Cen Chen, Ming Gao, Jun Huang, Aoying Zhou

Text classification is one of the most fundamental tasks in natural language processing (NLP). Recent advances with pre-trained language models (PLMs) have shown remarkable success on this task. However, the strong results obtained by PLMs depend heavily on large amounts of task-specific labeled data, which may not be feasible in many application scenarios due to data access and privacy constraints. The recently proposed prompt-based fine-tuning paradigm improves the performance of PLMs on few-shot text classification with task-specific templates. Yet it remains unclear how prompting knowledge can be transferred across tasks for mutual reinforcement. We propose TransPrompt v2, a novel transferable prompting framework for few-shot learning across similar or distant text classification tasks. For learning across similar tasks, we employ a multi-task meta-knowledge acquisition (MMA) procedure to train a meta-learner that captures cross-task transferable knowledge. For learning across distant tasks, we further inject task type descriptions into the prompt and capture the intra-type and inter-type prompt embeddings among multiple distant tasks. In addition, two de-biasing techniques are designed to make the trained meta-learner more task-agnostic and unbiased toward any specific task. The meta-learner can then be adapted to each specific task with better parameter initialization. Extensive experiments show that TransPrompt v2 outperforms strong single-task and cross-task baselines over multiple NLP tasks and datasets. We further show that the meta-learner can effectively improve the performance of PLMs on previously unseen tasks. TransPrompt v2 also outperforms strong fine-tuning baselines when learning with full training sets.
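
As a rough illustration of the prompting setup the abstract describes, here is a minimal sketch of prompt-based classification with a task-type description injected into the template. The template wording, the verbalizer, and the use of bert-base-uncased are illustrative assumptions, not the authors' exact configuration.

```python
# A minimal sketch of prompt-based few-shot classification with a task-type
# description in the template, in the spirit of TransPrompt v2. Template
# wording, verbalizer, and model choice are assumptions, not the paper's setup.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def build_prompt(text, task_description):
    # Task-type description injected so distant tasks share a typed template.
    return f"Task: {task_description}. {text} It was {tokenizer.mask_token}."

# Verbalizer: maps each class label to an answer token (an assumption here).
verbalizer = {"positive": "great", "negative": "terrible"}

def classify(text, task_description):
    inputs = tokenizer(build_prompt(text, task_description), return_tensors="pt")
    mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    # Score each class by the logit of its verbalizer token at the mask.
    scores = {label: logits[tokenizer.convert_tokens_to_ids(token)].item()
              for label, token in verbalizer.items()}
    return max(scores, key=scores.get)

print(classify("The movie was a delight.", "sentiment classification"))
```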

Uncertainty-aware Self-training for Low-resource Neural Sequence Labeling

Feb 17, 2023
Jianing Wang, Chengyu Wang, Jun Huang, Ming Gao, Aoying Zhou

Neural sequence labeling (NSL) aims to assign labels to input language tokens and covers a broad range of applications such as named entity recognition (NER) and slot filling. However, the strong results achieved by traditional supervised approaches depend heavily on large amounts of human-annotated data, which may not be feasible in real-world scenarios due to data privacy and computation efficiency issues. This paper presents SeqUST, a novel uncertainty-aware self-training framework for NSL that addresses the labeled-data scarcity issue and effectively utilizes unlabeled data. Specifically, we incorporate Monte Carlo (MC) dropout in a Bayesian neural network (BNN) to perform uncertainty estimation at the token level, and then select reliable language tokens from unlabeled data based on the model's confidence and certainty. A well-designed masked sequence labeling task with a noise-robust loss supports robust training and suppresses the problem of noisy pseudo-labels. In addition, we develop a Gaussian-based consistency regularization technique to further improve model robustness on Gaussian-distributed perturbed representations, which effectively alleviates the over-fitting dilemma originating from pseudo-labeled augmented data. Extensive experiments over six benchmarks demonstrate that our SeqUST framework effectively improves the performance of self-training and consistently outperforms strong baselines by a large margin in low-resource scenarios.

* AAAI 2023  
* 11 pages, 3 figures 
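
The token-level uncertainty estimation described above can be sketched with Monte Carlo dropout: keep dropout active at inference, average the predictions of several stochastic passes, and keep only tokens the model is both confident and certain about. The thresholds and the tagger interface below are assumptions, not the paper's exact procedure.

```python
# A minimal sketch of token-level MC-dropout uncertainty for pseudo-label
# selection, in the spirit of SeqUST. Thresholds and the tagger interface
# are illustrative assumptions.
import torch

def mc_dropout_uncertainty(model, inputs, n_passes=10):
    """Run n_passes stochastic forward passes; return per-token mean
    probabilities and predictive entropy (higher = less certain)."""
    model.train()  # keep dropout active at inference time
    probs = []
    with torch.no_grad():
        for _ in range(n_passes):
            logits = model(inputs)                # (batch, seq_len, n_labels)
            probs.append(logits.softmax(dim=-1))
    mean_p = torch.stack(probs).mean(dim=0)       # (batch, seq_len, n_labels)
    entropy = -(mean_p * mean_p.clamp_min(1e-12).log()).sum(dim=-1)
    return mean_p, entropy

def select_reliable_tokens(mean_p, entropy, conf_thresh=0.9, ent_thresh=0.2):
    # Keep tokens the model is confident (high max prob) and certain
    # (low entropy) about; the rest are masked out of the pseudo-labels.
    confident = mean_p.max(dim=-1).values >= conf_thresh
    certain = entropy <= ent_thresh
    return confident & certain                    # boolean mask (batch, seq_len)
```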

Meta-Learning Siamese Network for Few-Shot Text Classification

Feb 05, 2023
Chengcheng Han, Yuhe Wang, Yingnan Fu, Xiang Li, Minghui Qiu, Ming Gao, Aoying Zhou

Few-shot learning has been used to tackle the problem of label scarcity in text classification, and meta-learning-based methods such as prototypical networks (PROTO) have proven effective. Despite the success of PROTO, three main problems remain: (1) it ignores the randomness of the sampled support sets when computing prototype vectors; (2) it disregards the importance of labeled samples; (3) it constructs meta-tasks in a purely random manner. In this paper, we propose a Meta-Learning Siamese Network, namely Meta-SN, to address these issues. Specifically, instead of computing prototype vectors from the sampled support sets, Meta-SN utilizes external knowledge (e.g., class names and descriptive texts) for class labels, which is encoded into low-dimensional embeddings of prototype vectors. In addition, Meta-SN presents a novel sampling strategy for constructing meta-tasks that gives higher sampling probabilities to hard-to-classify samples. Extensive experiments on six benchmark datasets show the clear superiority of Meta-SN over other state-of-the-art models. For reproducibility, all datasets and code are provided at https://github.com/hccngu/Meta-SN.
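
The hard-example-oriented sampling strategy can be pictured with a small sketch: examples that sit close to a competing class prototype receive a higher probability of being drawn into a meta-task. The margin-based difficulty score and the softmax temperature are illustrative assumptions, not Meta-SN's exact formulation.

```python
# A minimal sketch of difficulty-weighted meta-task sampling in the spirit
# of Meta-SN. The distance-margin difficulty measure is an assumption.
import torch

def sampling_probabilities(embeddings, labels, prototypes, temperature=1.0):
    """embeddings: (n, d); labels: (n,); prototypes: (n_classes, d)."""
    dists = torch.cdist(embeddings, prototypes)          # (n, n_classes)
    own = dists.gather(1, labels.unsqueeze(1)).squeeze(1)
    # Margin: distance to own prototype minus distance to nearest other class;
    # larger margin = harder to classify = higher sampling probability.
    masked = dists.scatter(1, labels.unsqueeze(1), float("inf"))
    nearest_other = masked.min(dim=1).values
    difficulty = own - nearest_other
    return torch.softmax(difficulty / temperature, dim=0)

# Usage: draw examples for a meta-task with
# torch.multinomial(probs, num_samples=k, replacement=False).
```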

Understanding Long Programming Languages with Structure-Aware Sparse Attention

May 27, 2022
Tingting Liu, Chengyu Wang, Cen Chen, Ming Gao, Aoying Zhou

Programming-based Pre-trained Language Models (PPLMs) such as CodeBERT have achieved great success in many downstream code-related tasks. Since the memory and computational complexity of self-attention in the Transformer grow quadratically with the sequence length, PPLMs typically limit the code length to 512 tokens. However, code in real-world applications such as code search is generally long and cannot be processed efficiently by existing PPLMs. To solve this problem, we present SASA, a Structure-Aware Sparse Attention mechanism that reduces complexity and improves performance on long code understanding tasks. The key components of SASA are top-$k$ sparse attention and Abstract Syntax Tree (AST)-based structure-aware attention. With top-$k$ sparse attention, the most crucial attention relations can be obtained at a lower computational cost. Because the code structure represents the logic of the code statements and complements the sequence characteristics of the code, we further introduce AST structures into attention. Extensive experiments on CodeXGLUE tasks show that SASA achieves better performance than the competing baselines.

* Accepted at SIGIR 2022; code will be available at https://github.com/alibaba/EasyNLP
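
Of the two components, the top-$k$ sparse attention is easy to sketch: each query keeps only its $k$ highest-scoring keys. Note that this naive version still materializes the full score matrix; an efficient implementation (and the AST-based structure-aware pattern) would compute only the selected blocks. Shapes and the value of $k$ are illustrative.

```python
# A minimal sketch of top-k sparse attention, one of SASA's two key
# components. The AST-based structure-aware pattern is omitted here.
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, top_k=32):
    """q, k, v: (batch, heads, seq_len, head_dim)."""
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)   # (b, h, n, n)
    top_k = min(top_k, scores.size(-1))
    kth = scores.topk(top_k, dim=-1).values[..., -1:]        # k-th largest score
    scores = scores.masked_fill(scores < kth, float("-inf")) # drop non-top-k keys
    return F.softmax(scores, dim=-1) @ v
```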

GypSum: Learning Hybrid Representations for Code Summarization

Apr 26, 2022
Yu Wang, Yu Dong, Xuesong Lu, Aoying Zhou

Code summarization with deep learning has been widely studied in recent years. Current deep learning models for code summarization generally follow the principles of neural machine translation and adopt an encoder-decoder framework, where the encoder learns semantic representations from source code and the decoder transforms the learned representations into human-readable text that describes the functionality of code snippets. Although these models achieve new state-of-the-art performance, we notice that they often either generate less fluent summaries or fail to capture the core functionality, since they usually focus on a single type of code representation. We therefore propose GypSum, a new deep learning model that learns hybrid representations using graph attention neural networks and a pre-trained programming and natural language model. We introduce edges related to the control flow of a code snippet into the abstract syntax tree for graph construction, and design two encoders to learn from the graph and from the token sequence of the source code, respectively. We modify the encoder-decoder sublayer in the Transformer's decoder to fuse the representations, and propose a dual-copy mechanism to facilitate summary generation. Experimental results demonstrate the superior performance of GypSum over existing code summarization models.

* 12 pages, 6 figures, 6 tables 
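
One way to picture the modified encoder-decoder sublayer is a decoder layer that cross-attends to both encoder memories and gates them together. The gated-fusion design below is an illustrative assumption about how the two representations might be combined, not GypSum's exact architecture; the dual-copy mechanism is omitted.

```python
# A minimal sketch of a decoder layer fusing two encoder memories (AST graph
# encoder and token encoder), in the spirit of GypSum. Gating is an assumption.
import torch
import torch.nn as nn

class DualSourceDecoderLayer(nn.Module):
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.attn_graph = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.attn_token = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gate = nn.Linear(2 * d_model, d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                 nn.Linear(4 * d_model, d_model))
        self.norm1, self.norm2, self.norm3 = (nn.LayerNorm(d_model) for _ in range(3))

    def forward(self, tgt, graph_mem, token_mem):
        x = self.norm1(tgt + self.self_attn(tgt, tgt, tgt)[0])
        g, _ = self.attn_graph(x, graph_mem, graph_mem)   # attend to graph encoder
        t, _ = self.attn_token(x, token_mem, token_mem)   # attend to token encoder
        fused = torch.sigmoid(self.gate(torch.cat([g, t], dim=-1)))
        x = self.norm2(x + fused * g + (1 - fused) * t)   # gated fusion of sources
        return self.norm3(x + self.ffn(x))
```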

Programming Knowledge Tracing: A Comprehensive Dataset and A New Model

Dec 11, 2021
Renyu Zhu, Dongxiang Zhang, Chengcheng Han, Ming Gao, Xuesong Lu, Weining Qian, Aoying Zhou

In this paper, we study knowledge tracing in the domain of programming education and make two important contributions. First, we harvest and publish the most comprehensive dataset to date, namely BePKT, which covers various online behaviors in an online judge (OJ) system, including programming text problems, knowledge annotations, user-submitted code, and system-logged events. Second, we propose a new model, PDKT, that exploits the enriched context for accurate student behavior prediction. More specifically, we construct a bipartite graph for programming problem embedding and design an improved pre-training model, PLCodeBERT, for code embedding, as well as a double-sequence RNN model with exponential decay attention for effective feature fusion. Experimental results on the new dataset BePKT show that our proposed model establishes state-of-the-art performance in programming knowledge tracing. In addition, we verify that our code embedding strategy based on PLCodeBERT is complementary to existing knowledge tracing models and further enhances their accuracy. As a side product, PLCodeBERT also yields better performance on other programming-related tasks such as code clone detection.
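
The exponential decay attention used for feature fusion can be sketched as ordinary dot-product attention whose logits are penalized by how long ago each past interaction occurred. The decay form and parameterization below are assumptions, not PDKT's exact formulation.

```python
# A minimal sketch of exponential-decay attention over a student's past
# interactions, in the spirit of PDKT. The decay form is an assumption.
import torch
import torch.nn.functional as F

def decay_attention(query, keys, values, time_gaps, decay_rate=0.1):
    """query: (b, d); keys/values: (b, t, d);
    time_gaps: (b, t) steps since each past interaction (0 = most recent)."""
    scores = (keys @ query.unsqueeze(-1)).squeeze(-1) / keys.size(-1) ** 0.5
    # Subtracting decay * gap in logit space multiplies the attention weight
    # by exp(-decay * gap): older interactions contribute less.
    scores = scores - decay_rate * time_gaps
    weights = F.softmax(scores, dim=-1)
    return (weights.unsqueeze(-1) * values).sum(dim=1)    # (b, d)
```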

Are Missing Links Predictable? An Inferential Benchmark for Knowledge Graph Completion

Aug 03, 2021
Yixin Cao, Jun Kuang, Ming Gao, Aoying Zhou, Yonggang Wen, Tat-Seng Chua

We present InferWiki, a Knowledge Graph Completion (KGC) dataset that improves upon existing benchmarks in inferential ability, assumptions, and patterns. First, each test sample is predictable from supportive data in the training set. To ensure this, we propose rule-guided train/test generation instead of a conventional random split. Second, InferWiki initiates evaluation under the open-world assumption and raises the inferential difficulty of the closed-world assumption by providing manually annotated negative and unknown triples. Third, we include various inference patterns (e.g., reasoning path length and types) for comprehensive evaluation. In experiments, we curate two settings of InferWiki varying in size and structure, and apply the construction process to CoDEx as comparative datasets. The results and empirical analyses demonstrate the necessity and high quality of InferWiki. Nevertheless, the performance gap among various inferential assumptions and patterns reveals the difficulty of the task and inspires future research directions. Our datasets can be found at https://github.com/TaoMiner/inferwiki

* 15 pages, 13 figures, ACL'2021 
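
The rule-guided train/test generation can be pictured with a small sketch: a candidate test triple survives only if some rule grounds it in the training graph, so every test fact is predictable in principle. Representing rules as two-hop relation paths is an illustrative simplification of whatever rule language the benchmark actually uses.

```python
# A minimal sketch of rule-guided test selection in the spirit of InferWiki.
# Rules here are two-hop paths (r1, r2) => r, an illustrative simplification.
from collections import defaultdict

def predictable_test_triples(train, candidates, rules):
    """train/candidates: iterables of (head, relation, tail);
    rules: {target_relation: [(r1, r2), ...]}."""
    out_edges = defaultdict(list)            # head -> [(relation, tail)]
    for h, r, t in train:
        out_edges[h].append((r, t))
    kept = []
    for h, r, t in candidates:
        for r1, r2 in rules.get(r, []):
            # Does some intermediate node m satisfy (h, r1, m) and (m, r2, t)
            # in the training graph?
            if any(rel == r1 and any(rel2 == r2 and t2 == t
                                     for rel2, t2 in out_edges[m])
                   for rel, m in out_edges[h]):
                kept.append((h, r, t))
                break
    return kept
```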

Meta-Learning Adversarial Domain Adaptation Network for Few-Shot Text Classification

Jul 26, 2021
Chengcheng Han, Zeqiu Fan, Dongxiang Zhang, Minghui Qiu, Ming Gao, Aoying Zhou

Meta-learning has emerged as a trending technique for tackling few-shot text classification and has achieved state-of-the-art performance. However, existing solutions rely heavily on lexical features and their distributional signatures on the training data, while neglecting to strengthen the model's ability to adapt to new tasks. In this paper, we propose a novel meta-learning framework integrated with an adversarial domain adaptation network, aiming to improve the model's adaptive ability and to generate high-quality text embeddings for new classes. Extensive experiments are conducted on four benchmark datasets, and our method demonstrates clear superiority over state-of-the-art models on all of them. In particular, the accuracy of 1-shot and 5-shot classification on the 20 Newsgroups dataset is boosted from 52.1% to 59.6% and from 68.3% to 77.8%, respectively.
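
A standard building block for the adversarial domain adaptation component is a gradient reversal layer: a domain discriminator learns to separate domains while reversed gradients push the encoder toward domain-invariant features. This is the generic construction rather than the paper's exact architecture, and the network sizes below are illustrative.

```python
# A minimal sketch of a gradient reversal layer (GRL) for adversarial domain
# adaptation. Network sizes are illustrative assumptions.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None   # flip gradients into the encoder

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

# The discriminator learns to tell source from target domains, while the
# reversed gradients train the encoder to fool it.
discriminator = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 2))

def domain_loss(features, domain_labels, lambd=1.0):
    logits = discriminator(grad_reverse(features, lambd))
    return nn.functional.cross_entropy(logits, domain_labels)
```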

Learning Relation Prototype from Unlabeled Texts for Long-tail Relation Extraction

Nov 27, 2020
Yixin Cao, Jun Kuang, Ming Gao, Aoying Zhou, Yonggang Wen, Tat-Seng Chua

Relation Extraction (RE) is a vital step in completing a Knowledge Graph (KG) by extracting entity relations from texts. However, it usually suffers from the long-tail issue: the training data mainly concentrate on a few types of relations, leading to a lack of sufficient annotations for the remaining relation types. In this paper, we propose a general approach that learns relation prototypes from unlabeled texts to facilitate long-tail relation extraction by transferring knowledge from relation types with sufficient training data. We learn relation prototypes as an implicit factor between entities, which reflects the meanings of relations as well as their proximities for transfer learning. Specifically, we construct a co-occurrence graph from texts and capture both first-order and second-order entity proximities for embedding learning. Based on this, we further optimize the distance from entity pairs to the corresponding prototypes, which can be easily adapted to almost arbitrary RE frameworks. Thus, the learning of infrequent or even unseen relation types benefits from semantically proximate relations through pairs of entities and large-scale textual information. We have conducted extensive experiments on two publicly available datasets: New York Times and Google Distant Supervision. Compared with eight state-of-the-art baselines, our proposed model achieves significant improvements (4.1% F1 on average). Further results on long-tail relations demonstrate the effectiveness of the learned relation prototypes. We also conduct an ablation study to investigate the impact of each component and apply the approach to four basic relation extraction models to verify its generalization ability. Finally, we analyze several example cases to give intuitive impressions as qualitative analysis. Our code will be released later.
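
The prototype-distance objective can be sketched as a margin loss that pulls an entity-pair embedding toward its own relation prototype and away from the nearest other prototype. The hinge form and Euclidean distance are illustrative assumptions, not the paper's exact objective.

```python
# A minimal sketch of pulling entity-pair embeddings toward their relation
# prototype. The margin objective and distance function are assumptions.
import torch
import torch.nn.functional as F

def prototype_loss(pair_emb, rel_ids, prototypes, margin=1.0):
    """pair_emb: (b, d) embeddings of (head, tail) pairs; rel_ids: (b,);
    prototypes: (n_relations, d) learned relation prototypes."""
    dists = torch.cdist(pair_emb, prototypes)               # (b, n_relations)
    pos = dists.gather(1, rel_ids.unsqueeze(1)).squeeze(1)  # own prototype
    neg = dists.scatter(1, rel_ids.unsqueeze(1), float("inf")).min(dim=1).values
    # Hinge: own prototype should be closer than any other by a margin.
    return F.relu(pos - neg + margin).mean()
```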

EDSL: An Encoder-Decoder Architecture with Symbol-Level Features for Printed Mathematical Expression Recognition

Jul 06, 2020
Yingnan Fu, Tingting Liu, Ming Gao, Aoying Zhou

Printed mathematical expression recognition (PMER) aims to transcribe a printed mathematical expression image into a structural expression, such as a LaTeX expression. It is a crucial task for many applications, including automatic question recommendation, automatic problem solving, and student performance analysis. Current mainstream solutions treat the task as image captioning, which addresses image summarization; as such, these methods can be suboptimal for the PMER problem. In this paper, we propose a new method named EDSL, short for encoder-decoder with symbol-level features, to identify printed mathematical expressions from images. The symbol-level image encoder of EDSL consists of a segmentation module and a reconstruction module. Through the segmentation module, we identify all the symbols and their spatial information in an image in an unsupervised manner. We then design a novel reconstruction module to recover the symbol dependencies after segmentation. In particular, we employ a position correction attention mechanism to capture the spatial relationships between symbols. To alleviate the negative impact of long outputs, we apply the Transformer model to transcribe the encoded image into sequential, structured output. We conduct extensive experiments on two real datasets to verify the effectiveness and rationality of the proposed EDSL method. The experimental results show that EDSL achieves 92.7% and 89.0% on the Match evaluation metric, which are 3.47% and 4.04% higher than the state-of-the-art method. Our code and datasets are available at https://github.com/abcAnonymous/EDSL .
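
The unsupervised symbol segmentation step can be illustrated with connected-component labeling over a binarized image; recovering the true two-dimensional structure is left to the reconstruction module. The binarization threshold and the use of scipy are assumptions, not the paper's exact pipeline.

```python
# A minimal sketch of unsupervised symbol segmentation by connected-component
# labeling, the kind of step EDSL's segmentation module performs.
import numpy as np
from scipy import ndimage

def segment_symbols(gray_image, threshold=128):
    """gray_image: 2D uint8 array (white background, dark ink).
    Returns a list of (bounding_box, crop) pairs, one per symbol candidate."""
    ink = gray_image < threshold                  # binarize: True where ink
    labeled, n = ndimage.label(ink)               # connected components
    symbols = []
    for region in ndimage.find_objects(labeled):  # slice pair per component
        if region is not None:
            symbols.append((region, gray_image[region]))
    # Sorting crops left-to-right by column gives a rough reading order;
    # the true 2D layout is the reconstruction module's job.
    symbols.sort(key=lambda item: item[0][1].start)
    return symbols
```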
