"Text": models, code, and papers

ALLWAS: Active Learning on Language models in WASserstein space

Sep 03, 2021
Anson Bastos, Manohar Kaul

Active learning has emerged as a standard paradigm in areas where labeled training data is scarce, such as the medical domain. Language models have become the prevalent choice for many natural language tasks owing to the performance boost they offer. However, in domains such as medicine, labeled training data remains scarce, and these models may not work well when class imbalance is prevalent. Active learning can help boost performance in such cases under a limited label budget. To this end, we propose ALLWAS, a novel active learning method for language models that uses sampling techniques based on submodular optimization and optimal transport. We construct a sampling strategy based on submodular optimization of the designed objective in the gradient domain. Furthermore, to enable learning from few samples, we propose a novel strategy for sampling from Wasserstein barycenters. Our empirical evaluations on standard benchmark datasets for text classification show that our methods perform significantly better (>20% relative improvement in some cases) than existing approaches for active learning on language models.
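
As a rough illustration of the kind of sampling step described above, the sketch below greedily maximizes a facility-location objective over per-example gradient embeddings. The objective, the `greedy_submodular_select` helper, and the random embeddings are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def greedy_submodular_select(gradient_embeddings: np.ndarray, budget: int) -> list:
    """Greedily pick `budget` examples maximizing a facility-location objective."""
    normed = gradient_embeddings / np.linalg.norm(gradient_embeddings, axis=1, keepdims=True)
    # Clip to non-negative similarities so the objective stays monotone submodular.
    sim = np.clip(normed @ normed.T, 0.0, None)
    n = sim.shape[0]
    selected, coverage = [], np.zeros(n)
    for _ in range(budget):
        # Marginal gain of each candidate: how much it improves the best coverage of all points.
        gains = np.maximum(sim, coverage).sum(axis=1) - coverage.sum()
        gains[selected] = -np.inf          # never re-pick an already selected example
        best = int(np.argmax(gains))
        selected.append(best)
        coverage = np.maximum(coverage, sim[best])
    return selected

# Usage: pick 10 of 100 random 32-d "gradient embeddings" for labeling.
emb = np.random.randn(100, 32)
print(greedy_submodular_select(emb, budget=10))
```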



The DCU-EPFL Enhanced Dependency Parser at the IWPT 2021 Shared Task

Jul 05, 2021
James Barry, Alireza Mohammadshahi, Joachim Wagner, Jennifer Foster, James Henderson

We describe the DCU-EPFL submission to the IWPT 2021 Shared Task on Parsing into Enhanced Universal Dependencies. The task involves parsing Enhanced UD graphs, an extension of the basic dependency trees designed to better represent semantic structure. Evaluation is carried out on 29 treebanks in 17 languages, and participants are required to parse the data for each language starting from raw strings. Our approach uses the Stanza pipeline to preprocess the text files, XLM-RoBERTa to obtain contextualized token representations, and an edge-scoring and labeling model to predict the enhanced graph. Finally, we run a post-processing script to ensure all of our outputs are valid Enhanced UD graphs. Our system places 6th out of 9 participants with a coarse Enhanced Labeled Attachment Score (ELAS) of 83.57. We carry out additional post-deadline experiments that include using Trankit for preprocessing, XLM-RoBERTa-LARGE, treebank concatenation, and multitask learning between a basic and an enhanced dependency parser. All of these modifications improve our initial score, and our final system has a coarse ELAS of 88.04.

* Submitted to the IWPT 2021 Shared Task: From Raw Text to Enhanced Universal Dependencies: the Parsing Shared Task at IWPT 2021 
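
The edge-scoring component mentioned above is commonly implemented as a biaffine scorer over contextualized token vectors; the sketch below shows one such scorer. Layer sizes, the `BiaffineEdgeScorer` class, and the random tensors standing in for XLM-RoBERTa outputs are illustrative assumptions, not the submission's exact configuration.

```python
import torch
import torch.nn as nn

class BiaffineEdgeScorer(nn.Module):
    def __init__(self, enc_dim: int = 768, arc_dim: int = 256):
        super().__init__()
        self.head_mlp = nn.Sequential(nn.Linear(enc_dim, arc_dim), nn.ReLU())
        self.dep_mlp = nn.Sequential(nn.Linear(enc_dim, arc_dim), nn.ReLU())
        # Biaffine weight; the +1 row accounts for a bias feature on the dependent side.
        self.W = nn.Parameter(torch.randn(arc_dim + 1, arc_dim) * 0.01)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len, enc_dim) contextualized token representations.
        heads = self.head_mlp(tokens)                           # (B, T, arc_dim)
        deps = self.dep_mlp(tokens)                             # (B, T, arc_dim)
        deps = torch.cat([deps, torch.ones(*deps.shape[:2], 1)], dim=-1)
        # Score for token j governing token i: deps[i] @ W @ heads[j].
        return torch.einsum("bid,dk,bjk->bij", deps, self.W, heads)

# Usage: all head-dependent scores for a batch of 2 sentences of length 5.
scores = BiaffineEdgeScorer()(torch.randn(2, 5, 768))
print(scores.shape)  # torch.Size([2, 5, 5])
```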


Graph Neural Networks for Natural Language Processing: A Survey

Jun 10, 2021
Lingfei Wu, Yu Chen, Kai Shen, Xiaojie Guo, Hanning Gao, Shucheng Li, Jian Pei, Bo Long

Deep learning has become the dominant approach to coping with various tasks in Natural Language Processing (NLP). Although text inputs are typically represented as a sequence of tokens, there is a rich variety of NLP problems that can be best expressed with a graph structure. As a result, there is a surge of interest in developing new deep learning techniques on graphs for a large number of NLP tasks. In this survey, we present a comprehensive overview of Graph Neural Networks (GNNs) for Natural Language Processing. We propose a new taxonomy of GNNs for NLP, which systematically organizes existing research on GNNs for NLP along three axes: graph construction, graph representation learning, and graph-based encoder-decoder models. We further introduce a large number of NLP applications that exploit the power of GNNs and summarize the corresponding benchmark datasets, evaluation metrics, and open-source code. Finally, we discuss various outstanding challenges for making full use of GNNs for NLP as well as future research directions. To the best of our knowledge, this is the first comprehensive overview of Graph Neural Networks for Natural Language Processing.

* 127 pages 
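
To make the "graph representation learning" axis of the taxonomy concrete, here is a minimal single-layer GCN propagation step in numpy; the survey itself covers many GNN variants beyond this toy example, and the function and sizes used here are illustrative only.

```python
import numpy as np

def gcn_layer(adj: np.ndarray, features: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One propagation step: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])              # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))   # symmetric degree normalization
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ features @ weight, 0.0)

# Usage: 4-node chain graph, 8-d input features, 16-d output features.
adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
out = gcn_layer(adj, np.random.randn(4, 8), np.random.randn(8, 16))
print(out.shape)  # (4, 16)
```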


GWLAN: General Word-Level AutocompletioN for Computer-Aided Translation

May 31, 2021
Huayang Li, Lemao Liu, Guoping Huang, Shuming Shi

Computer-aided translation (CAT), the use of software to assist a human translator in the translation process, has been proven useful in enhancing the productivity of human translators. Autocompletion, which suggests translation results according to the text pieces provided by human translators, is a core function of CAT. There are two limitations in previous research in this line. First, most work on this topic focuses on sentence-level autocompletion (i.e., generating the whole translation as a sentence based on human input), while word-level autocompletion remains under-explored. Second, almost no public benchmarks are available for the autocompletion task of CAT, which might be among the reasons why research progress in CAT is much slower than in automatic machine translation. In this paper, we propose the task of general word-level autocompletion (GWLAN) from a real-world CAT scenario and construct the first public benchmark to facilitate research on this topic. In addition, we propose an effective method for GWLAN and compare it with several strong baselines. Experiments demonstrate that our proposed method gives significantly more accurate predictions than the baseline methods on our benchmark datasets.

* Accepted into the main conference of ACL 2021. arXiv admin note: text overlap with arXiv:2105.13072 
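
As a rough sketch of the word-level autocompletion interface, the snippet below filters a vocabulary by the translator's typed prefix and returns the highest-scoring candidates. The `autocomplete` helper and the hard-coded scores are purely illustrative; the actual GWLAN model conditions its scores on the source sentence and the surrounding translation context.

```python
def autocomplete(prefix: str, word_scores: dict, k: int = 3) -> list:
    """Return the k highest-scoring vocabulary words matching the typed prefix."""
    candidates = [(w, s) for w, s in word_scores.items() if w.startswith(prefix)]
    return [w for w, _ in sorted(candidates, key=lambda x: -x[1])[:k]]

# Usage: the translator has typed "tra"; the scores would come from the model.
scores = {"translation": 0.62, "translator": 0.21, "transfer": 0.05, "train": 0.04}
print(autocomplete("tra", scores))  # ['translation', 'translator', 'transfer']
```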


Graph Learning: A Survey

May 03, 2021
Feng Xia, Ke Sun, Shuo Yu, Abdul Aziz, Liangtian Wan, Shirui Pan, Huan Liu

Graphs are a popular representation of the network structure of connected data. Graph data can be found in a broad spectrum of application domains such as social systems, ecosystems, biological networks, knowledge graphs, and information systems. With the continuous penetration of artificial intelligence technologies, graph learning (i.e., machine learning on graphs) is gaining attention from both researchers and practitioners. Graph learning proves effective for many tasks, such as classification, link prediction, and matching. Generally, graph learning methods extract relevant features of graphs by taking advantage of machine learning algorithms. In this survey, we present a comprehensive overview of the state of the art in graph learning. Special attention is paid to four categories of existing graph learning methods: graph signal processing, matrix factorization, random walk, and deep learning. Major models and algorithms under these categories are reviewed respectively. We examine graph learning applications in areas such as text, images, science, knowledge graphs, and combinatorial optimization. In addition, we discuss several promising research directions in this field.

* IEEE Transactions on Artificial Intelligence (2021) 
* 19 pages, 6 figures 
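
As a small concrete example of the random-walk category reviewed in the survey, the sketch below generates truncated random walks of the kind fed into a skip-gram model in DeepWalk-style methods; the walk length, adjacency-list format, and `random_walks` helper are illustrative choices.

```python
import random

def random_walks(adj_list: dict, walks_per_node: int = 2, walk_len: int = 5) -> list:
    """Generate truncated random walks to be fed into a skip-gram model."""
    walks = []
    for _ in range(walks_per_node):
        for start in adj_list:
            walk = [start]
            while len(walk) < walk_len and adj_list[walk[-1]]:
                walk.append(random.choice(adj_list[walk[-1]]))  # step to a random neighbor
            walks.append(walk)
    return walks

# Usage: a small triangle graph plus a pendant node.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(random_walks(graph))
```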


An Introduction to Robust Graph Convolutional Networks

Mar 27, 2021
Mehrnaz Najafi, Philip S. Yu

Graph convolutional neural networks (GCNs) generalize traditional convolutional neural networks (CNNs) from low-dimensional regular graphs (e.g., images) to high-dimensional irregular graphs (e.g., text documents over word embeddings). Due to inevitably faulty data collection instruments, deceptive data manipulation, or other system errors, the data might be error-contaminated. Even a small amount of error, such as noise, can compromise the ability of GCNs and render them unusable to a large extent. The key challenge is how to effectively and efficiently employ GCNs in the presence of erroneous data. In this paper, we propose a novel Robust Graph Convolutional Neural Network for potentially erroneous single-view or multi-view data, where data may come from multiple sources. By incorporating extra autoencoder layers into traditional graph convolutional networks, we characterize and handle typical error models explicitly. Experimental results on various real-world datasets demonstrate the superiority of the proposed model over the baseline methods and its robustness against different types of error.
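
A minimal sketch of the general idea of combining autoencoder layers with graph convolution is shown below: features are passed through an autoencoder bottleneck before one GCN propagation step, and the reconstruction loss regularizes against noisy inputs. The `DenoisingGCN` module, its layer sizes, and the loss combination are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class DenoisingGCN(nn.Module):
    def __init__(self, in_dim: int = 64, hid_dim: int = 32, n_classes: int = 4):
        super().__init__()
        self.encoder = nn.Linear(in_dim, hid_dim)   # autoencoder bottleneck
        self.decoder = nn.Linear(hid_dim, in_dim)   # used for a reconstruction loss
        self.gcn_weight = nn.Linear(hid_dim, n_classes, bias=False)

    def forward(self, adj_norm: torch.Tensor, x: torch.Tensor):
        z = torch.relu(self.encoder(x))              # denoised representation
        recon = self.decoder(z)                       # reconstruct input features
        logits = self.gcn_weight(adj_norm @ z)        # one GCN propagation step
        return logits, recon

# Usage: the reconstruction term regularizes against noisy input features.
model = DenoisingGCN()
adj_norm, x = torch.eye(10), torch.randn(10, 64)
logits, recon = model(adj_norm, x)
loss = nn.functional.cross_entropy(logits, torch.randint(0, 4, (10,))) \
     + nn.functional.mse_loss(recon, x)
print(loss.item())
```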



Approximating Instance-Dependent Noise via Instance-Confidence Embedding

Mar 25, 2021
Yivan Zhang, Masashi Sugiyama

Label noise in multiclass classification is a major obstacle to the deployment of learning systems. However, unlike the widely used class-conditional noise (CCN) assumption that the noisy label is independent of the input feature given the true label, label noise in real-world datasets can be aleatory and heavily dependent on individual instances. In this work, we investigate the instance-dependent noise (IDN) model and propose an efficient approximation of IDN to capture the instance-specific label corruption. Concretely, noting the fact that most columns of the IDN transition matrix have only limited influence on the class-posterior estimation, we propose a variational approximation that uses a single-scalar confidence parameter. To cope with the situation where the mapping from the instance to its confidence value could vary significantly for two adjacent instances, we suggest using instance embedding that assigns a trainable parameter to each instance. The resulting instance-confidence embedding (ICE) method not only performs well under label noise but also can effectively detect ambiguous or mislabeled instances. We validate its utility on various image and text classification tasks.
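
The sketch below illustrates one plausible way to realize a per-instance trainable confidence scalar: a sigmoid-squashed embedding that interpolates between the model's class posterior and a uniform distribution. The `InstanceConfidence` module and this particular parameterization are assumptions for illustration and may differ from the exact ICE formulation.

```python
import torch
import torch.nn as nn

class InstanceConfidence(nn.Module):
    def __init__(self, num_instances: int, num_classes: int):
        super().__init__()
        self.num_classes = num_classes
        # One trainable confidence scalar per training instance.
        self.conf = nn.Embedding(num_instances, 1)

    def forward(self, clean_posterior: torch.Tensor, instance_ids: torch.Tensor):
        c = torch.sigmoid(self.conf(instance_ids))            # (batch, 1) in (0, 1)
        uniform = torch.full_like(clean_posterior, 1.0 / self.num_classes)
        # Low confidence pushes the noisy-label posterior toward uniform.
        return c * clean_posterior + (1.0 - c) * uniform

# Usage: mix a classifier's softmax output with uniform noise, per instance.
ice = InstanceConfidence(num_instances=1000, num_classes=10)
posterior = torch.softmax(torch.randn(4, 10), dim=-1)
print(ice(posterior, torch.tensor([0, 1, 2, 3])).shape)  # torch.Size([4, 10])
```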



A Novel Paper Recommendation Method Empowered by Knowledge Graph: for Research Beginners

Mar 16, 2021
Bangchao Wang, Ziyang Weng, Yanping Wang

Searching for papers in different academic databases is the most common way for research beginners to obtain cross-domain technical solutions. However, it is usually inefficient and sometimes even useless, because traditional search methods neither consider knowledge heterogeneity across domains nor build the bottom layer of the search, including but not limited to the characteristic description text of target solutions and of solutions to be excluded. To alleviate this problem, we propose a novel paper recommendation method that introduces "master-slave" domain knowledge graphs, which not only help users express their requirements more accurately but also help the recommendation system better express knowledge. Specifically, the method is not restricted by the cold-start problem and is challenge-oriented. To assess the rationality and usefulness of the proposed method, we selected two cross-domain settings and three different academic databases for verification. The experimental results demonstrate that research beginners can obtain new technical papers in the cross-domain scenario using the proposed method. Further, we propose a new research paradigm for research beginners in the early stages of their work.



Transformer-based Conditional Variational Autoencoder for Controllable Story Generation

Jan 04, 2021
Le Fang, Tao Zeng, Chaochun Liu, Liefeng Bo, Wen Dong, Changyou Chen

We investigate large-scale latent variable models (LVMs) for neural story generation -- an under-explored application for open-domain long text -- with objectives in two threads: generation effectiveness and controllability. LVMs, especially the variational autoencoder (VAE), have achieved both effective and controllable generation by exploiting flexible distributional latent representations. Recently, Transformers and their variants have achieved remarkable effectiveness without explicit latent representation learning, and thus lack satisfactory controllability in generation. In this paper, we advocate reviving latent variable modeling, essentially the power of representation learning, in the era of Transformers to enhance controllability without hurting state-of-the-art generation effectiveness. Specifically, we integrate latent representation vectors with a Transformer-based pre-trained architecture to build a conditional variational autoencoder (CVAE). Model components such as the encoder, the decoder, and the variational posterior are all built on top of pre-trained language models -- GPT2 specifically in this paper. Experiments demonstrate the state-of-the-art conditional generation ability of our model, as well as its excellent representation learning capability and controllability.
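
A minimal sketch of the CVAE wiring described above: encode the text into a Gaussian latent, sample it with the reparameterization trick, and inject the latent into the decoder's input embeddings. Plain `nn` modules stand in for the pre-trained GPT2 encoder and decoder the paper builds on, and the injection scheme shown is just one common choice.

```python
import torch
import torch.nn as nn

class TinyCVAE(nn.Module):
    def __init__(self, hid: int = 128, latent: int = 32, vocab: int = 1000):
        super().__init__()
        self.embed = nn.Embedding(vocab, hid)
        self.to_mu = nn.Linear(hid, latent)
        self.to_logvar = nn.Linear(hid, latent)
        self.latent_to_hid = nn.Linear(latent, hid)
        self.lm_head = nn.Linear(hid, vocab)

    def forward(self, tokens: torch.Tensor):
        h = self.embed(tokens).mean(dim=1)                        # crude "encoder" pooling
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        # Add the latent code to every decoder input embedding (one injection scheme).
        dec_in = self.embed(tokens) + self.latent_to_hid(z).unsqueeze(1)
        logits = self.lm_head(dec_in)
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return logits, kl

# Usage: reconstruction (cross-entropy over logits) + KL gives the VAE loss.
logits, kl = TinyCVAE()(torch.randint(0, 1000, (2, 16)))
print(logits.shape, kl.item())
```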



Towards a Universal Continuous Knowledge Base

Dec 25, 2020
Gang Chen, Maosong Sun, Yang Liu

In artificial intelligence, knowledge is the information required by an intelligent system to accomplish tasks. While traditional knowledge bases use discrete, symbolic representations, detecting knowledge encoded in the continuous representations learned from data has received increasing attention recently. In this work, we propose a method for building a continuous knowledge base that can store knowledge imported from multiple, diverse neural networks. The key idea of our approach is to define an interface for each neural network and cast knowledge transferring as a function simulation problem. Preliminary experiments on text classification show promising results: we first import the knowledge encoded in an RNN model and a CNN model to the knowledge base, from which the fused knowledge is exported back to the RNN model, achieving a higher classification accuracy than the original RNN model. With the continuous knowledge base, it is also easy to achieve knowledge distillation and transfer learning. Our work opens the door to building a universal continuous knowledge base to collect, store, and organize all continuous knowledge encoded in different neural networks trained for different AI tasks.
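
The sketch below illustrates "knowledge import as function simulation" in the loosest sense: a per-model interface plus a shared memory module are trained to reproduce a source network's input-output behavior on probe inputs. The module names, sizes, and training loop are assumptions for illustration, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

source_model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 5))
kb_memory = nn.Linear(128, 5)     # shared continuous knowledge base
interface = nn.Linear(32, 128)    # per-model interface into the KB

optim = torch.optim.Adam(list(kb_memory.parameters()) + list(interface.parameters()), lr=1e-3)
for step in range(200):
    x = torch.randn(64, 32)                           # probe inputs
    with torch.no_grad():
        target = source_model(x)                      # behavior to be imported
    pred = kb_memory(interface(x))                    # KB simulates the source model
    loss = nn.functional.mse_loss(pred, target)
    optim.zero_grad(); loss.backward(); optim.step()

print(f"final simulation loss: {loss.item():.4f}")
```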


