Deep neural networks have been widely used in text classification. However, it is hard to interpret the neural models due to the complicate mechanisms. In this work, we study the interpretability of a variant of the typical text classification model which is based on convolutional operation and max-pooling layer. Two mechanisms: convolution attribution and n-gram feature analysis are proposed to analyse the process procedure for the CNN model. The interpretability of the model is reflected by providing posterior interpretation for neural network predictions. Besides, a multi-sentence strategy is proposed to enable the model to beused in multi-sentence situation without loss of performance and interpret ability. We evaluate the performance of the model on several classification tasks and justify the interpretable performance with some case studies.
Convolutional neural network (CNN) and recurrent neural network (RNN) are two popular architectures used in text classification. Traditional methods to combine the strengths of the two networks rely on streamlining them or concatenating features extracted from them. In this paper, we propose a novel method to keep the strengths of the two networks to a great extent. In the proposed model, a convolutional neural network is applied to learn a 2D weight matrix where each row reflects the importance of each word from different aspects. Meanwhile, we use a bi-directional RNN to process each word and employ a neural tensor layer that fuses forward and backward hidden states to get word representations. In the end, the weight matrix and word representations are combined to obtain the representation in a 2D matrix form for the text. We carry out experiments on a number of datasets for text classification. The experimental results confirm the effectiveness of the proposed method.
In this paper, we present a label transfer model from texts to images for image classification tasks. The problem of image classification is often much more challenging than text classification. On one hand, labeled text data is more widely available than the labeled images for classification tasks. On the other hand, text data tends to have natural semantic interpretability, and they are often more directly related to class labels. On the contrary, the image features are not directly related to concepts inherent in class labels. One of our goals in this paper is to develop a model for revealing the functional relationships between text and image features as to directly transfer intermodal and intramodal labels to annotate the images. This is implemented by learning a transfer function as a bridge to propagate the labels between two multimodal spaces. However, the intermodal label transfers could be undermined by blindly transferring the labels of noisy texts to annotate images. To mitigate this problem, we present an intramodal label transfer process, which complements the intermodal label transfer by transferring the image labels instead when relevant text is absent from the source corpus. In addition, we generalize the inter-modal label transfer to zero-shot learning scenario where there are only text examples available to label unseen classes of images without any positive image examples. We evaluate our algorithm on an image classification task and show the effectiveness with respect to the other compared algorithms.
Topic classification systems on spoken documents usually consist of two modules: an automatic speech recognition (ASR) module to convert speech into text and a text topic classification (TTC) module to predict the topic class from the decoded text. In this paper, instead of using the ASR transcripts, the fusion of deep acoustic and linguistic features is used for topic classification on spoken documents. More specifically, a conventional CTC-based acoustic model (AM) using phonemes as output units is first trained, and the outputs of the layer before the linear phoneme classifier in the trained AM are used as the deep acoustic features of spoken documents. Furthermore, these deep acoustic features are fed to a phoneme-to-word (P2W) module to obtain deep linguistic features. Finally, a local multi-head attention module is proposed to fuse these two types of deep features for topic classification. Experiments conducted on a subset selected from Switchboard corpus show that our proposed framework outperforms the conventional ASR+TTC systems and achieves a 3.13% improvement in ACC.
Recent work has demonstrated the vulnerability of modern text classifiers to universal adversarial attacks, which are input-agnostic sequence of words added to any input instance. Despite being highly successful, the word sequences produced in these attacks are often unnatural, do not carry much semantic meaning, and can be easily distinguished from natural text. In this paper, we develop adversarial attacks that appear closer to natural English phrases and yet confuse classification systems when added to benign inputs. To achieve this, we leverage an adversarially regularized autoencoder (ARAE) to generate triggers and propose a gradient-based search method to output natural text that fools a target classifier. Experiments on two different classification tasks demonstrate the effectiveness of our attacks while also being less identifiable than previous approaches on three simple detection metrics.
Health mentioning classification (HMC) classifies an input text as health mention or not. Figurative and non-health mention of disease words makes the classification task challenging. Learning the context of the input text is the key to this problem. The idea is to learn word representation by its surrounding words and utilize emojis in the text to help improve the classification results. In this paper, we improve the word representation of the input text using adversarial training that acts as a regularizer during fine-tuning of the model. We generate adversarial examples by perturbing the embeddings of the model and then train the model on a pair of clean and adversarial examples. Additionally, we utilize contrastive loss that pushes a pair of clean and perturbed examples close to each other and other examples away in the representation space. We train and evaluate the method on an extended version of the publicly available PHM2017 dataset. Experiments show an improvement of 1.0% over BERT-Large baseline and 0.6% over RoBERTa-Large baseline, whereas 5.8% over the state-of-the-art in terms of F1 score. Furthermore, we provide a brief analysis of the results by utilizing the power of explainable AI.
Estimating the software projects' efforts developed by agile methods is important for project managers or technical leads. It provides a summary as a first view of how many hours and developers are required to complete the tasks. There are research works on automatic predicting the software efforts, including Term Frequency Inverse Document Frequency (TFIDF) as the traditional approach for this problem. Graph Neural Network is a new approach that has been applied in Natural Language Processing for text classification. The advantages of Graph Neural Network are based on the ability to learn information via graph data structure, which has more representations such as the relationships between words compared to approaches of vectorizing sequence of words. In this paper, we show the potential and possible challenges of Graph Neural Network text classification in story point level estimation. By the experiments, we show that the GNN Text Level Classification can achieve as high accuracy as about 80 percent for story points level classification, which is comparable to the traditional approach. We also analyze the GNN approach and point out several current disadvantages that the GNN approach can improve for this problem or other problems in software engineering.
We propose a new deep neural network model and its training scheme for text classification. Our model Sequence-to-convolution Neural Networks(Seq2CNN) consists of two blocks: Sequential Block that summarizes input texts and Convolution Block that receives summary of input and classifies it to a label. Seq2CNN is trained end-to-end to classify various-length texts without preprocessing inputs into fixed length. We also present Gradual Weight Shift(GWS) method that stabilizes training. GWS is applied to our model's loss function. We compared our model with word-based TextCNN trained with different data preprocessing methods. We obtained significant improvement in classification accuracy over word-based TextCNN without any ensemble or data augmentation.
This study proposes a Neural Attentive Bag-of-Entities model, which is a neural network model that performs text classification using entities in a knowledge base. Entities provide unambiguous and relevant semantic signals that are beneficial for capturing semantics in texts. We combine simple high-recall entity detection based on a dictionary, to detect entities in a document, with a novel neural attention mechanism that enables the model to focus on a small number of unambiguous and relevant entities. We tested the effectiveness of our model using two standard text classification datasets (i.e., the 20 Newsgroups and R8 datasets) and a popular factoid question answering dataset based on a trivia quiz game. As a result, our model achieved state-of-the-art results on all datasets. The source code of the proposed model is available online at https://github.com/wikipedia2vec/wikipedia2vec.