Some news headlines mislead readers with overrated or false information, and identifying them in advance will better assist readers in choosing proper news stories to consume. This research introduces million-scale pairs of news headline and body text dataset with incongruity label, which can uniquely be utilized for detecting news stories with misleading headlines. On this dataset, we develop two neural networks with hierarchical architectures that model a complex textual representation of news articles and measure the incongruity between the headline and the body text. We also present a data augmentation method that dramatically reduces the text input size a model handles by independently investigating each paragraph of news stories, which further boosts the performance. Our experiments and qualitative evaluations demonstrate that the proposed methods outperform existing approaches and efficiently detect news stories with misleading headlines in the real world.
The recognition, involvement, and description of main actors influences the story line of the whole text. This is of higher importance as the text per se represents a flow of words and expressions that once it is read it is lost. In this respect, the understanding of a text and moreover on how the actor exactly behaves is not only a major concern: as human beings try to store a given input on short-term memory while associating diverse aspects and actors with incidents, the following approach represents a virtual architecture, where collocations are concerned and taken as the associative completion of the actors' acting. Once that collocations are discovered, they become managed in separated memory blocks broken down by the actors. As for human beings, the memory blocks refer to associative mind-maps. We then present several priority functions to represent the actual temporal situation inside a mind-map to enable the user to reconstruct the recent events from the discovered temporal results.
The goal of text-to-image synthesis is to generate a visually realistic image that matches a given text description. In practice, the captions annotated by humans for the same image have large variance in terms of contents and the choice of words. The linguistic discrepancy between the captions of the identical image leads to the synthetic images deviating from the ground truth. To address this issue, we propose a contrastive learning approach to improve the quality and enhance the semantic consistency of synthetic images. In the pre-training stage, we utilize the contrastive learning approach to learn the consistent textual representations for the captions corresponding to the same image. Furthermore, in the following stage of GAN training, we employ the contrastive learning method to enhance the consistency between the generated images from the captions related to the same image. We evaluate our approach over two popular text-to-image synthesis models, AttnGAN and DM-GAN, on datasets CUB and COCO, respectively. Experimental results have shown that our approach can effectively improve the quality of synthetic images in terms of three metrics: IS, FID and R-precision. Especially, on the challenging COCO dataset, our approach boosts the FID significantly by 29.60% over AttnGAn and by 21.96% over DM-GAN.
Conversational text-to-SQL aims at converting multi-turn natural language queries into their corresponding SQL representations. One of the most intractable problem of conversational text-to-SQL is modeling the semantics of multi-turn queries and gathering proper information required for the current query. This paper shows that explicit modeling the semantic changes by adding each turn and the summarization of the whole context can bring better performance on converting conversational queries into SQLs. In particular, we propose two conversational modeling tasks in both turn grain and conversation grain. These two tasks simply work as auxiliary training tasks to help with multi-turn conversational semantic parsing. We conducted empirical studies and achieve new state-of-the-art results on large-scale open-domain conversational text-to-SQL dataset. The results demonstrate that the proposed mechanism significantly improves the performance of multi-turn semantic parsing.
Recent advancements in data-to-text generation largely take on the form of neural end-to-end systems. Efforts have been dedicated to improving text generation systems by changing the order of training samples in a process known as curriculum learning. Past research on sequence-to-sequence learning showed that curriculum learning helps to improve both the performance and convergence speed. In this work, we delve into the same idea surrounding the training samples consisting of structured data and text pairs, where at each update, the curriculum framework selects training samples based on the model's competence. Specifically, we experiment with various difficulty metrics and put forward a soft edit distance metric for ranking training samples. Our benchmarks show faster convergence speed where training time is reduced by 38.7% and performance is boosted by 4.84 BLEU.
Continual learning has become increasingly important as it enables NLP models to constantly learn and gain knowledge over time. Previous continual learning methods are mainly designed to preserve knowledge from previous tasks, without much emphasis on how to well generalize models to new tasks. In this work, we propose an information disentanglement based regularization method for continual learning on text classification. Our proposed method first disentangles text hidden spaces into representations that are generic to all tasks and representations specific to each individual task, and further regularizes these representations differently to better constrain the knowledge required to generalize. We also introduce two simple auxiliary tasks: next sentence prediction and task-id prediction, for learning better generic and specific representation spaces. Experiments conducted on large-scale benchmarks demonstrate the effectiveness of our method in continual text classification tasks with various sequences and lengths over state-of-the-art baselines. We have publicly released our code at https://github.com/GT-SALT/IDBR.
Data augmentation is proven to be effective in many NLU tasks, especially for those suffering from data scarcity. In this paper, we present a powerful and easy to deploy text augmentation framework, Data Boost, which augments data through reinforcement learning guided conditional generation. We evaluate Data Boost on three diverse text classification tasks under five different classifier architectures. The result shows that Data Boost can boost the performance of classifiers especially in low-resource data scenarios. For instance, Data Boost improves F1 for the three tasks by 8.7% on average when given only 10% of the whole data for training. We also compare Data Boost with six prior text augmentation methods. Through human evaluations (N=178), we confirm that Data Boost augmentation has comparable quality as the original data with respect to readability and class consistency.
Many applications of text generation require incorporating different constraints to control the semantics or style of generated text. These constraints can be hard (e.g., ensuring certain keywords are included in the output) and soft (e.g., contextualizing the output with the left- or right-hand context). In this paper, we present Energy-based Constrained Decoding with Langevin Dynamics (COLD), a decoding framework which unifies constrained generation as specifying constraints through an energy function, then performing efficient differentiable reasoning over the constraints through gradient-based sampling. COLD decoding is a flexible framework that can be applied directly to off-the-shelf left-to-right language models without the need for any task-specific fine-tuning, as demonstrated through three challenging text generation applications: lexically-constrained generation, abductive reasoning, and counterfactual reasoning. Our experiments on these constrained generation tasks point to the effectiveness of our approach, both in terms of automatic and human evaluation.
In this paper, we explore to conduct text classification with extremely weak supervision, i.e., only relying on the surface text of class names. This is a more challenging setting than the seed-driven weak supervision, which allows a few seed words per class. We opt to attack this problem from a representation learning perspective -- ideal document representations should lead to very close results between clustering and the desired classification. In particular, one can classify the same corpus differently (e.g., based on topics and locations), so document representations must be adaptive to the given class names. We propose a novel framework X-Class to realize it. Specifically, we first estimate comprehensive class representations by incrementally adding the most similar word to each class until inconsistency appears. Following a tailored mixture of class attention mechanisms, we obtain the document representation via a weighted average of contextualized token representations. We then cluster and align the documents to classes with the prior of each document assigned to its nearest class. Finally, we pick the most confident documents from each cluster to train a text classifier. Extensive experiments demonstrate that X-Class can rival and even outperform seed-driven weakly supervised methods on 7 benchmark datasets.