"Art is the lie that enables us to realize the truth." - Pablo Picasso. For centuries, humans have dedicated themselves to producing arts to convey their imagination. The advancement in technology and deep learning in particular, has caught the attention of many researchers trying to investigate whether art generation is possible by computers and algorithms. Using generative adversarial networks (GANs), applications such as synthesizing photorealistic human faces and creating captions automatically from images were realized. This survey takes a comprehensive look at the recent works using GANs for generating visual arts, music, and literary text. A performance comparison and description of the various GAN architecture are also presented. Finally, some of the key challenges in art generation using GANs are highlighted along with recommendations for future work.
The increasingly rapid spread of information about COVID-19 on the web calls for automatic measures of quality assurance. In that context, we check the credibility of news content using selected linguistic features. We present two empirical studies evaluating the usability of graphical interfaces that offer such credibility assessment. In moderated qualitative interviews with six participants, we identify the rating scale, sub-criteria, and algorithm authorship as important predictors of usability. A subsequent quantitative online survey with 50 participants reveals a conflict between transparency and conciseness in interface design, as well as a perceived hierarchy of metadata: the authorship of a news text is more important than the authorship of the credibility algorithm used to assess the content quality. Finally, we make suggestions for future research, such as proactively documenting credibility-related metadata for Natural Language Processing and Language Technology services and establishing an explicit hierarchical taxonomy of usability predictors for automatic credibility assessment.
In zero-shot multilingual extractive text summarization, a model is typically trained on an English summarization dataset and then applied to summarization datasets of other languages. Given English gold summaries and documents, sentence-level labels for extractive summarization are usually generated using heuristics. However, these monolingual labels created on English datasets may not be optimal on datasets of other languages because of syntactic and semantic discrepancies between languages. It is therefore possible to translate the English dataset into other languages and obtain different sets of labels, again using heuristics. To fully leverage the information in these different sets of labels, we propose NLSSum (Neural Label Search for Summarization), which jointly learns hierarchical weights for these different sets of labels together with our summarization model. We conduct multilingual zero-shot summarization experiments on the MLSUM and WikiLingua datasets and achieve state-of-the-art results under both human and automatic evaluation across these two datasets.
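To illustrate the general idea of combining several heuristic label sets with learned weights, here is a minimal sketch. The variable names, the simple softmax weighting, and the binary cross-entropy objective are our own assumptions, not the authors' exact formulation of NLSSum.

```python
import torch
import torch.nn.functional as F

# Hypothetical setup: K heuristic label sets for the same document,
# each assigning a 0/1 extractive label to every sentence.
K, num_sentences = 3, 10
label_sets = torch.randint(0, 2, (K, num_sentences)).float()  # placeholder labels

# Learnable weights over the label sets, normalized with a softmax.
set_logits = torch.zeros(K, requires_grad=True)

# Sentence scores produced by some summarization model (placeholder here).
sentence_scores = torch.randn(num_sentences, requires_grad=True)

weights = torch.softmax(set_logits, dim=0)            # (K,)
soft_labels = (weights[:, None] * label_sets).sum(0)  # weighted label per sentence
loss = F.binary_cross_entropy_with_logits(sentence_scores, soft_labels)
loss.backward()  # gradients reach both the model scores and the set weights
```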
Text recognition has attracted considerable research interest because of its various applications. The cutting-edge text recognition methods are based on attention mechanisms. However, most attention-based methods suffer from a serious alignment problem due to their recurrent alignment operation, where the alignment relies on historical decoding results. To remedy this issue, we propose a decoupled attention network (DAN), which decouples the alignment operation from historical decoding results. DAN is an effective, flexible, and robust end-to-end text recognizer consisting of three components: 1) a feature encoder that extracts visual features from the input image; 2) a convolutional alignment module that performs the alignment operation based on the visual features from the encoder; and 3) a decoupled text decoder that makes the final prediction by jointly using the feature map and the attention maps. Experimental results show that DAN achieves state-of-the-art performance on multiple text recognition tasks, including offline handwritten text recognition and regular/irregular scene text recognition.
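A highly simplified sketch of the three components as described above. The tensor shapes, channel sizes, and single-step linear decoder are our assumptions; the actual DAN uses a much deeper encoder, a multi-layer convolutional alignment module, and a recurrent decoder.

```python
import torch
import torch.nn as nn

class TinyDAN(nn.Module):
    def __init__(self, num_classes=37, channels=64, max_len=25):
        super().__init__()
        # 1) feature encoder: extracts a visual feature map from the image
        self.encoder = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
        # 2) convolutional alignment module: predicts one attention map per
        #    decoding step directly from visual features (no decoding history)
        self.align = nn.Conv2d(channels, max_len, 1)
        # 3) decoupled decoder: classifies each attended feature vector
        self.classifier = nn.Linear(channels, num_classes)

    def forward(self, img):                                   # img: (B, 1, H, W)
        feat = self.encoder(img)                              # (B, C, H, W)
        attn = self.align(feat).flatten(2).softmax(-1)        # (B, T, H*W)
        flat = feat.flatten(2)                                # (B, C, H*W)
        glimpses = torch.einsum('bth,bch->btc', attn, flat)   # (B, T, C)
        return self.classifier(glimpses)                      # per-step logits

logits = TinyDAN()(torch.randn(2, 1, 32, 128))  # shape (2, 25, 37)
```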
Braille is the only reading and writing system available to visually impaired people. However, sighted people generally cannot read Braille, so teachers and relatives find it hard to assist them with learning. Almost every major language has software solutions for this translation purpose; for Bengali, however, such a tool is absent. Here, we propose Braille to Text Translator, which takes an image of these tactile characters and translates it into plain text. Image deterioration, scan-time page rotation, and Braille dot deformation are the principal issues in this scheme. All of these challenges are addressed directly using dedicated image processing and geometric structure analysis. The technique yields 97.25% accuracy in recognizing Braille characters.
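The final mapping from a recognized 2x3 dot pattern to a character can be pictured with a small lookup. This is a sketch only: the dot numbering follows the standard Braille cell, but the detection of dots from the scanned image, the rotation correction, and the actual Bengali character table are omitted, and the sample entries below use Latin letters purely for illustration.

```python
# Standard Braille cell numbering:
#   1 4
#   2 5
#   3 6
# A recognized cell is represented as the set of raised dot positions.

# Hypothetical (partial) mapping from dot patterns to output characters.
BRAILLE_TABLE = {
    frozenset({1}): 'a',
    frozenset({1, 2}): 'b',
    frozenset({1, 4}): 'c',
}

def decode_cell(raised_dots):
    """Translate one detected Braille cell into a plain-text character."""
    return BRAILLE_TABLE.get(frozenset(raised_dots), '?')

print(decode_cell({1, 2}))  # -> 'b'
```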
Short text classification is a method for classifying short sentences with predefined labels. However, the limited length of short texts leads to the challenging problem of sparse features. Most existing methods treat each short sentence as independently and identically distributed (IID): only the local context within the sentence itself is considered, and the relational information between sentences is lost. To overcome these limitations, we propose a PathWalk model that combines the strengths of graph networks and short sentences to address the sparseness of short text. Experimental results on four available datasets show that our PathWalk method achieves state-of-the-art results, demonstrating the efficiency and robustness of graph networks for short text classification.
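One way to picture the idea of connecting short texts before classification, so that they are no longer treated as IID, is to link sentences through shared terms. This is our own toy sketch; the actual PathWalk graph construction and walk strategy are not reproduced here.

```python
from collections import defaultdict
from itertools import combinations

# Toy corpus of short texts.
texts = ["cheap flights to paris", "paris hotel deals", "best pizza recipe"]

# Link texts that share at least one word, so related sentences can
# exchange information instead of being classified in isolation.
word_to_texts = defaultdict(set)
for i, t in enumerate(texts):
    for w in t.split():
        word_to_texts[w].add(i)

edges = set()
for docs in word_to_texts.values():
    for a, b in combinations(sorted(docs), 2):
        edges.add((a, b))

print(edges)  # {(0, 1)}: the two travel-related texts connect via "paris"
```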
Graph Neural Networks (GNNs) are the de facto node classification models for graph-structured data. However, at test time these algorithms assume no data shift, i.e., $\Pr_\text{train}(X,Y) = \Pr_\text{test}(X,Y)$. Domain adaptation methods can be adopted to handle data shift, yet most of them are designed only to encourage similar feature distributions between source and target data. Conditional shift on classes can still affect such adaptation. Fortunately, graphs exhibit homophily across different data distributions. In response, we propose Shift-Robust Node Classification (SRNC) to address these limitations. We introduce an unsupervised cluster GNN on the target graph to group similar nodes via graph homophily. An adversarial loss with label information from the source graph is applied on top of the clustering objective. A shift-robust classifier is then optimized on the training graph and on adversarial samples from the target graph, which are generated by the cluster GNN. We conduct experiments on both open-set shift and representation shift, demonstrating the superior accuracy of SRNC when generalizing to test graphs with data shift. SRNC is consistently better than previous state-of-the-art domain adaptation algorithms on graphs, which progressively use model predictions on the target graph for training.
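As a rough caricature of the training setup, the following sketch combines a supervised loss on labeled source nodes with a loss on target nodes pseudo-labeled by a separate cluster model. Plain linear layers stand in for GNNs, the 0.5 weighting is arbitrary, and the adversarial component of SRNC is reduced to simple pseudo-labeling; none of this reflects the authors' exact objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

feat_dim, num_classes = 16, 4
classifier = nn.Linear(feat_dim, num_classes)   # stand-in for the shift-robust classifier
cluster_gnn = nn.Linear(feat_dim, num_classes)  # stand-in for the cluster GNN on the target graph

x_src = torch.randn(32, feat_dim)               # source-graph node features
y_src = torch.randint(0, num_classes, (32,))    # source labels
x_tgt = torch.randn(32, feat_dim)               # target-graph node features (unlabeled)

opt = torch.optim.Adam(
    list(classifier.parameters()) + list(cluster_gnn.parameters()), lr=1e-2)

for _ in range(10):
    # Supervised loss on the source graph.
    sup_loss = F.cross_entropy(classifier(x_src), y_src)
    # Pseudo-labels for target nodes from the cluster model (a crude proxy
    # for the cluster-GNN-generated samples used in the paper).
    pseudo = cluster_gnn(x_tgt).argmax(dim=1).detach()
    tgt_loss = F.cross_entropy(classifier(x_tgt), pseudo)
    loss = sup_loss + 0.5 * tgt_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```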
Text-to-image retrieval is an essential task in multi-modal information retrieval, i.e., retrieving relevant images from a large, unlabelled image dataset given textual queries. In this paper, we propose VisualSparta, a novel text-to-image retrieval model that shows substantial improvement over existing models in both accuracy and efficiency. We show that VisualSparta outperforms all previous scalable methods on MSCOCO and Flickr30K. It also shows substantial retrieval speed advantages: for an index with 1 million images, VisualSparta achieves an over 391x speedup compared to standard vector search. Experiments show that this speed advantage grows even larger for bigger datasets because VisualSparta can be efficiently implemented as an inverted index. To the best of our knowledge, VisualSparta is the first transformer-based text-to-image retrieval model that achieves real-time search over very large datasets, with a significant accuracy improvement over previous state-of-the-art methods.
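The efficiency claim rests on reducing retrieval to term-based inverted-index lookups. A toy illustration follows; in the actual model the per-term image scores are precomputed by a transformer over image regions, whereas here they are hand-set placeholder values.

```python
# Hypothetical precomputed term-to-image relevance scores, stored as an
# inverted index: term -> {image_id: score}.
inverted_index = {
    "dog":   {"img1": 2.0, "img3": 1.0},
    "beach": {"img3": 1.5, "img2": 0.5},
}

def retrieve(query, top_k=2):
    """Score images by summing per-term contributions from posting lists,
    touching only the images that share a term with the query."""
    scores = {}
    for term in query.lower().split():
        for image_id, s in inverted_index.get(term, {}).items():
            scores[image_id] = scores.get(image_id, 0.0) + s
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]

print(retrieve("dog on the beach"))  # [('img3', 2.5), ('img1', 2.0)]
```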
Qualitative causal relationships compactly express the direction, dependency, temporal constraints, and monotonicity constraints of discrete or continuous interactions in the world. In everyday or academic language, we may express interactions between quantities (e.g., sleep decreases stress), between discrete events or entities (e.g., a protein inhibits another protein's transcription), or between intentional or functional factors (e.g., hospital patients pray to relieve their pain). Extracting and representing these diverse causal relations is critical for cognitive systems that operate in domains spanning from scientific discovery to social science. This paper presents a transformer-based NLP architecture that jointly extracts knowledge graphs including (1) variables or factors described in language, (2) qualitative causal relationships over these variables, (3) qualifiers and magnitudes that constrain these causal relationships, and (4) word senses to localize each extracted node within a large ontology. We do not claim that our transformer-based architecture is itself a cognitive system; however, we provide evidence of its accurate knowledge graph extraction in real-world domains and the practicality of its resulting knowledge graphs for cognitive systems that perform graph-based reasoning. We demonstrate this approach with promising results in two use cases, processing textual inputs from academic publications, news articles, and social media.
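To make the target representation concrete, a minimal sketch of one extracted causal edge is shown below. The field names and the "wn:"-style sense identifiers are our own hypothetical choices, not the paper's schema.

```python
from dataclasses import dataclass, field

@dataclass
class Factor:
    text: str   # variable or factor as expressed in language
    sense: str  # word sense / ontology node (hypothetical identifier format)

@dataclass
class CausalEdge:
    cause: Factor
    effect: Factor
    polarity: str                                   # "increases" / "decreases" (monotonicity)
    qualifiers: list = field(default_factory=list)  # e.g. magnitudes, conditions

# "sleep decreases stress" as one edge in an extracted knowledge graph
edge = CausalEdge(
    cause=Factor("sleep", "wn:sleep.n.01"),
    effect=Factor("stress", "wn:stress.n.01"),
    polarity="decreases",
)
print(edge)
```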
Conditional text generation often requires lexical constraints, i.e., which words should or shouldn't be included in the output text. While the dominant recipe for conditional text generation has been large-scale pretrained language models that are finetuned on the task-specific training data, such models do not learn to follow the underlying constraints reliably, even when supervised with large amounts of task-specific examples. We propose NeuroLogic Decoding, a simple yet effective algorithm that enables neural language models -- supervised or not -- to generate fluent text while satisfying complex lexical constraints. Our approach is powerful yet efficient. It handles any set of lexical constraints that is expressible under predicate logic, while its asymptotic runtime is equivalent to that of conventional beam search. Empirical results on four benchmarks show that NeuroLogic Decoding outperforms previous approaches, including algorithms that handle a subset of our constraints. Moreover, we find that unsupervised models with NeuroLogic Decoding often outperform supervised models with conventional decoding, even when the latter is based on considerably larger networks. Our results suggest the limits of large-scale neural networks for fine-grained controllable generation and the promise of inference-time algorithms.
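To show what such logic-expressible lexical constraints look like, here is a sketch of checking a candidate output against constraints in conjunctive normal form. This only illustrates the constraint-satisfaction check; the actual NeuroLogic Decoding algorithm additionally tracks and prunes beam hypotheses by their constraint state during search, which is not reproduced here.

```python
# Lexical constraints in conjunctive normal form (CNF):
# each inner list is a clause, at least one literal of which must hold;
# a ("~", word) literal means the word must NOT appear in the output.
constraints = [
    [("+", "dog"), ("+", "puppy")],  # must mention dog OR puppy
    [("~", "cat")],                  # must not mention cat
]

def satisfies(text, cnf):
    """Return True if every clause has at least one satisfied literal."""
    tokens = set(text.lower().split())
    for clause in cnf:
        clause_ok = False
        for sign, word in clause:
            present = word in tokens
            if (sign == "+" and present) or (sign == "~" and not present):
                clause_ok = True
                break
        if not clause_ok:
            return False
    return True

print(satisfies("a puppy runs in the park", constraints))  # True
print(satisfies("a cat and a dog", constraints))           # False
```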