"Text": models, code, and papers

Some Theoretical Insights into Wasserstein GANs

Jun 04, 2020
Gérard Biau, Maxime Sangnier, Ugo Tanielian

Generative Adversarial Networks (GANs) have been successful in producing outstanding results in areas as diverse as image, video, and text generation. Building on these successes, a large number of empirical studies have validated the benefits of the cousin approach called Wasserstein GANs (WGANs), which brings stabilization in the training process. In the present paper, we add a new stone to the edifice by proposing some theoretical advances in the properties of WGANs. First, we properly define the architecture of WGANs in the context of integral probability metrics parameterized by neural networks and highlight some of their basic mathematical features. We stress in particular interesting optimization properties arising from the use of a parametric 1-Lipschitz discriminator. Then, in a statistically-driven approach, we study the convergence of empirical WGANs as the sample size tends to infinity, and clarify the adversarial effects of the generator and the discriminator by underlining some trade-off properties. These features are finally illustrated with experiments using both synthetic and real-world datasets.
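
For orientation, the integral probability metric mentioned in this abstract can be written in its standard textbook form as below. The symbols (a function class $\mathcal{D}$ playing the role of the discriminators, and probability measures $\mu$ and $\nu$) are generic notation assumed here, not necessarily the notation used in the paper.

```latex
d_{\mathcal{D}}(\mu, \nu)
  = \sup_{f \in \mathcal{D}}
    \left| \mathbb{E}_{X \sim \mu} f(X) - \mathbb{E}_{Y \sim \nu} f(Y) \right|
```

When $\mathcal{D}$ is the set of all 1-Lipschitz functions, this quantity is the Wasserstein-1 distance by Kantorovich-Rubinstein duality. A parametric family of 1-Lipschitz networks is a subset of that set, so the resulting value is at most the Wasserstein-1 distance.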


Learning Multi-Modal Nonlinear Embeddings: Performance Bounds and an Algorithm

Jun 03, 2020
Semih Kaya, Elif Vural

While many approaches exist in the literature to learn representations for data collections in multiple modalities, the generalizability of the learnt representations to previously unseen data is a largely overlooked subject. In this work, we first present a theoretical analysis of learning multi-modal nonlinear embeddings in a supervised setting. Our performance bounds indicate that for successful generalization in multi-modal classification and retrieval problems, the regularity of the interpolation functions extending the embedding to the whole data space is as important as the between-class separation and cross-modal alignment criteria. We then propose a multi-modal nonlinear representation learning algorithm that is motivated by these theoretical findings, where the embeddings of the training samples are optimized jointly with the Lipschitz regularity of the interpolators. Experimental comparison to recent multi-modal and single-modal learning algorithms suggests that the proposed method yields promising performance in multi-modal image classification and cross-modal image-text retrieval applications.


A frame semantics based approach to comparative study of digitized corpus

May 29, 2020
Abdelaziz Lakhfif, Mohamed Tayeb Laskri

In this paper, we present a corpus-linguistics-based approach to analyzing digitized classical multilingual novels and narrative texts from a semantic point of view. Digitized novels such as "The Hobbit" (Tolkien J. R. R., 1937) and "The Hound of the Baskervilles" (Doyle A. C., 1901-1902), which have been widely translated into dozens of languages, provide rich material for analyzing language differences from several perspectives and within a number of disciplines, such as linguistics, philosophy, and cognitive science. Taking the conceptualization of motion events as a case study, this paper focuses on the morphological, syntactic, and semantic annotation of an English-Arabic aligned corpus created from digitized novels, in order to re-examine the linguistic encoding of motion events in English and Arabic in terms of Frame Semantics. The present study argues that differences in the conceptualization of motion events across languages can be described with frame structure and frame-to-frame relations.

* Proceedings of the 7th International Symposium ISKO-Maghreb Knowledge Organization in the Perspective of Digital Humanities: Research & Applications November 25th & 26th, 2018, pp. 217-223, Bejaia, Algeria 


Improving Named Entity Recognition in Tor Darknet with Local Distance Neighbor Feature

May 18, 2020
Mhd Wesam Al-Nabki, Francisco Jañez-Martino, Roberto A. Vasco-Carofilis, Eduardo Fidalgo, Javier Velasco-Mata

Named entity recognition in noisy user-generated texts is a difficult task that is usually enhanced by incorporating an external resource of information, such as gazetteers. However, gazetteers are task-specific, and they are expensive to build and maintain. This paper adopts and improves the approach of Aguilar et al. by presenting a novel feature, called Local Distance Neighbor, which substitutes for gazetteers. We tested the new approach on the W-NUT-2017 dataset, obtaining state-of-the-art results for the Group, Person, and Product categories of named entities. Next, we added 851 manually labeled samples to the W-NUT-2017 dataset to account for named entities in the Tor Darknet related to weapons and drug selling. Finally, our proposal achieved entity and surface F1 scores of 52.96% and 50.57%, respectively, on this extended dataset, demonstrating its usefulness for Law Enforcement Agencies to detect named entities in Tor hidden services.

* 2 pages, 1 figure, to be published in conference JNIC 2020 


On the Robustness of Language Encoders against Grammatical Errors

May 12, 2020
Fan Yin, Quanyu Long, Tao Meng, Kai-Wei Chang

We conduct a thorough study to diagnose the behaviors of pre-trained language encoders (ELMo, BERT, and RoBERTa) when confronted with natural grammatical errors. Specifically, we collect real grammatical errors from non-native speakers and conduct adversarial attacks to simulate these errors on clean text data. We use this approach to facilitate debugging models on downstream applications. Results confirm that the performance of all tested models is affected but the degree of impact varies. To interpret model behaviors, we further design a linguistic acceptability task to reveal their abilities in identifying ungrammatical sentences and the position of errors. We find that fixed contextual encoders with a simple classifier trained on the prediction of sentence correctness are able to locate error positions. We also design a cloze test for BERT and discover that BERT captures the interaction between errors and specific tokens in context. Our results shed light on understanding the robustness and behaviors of language encoders against grammatical errors.

* ACL 2020 


Hooks in the Headline: Learning to Generate Headlines with Controlled Styles

Apr 30, 2020
Di Jin, Zhijing Jin, Joey Tianyi Zhou, Lisa Orii, Peter Szolovits

Current summarization systems only produce plain, factual headlines, but do not meet the practical needs of creating memorable titles to increase exposure. We propose a new task, Stylistic Headline Generation (SHG), to enrich the headlines with three style options (humor, romance, and clickbait), in order to attract more readers. With no style-specific article-headline pairs (only a standard headline summarization dataset and mono-style corpora), our method TitleStylist generates style-specific headlines by combining the summarization and reconstruction tasks into a multitasking framework. We also introduce a novel parameter sharing scheme to further disentangle the style from the text. Through both automatic and human evaluation, we demonstrate that TitleStylist can generate relevant, fluent headlines with three target styles: humor, romance, and clickbait. The attraction score of our model-generated headlines surpasses that of the state-of-the-art summarization model by 9.68%, and even outperforms human-written references.

* ACL 2020 


Scheduled DropHead: A Regularization Method for Transformer Models

Apr 28, 2020
Wangchunshu Zhou, Tao Ge, Ke Xu, Furu Wei, Ming Zhou

In this paper, we introduce DropHead, a structured dropout method specifically designed for regularizing the multi-head attention mechanism, a key component of the Transformer, a state-of-the-art model for various NLP tasks. In contrast to conventional dropout mechanisms, which randomly drop units or connections, DropHead drops entire attention heads during training. This prevents the multi-head attention model from being dominated by a small portion of attention heads and reduces the risk of overfitting the training data, thus making more efficient use of the multi-head attention mechanism. Motivated by recent studies of the learning dynamics of the multi-head attention mechanism, we propose a specific dropout rate schedule that adaptively adjusts the dropout rate of DropHead and achieves a better regularization effect. Experimental results on both machine translation and text classification benchmark datasets demonstrate the effectiveness of the proposed approach.
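
The idea of dropping whole attention heads, rather than individual units, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `drop_head`, the rescaling by the fraction of surviving heads, and the rule of always keeping at least one head are assumptions made for illustration, and the dropout rate schedule described in the abstract is not modeled here.

```python
import random


def drop_head(head_outputs, p, rng=random, training=True):
    """Zero out entire attention heads with probability p (illustrative sketch).

    head_outputs: list of per-head output vectors, each a list of floats.
    p: probability of dropping each head, with 0 <= p < 1.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        # At inference time every head is kept unchanged.
        return [list(h) for h in head_outputs]

    # One Bernoulli draw per head, so a head is dropped or kept as a whole.
    keep = [rng.random() >= p for _ in head_outputs]
    if not any(keep):
        # Assumption: never drop all heads at once.
        keep[rng.randrange(len(keep))] = True

    # Assumption: rescale surviving heads so the total magnitude is comparable.
    scale = len(keep) / sum(keep)
    return [
        [x * scale for x in h] if k else [0.0] * len(h)
        for h, k in zip(head_outputs, keep)
    ]
```

Calling `drop_head(heads, 0.5)` on a list of four head vectors returns four vectors of the same shape, some of them all zeros. With `training=False` the input is returned unchanged.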


MCEN: Bridging Cross-Modal Gap between Cooking Recipes and Dish Images with Latent Variable Model

Apr 02, 2020
Han Fu, Rui Wu, Chenghao Liu, Jianling Sun

Driven by increasing concern about diet and health, food computing has attracted enormous attention from both industry and the research community. One of the most popular research topics in this domain is Food Retrieval, due to its profound influence on health-oriented applications. In this paper, we focus on the task of cross-modal retrieval between food images and cooking recipes. We present the Modality-Consistent Embedding Network (MCEN), which learns modality-invariant representations by projecting images and texts into the same embedding space. To capture the latent alignments between modalities, we incorporate stochastic latent variables to explicitly exploit the interactions between textual and visual features. Importantly, our method learns the cross-modal alignments during training but computes embeddings of different modalities independently at inference time for the sake of efficiency. Extensive experimental results demonstrate that the proposed MCEN outperforms all existing approaches on the benchmark Recipe1M dataset at a lower computational cost.
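
Because each modality is embedded independently at inference time, retrieval reduces to comparing precomputed vectors in the shared space. The sketch below illustrates that step only. It assumes the embeddings already exist and uses cosine similarity as the ranking score, which is an assumption for illustration rather than a detail stated in the abstract; the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_gallery(query, gallery):
    """Return gallery indices ordered from most to least similar to the query."""
    scores = [cosine(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)
```

Here `query` could be a recipe embedding and `gallery` a list of image embeddings computed ahead of time, or the reverse.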

* Accepted to CVPR 2020 


Detection in Crowded Scenes: One Proposal, Multiple Predictions

Mar 20, 2020
Xuangeng Chu, Anlin Zheng, Xiangyu Zhang, Jian Sun

We propose a simple yet effective proposal-based object detector, aiming at detecting highly overlapped instances in crowded scenes. The key to our approach is to let each proposal predict a set of correlated instances rather than a single one, as in previous proposal-based frameworks. Equipped with new techniques such as EMD Loss and Set NMS, our detector can effectively handle the difficulty of detecting highly overlapped objects. On an FPN-Res50 baseline, our detector obtains a 4.9% AP gain on the challenging CrowdHuman dataset and a 1.0% $\text{MR}^{-2}$ improvement on the CityPersons dataset, without bells and whistles. Moreover, on less crowded datasets like COCO, our approach still achieves a moderate improvement, suggesting that the proposed method is robust to crowdedness. Code and pre-trained models will be released at

* 12 pages; 5 figures; 10 tables 
