Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Text": models, code, and papers

Cross-modal Common Representation Learning by Hybrid Transfer Network

Jun 24, 2017
Xin Huang, Yuxin Peng, Mingkuan Yuan

DNN-based cross-modal retrieval is a research hotspot to retrieve across different modalities as image and text, but existing methods often face the challenge of insufficient cross-modal training data. In single-modal scenario, similar problem is usually relieved by transferring knowledge from large-scale auxiliary datasets (as ImageNet). Knowledge from such single-modal datasets is also very useful for cross-modal retrieval, which can provide rich general semantic information that can be shared across different modalities. However, it is challenging to transfer useful knowledge from single-modal (as image) source domain to cross-modal (as image/text) target domain. Knowledge in source domain cannot be directly transferred to both two different modalities in target domain, and the inherent cross-modal correlation contained in target domain provides key hints for cross-modal retrieval which should be preserved during transfer process. This paper proposes Cross-modal Hybrid Transfer Network (CHTN) with two subnetworks: Modal-sharing transfer subnetwork utilizes the modality in both source and target domains as a bridge, for transferring knowledge to both two modalities simultaneously; Layer-sharing correlation subnetwork preserves the inherent cross-modal semantic correlation to further adapt to cross-modal retrieval task. Cross-modal data can be converted to common representation by CHTN for retrieval, and comprehensive experiment on 3 datasets shows its effectiveness.

* To appear in the proceedings of 26th International Joint Conference on Artificial Intelligence (IJCAI), Melbourne, Australia, Aug. 19-25, 2017. 8 pages, 2 figures 

  Access Paper or Ask Questions

OCR Context-Sensitive Error Correction Based on Google Web 1T 5-Gram Data Set

Apr 01, 2012
Youssef Bassil, Mohammad Alwani

Since the dawn of the computing era, information has been represented digitally so that it can be processed by electronic computers. Paper books and documents were abundant and widely being published at that time; and hence, there was a need to convert them into digital format. OCR, short for Optical Character Recognition was conceived to translate paper-based books into digital e-books. Regrettably, OCR systems are still erroneous and inaccurate as they produce misspellings in the recognized text, especially when the source document is of low printing quality. This paper proposes a post-processing OCR context-sensitive error correction method for detecting and correcting non-word and real-word OCR errors. The cornerstone of this proposed approach is the use of Google Web 1T 5-gram data set as a dictionary of words to spell-check OCR text. The Google data set incorporates a very large vocabulary and word statistics entirely reaped from the Internet, making it a reliable source to perform dictionary-based error correction. The core of the proposed solution is a combination of three algorithms: The error detection, candidate spellings generator, and error correction algorithms, which all exploit information extracted from Google Web 1T 5-gram data set. Experiments conducted on scanned images written in different languages showed a substantial improvement in the OCR error correction rate. As future developments, the proposed algorithm is to be parallelised so as to support parallel and distributed computing architectures.

* LACSC - Lebanese Association for Computational Sciences,; American Journal of Scientific Research, Issue. 50, February 2012 

  Access Paper or Ask Questions

Biographical: A Semi-Supervised Relation Extraction Dataset

May 02, 2022
Alistair Plum, Tharindu Ranasinghe, Spencer Jones, Constantin Orasan, Ruslan Mitkov

Extracting biographical information from online documents is a popular research topic among the information extraction (IE) community. Various natural language processing (NLP) techniques such as text classification, text summarisation and relation extraction are commonly used to achieve this. Among these techniques, RE is the most common since it can be directly used to build biographical knowledge graphs. RE is usually framed as a supervised machine learning (ML) problem, where ML models are trained on annotated datasets. However, there are few annotated datasets for RE since the annotation process can be costly and time-consuming. To address this, we developed Biographical, the first semi-supervised dataset for RE. The dataset, which is aimed towards digital humanities (DH) and historical research, is automatically compiled by aligning sentences from Wikipedia articles with matching structured data from sources including Pantheon and Wikidata. By exploiting the structure of Wikipedia articles and robust named entity recognition (NER), we match information with relatively high precision in order to compile annotated relation pairs for ten different relations that are important in the DH domain. Furthermore, we demonstrate the effectiveness of the dataset by training a state-of-the-art neural model to classify relation pairs, and evaluate it on a manually annotated gold standard set. Biographical is primarily aimed at training neural models for RE within the domain of digital humanities and history, but as we discuss at the end of this paper, it can be useful for other purposes as well.

* Accepted to ACM SIGIR 2022 

  Access Paper or Ask Questions

TABi: Type-Aware Bi-Encoders for Open-Domain Entity Retrieval

Apr 18, 2022
Megan Leszczynski, Daniel Y. Fu, Mayee F. Chen, Christopher Ré

Entity retrieval--retrieving information about entity mentions in a query--is a key step in open-domain tasks, such as question answering or fact checking. However, state-of-the-art entity retrievers struggle to retrieve rare entities for ambiguous mentions due to biases towards popular entities. Incorporating knowledge graph types during training could help overcome popularity biases, but there are several challenges: (1) existing type-based retrieval methods require mention boundaries as input, but open-domain tasks run on unstructured text, (2) type-based methods should not compromise overall performance, and (3) type-based methods should be robust to noisy and missing types. In this work, we introduce TABi, a method to jointly train bi-encoders on knowledge graph types and unstructured text for entity retrieval for open-domain tasks. TABi leverages a type-enforced contrastive loss to encourage entities and queries of similar types to be close in the embedding space. TABi improves retrieval of rare entities on the Ambiguous Entity Retrieval (AmbER) sets, while maintaining strong overall retrieval performance on open-domain tasks in the KILT benchmark compared to state-of-the-art retrievers. TABi is also robust to incomplete type systems, improving rare entity retrieval over baselines with only 5% type coverage of the training dataset. We make our code publicly available at

* Accepted to Findings of ACL 2022 

  Access Paper or Ask Questions

An analysis of full-size Russian complexly NER labelled corpus of Internet user reviews on the drugs based on deep learning and language neural nets

Apr 30, 2021
Alexander Sboev, Sanna Sboeva, Ivan Moloshnikov, Artem Gryaznov, Roman Rybka, Alexander Naumov, Anton Selivanov, Gleb Rylkov, Viacheslav Ilyin

We present the full-size Russian complexly NER-labeled corpus of Internet user reviews, along with an evaluation of accuracy levels reached on this corpus by a set of advanced deep learning neural networks to extract the pharmacologically meaningful entities from Russian texts. The corpus annotation includes mentions of the following entities: Medication (33005 mentions), Adverse Drug Reaction (1778), Disease (17403), and Note (4490). Two of them - Medication and Disease - comprise a set of attributes. A part of the corpus has the coreference annotation with 1560 coreference chains in 300 documents. Special multi-label model based on a language model and the set of features is developed, appropriate for presented corpus labeling. The influence of the choice of different modifications of the models: word vector representations, types of language models pre-trained for Russian, text normalization styles, and other preliminary processing are analyzed. The sufficient size of our corpus allows to study the effects of particularities of corpus labeling and balancing entities in the corpus. As a result, the state of the art for the pharmacological entity extraction problem for Russian is established on a full-size labeled corpus. In case of the adverse drug reaction (ADR) recognition, it is 61.1 by the F1-exact metric that, as our analysis shows, is on par with the accuracy level for other language corpora with similar characteristics and the ADR representativnes. The evaluated baseline precision of coreference relation extraction on the corpus is 71, that is higher the results reached on other Russian corpora.

  Access Paper or Ask Questions

Imaginative Walks: Generative Random Walk Deviation Loss for Improved Unseen Learning Representation

Apr 20, 2021
Mohamed Elhoseiny, Divyansh Jha, Kai Yi, Ivan Skorokhodov

We propose a novel loss for generative models, dubbed as GRaWD (Generative Random Walk Deviation), to improve learning representations of unexplored visual spaces. Quality learning representation of unseen classes (or styles) is crucial to facilitate novel image generation and better generative understanding of unseen visual classes (a.k.a. Zero-Shot Learning, ZSL). By generating representations of unseen classes from their semantic descriptions, such as attributes or text, Generative ZSL aims at identifying unseen categories discriminatively from seen ones. We define GRaWD by constructing a dynamic graph, including the seen class/style centers and generated samples in the current mini-batch. Our loss starts a random walk probability from each center through visual generations produced from hallucinated unseen classes. As a deviation signal, we encourage the random walk to eventually land after t steps in a feature representation that is hard to classify to any of the seen classes. We show that our loss can improve unseen class representation quality on four text-based ZSL benchmarks on CUB and NABirds datasets and three attribute-based ZSL benchmarks on AWA2, SUN, and aPY datasets. We also study our loss's ability to produce meaningful novel visual art generations on WikiArt dataset. Our experiments and human studies show that our loss can improve StyleGAN1 and StyleGAN2 generation quality, creating novel art that is significantly more preferred. Code will be made available.

  Access Paper or Ask Questions

Robustness Evaluation of Stacked Generative Adversarial Networks using Metamorphic Testing

Mar 04, 2021
Hyejin Park, Taaha Waseem, Wen Qi Teo, Ying Hwei Low, Mei Kuan Lim, Chun Yong Chong

Synthesising photo-realistic images from natural language is one of the challenging problems in computer vision. Over the past decade, a number of approaches have been proposed, of which the improved Stacked Generative Adversarial Network (StackGAN-v2) has proven capable of generating high resolution images that reflect the details specified in the input text descriptions. In this paper, we aim to assess the robustness and fault-tolerance capability of the StackGAN-v2 model by introducing variations in the training data. However, due to the working principle of Generative Adversarial Network (GAN), it is difficult to predict the output of the model when the training data are modified. Hence, in this work, we adopt Metamorphic Testing technique to evaluate the robustness of the model with a variety of unexpected training dataset. As such, we first implement StackGAN-v2 algorithm and test the pre-trained model provided by the original authors to establish a ground truth for our experiments. We then identify a metamorphic relation, from which test cases are generated. Further, metamorphic relations were derived successively based on the observations of prior test results. Finally, we synthesise the results from our experiment of all the metamorphic relations and found that StackGAN-v2 algorithm is susceptible to input images with obtrusive objects, even if it overlaps with the main object minimally, which was not reported by the authors and users of StackGAN-v2 model. The proposed metamorphic relations can be applied to other text-to-image synthesis models to not only verify the robustness but also to help researchers understand and interpret the results made by the machine learning models.

* 8 pages, accepted at the 6th International Workshop on Metamorphic Testing (MET'21) 

  Access Paper or Ask Questions

Bootstrapping Large-Scale Fine-Grained Contextual Advertising Classifier from Wikipedia

Feb 12, 2021
Yiping Jin, Vishakha Kadam, Dittaya Wanvarie

Contextual advertising provides advertisers with the opportunity to target the context which is most relevant to their ads. However, its power cannot be fully utilized unless we can target the page content using fine-grained categories, e.g., "coupe" vs. "hatchback" instead of "automotive" vs. "sport". The widely used advertising content taxonomy (IAB taxonomy) consists of 23 coarse-grained categories and 355 fine-grained categories. With the large number of categories, it becomes very challenging either to collect training documents to build a supervised classification model, or to compose expert-written rules in a rule-based classification system. Besides, in fine-grained classification, different categories often overlap or co-occur, making it harder to classify accurately. In this work, we propose wiki2cat, a method to tackle the problem of large-scaled fine-grained text classification by tapping on Wikipedia category graph. The categories in IAB taxonomy are first mapped to category nodes in the graph. Then the label is propagated across the graph to obtain a list of labeled Wikipedia documents to induce text classifiers. The method is ideal for large-scale classification problems since it does not require any manually-labeled document or hand-curated rules or keywords. The proposed method is benchmarked with various learning-based and keyword-based baselines and yields competitive performance on both publicly available datasets and a new dataset containing more than 300 fine-grained categories.

* Work in progress 

  Access Paper or Ask Questions

A Multitask Deep Learning Approach for User Depression Detection on Sina Weibo

Aug 26, 2020
Yiding Wang, Zhenyi Wang, Chenghao Li, Yilin Zhang, Haizhou Wang

In recent years, due to the mental burden of depression, the number of people who endanger their lives has been increasing rapidly. The online social network (OSN) provides researchers with another perspective for detecting individuals suffering from depression. However, existing studies of depression detection based on machine learning still leave relatively low classification performance, suggesting that there is significant improvement potential for improvement in their feature engineering. In this paper, we manually build a large dataset on Sina Weibo (a leading OSN with the largest number of active users in the Chinese community), namely Weibo User Depression Detection Dataset (WU3D). It includes more than 20,000 normal users and more than 10,000 depressed users, both of which are manually labeled and rechecked by professionals. By analyzing the user's text, social behavior, and posted pictures, ten statistical features are concluded and proposed. In the meantime, text-based word features are extracted using the popular pretrained model XLNet. Moreover, a novel deep neural network classification model, i.e. FusionNet (FN), is proposed and simultaneously trained with the above-extracted features, which are seen as multiple classification tasks. The experimental results show that FusionNet achieves the highest F1-Score of 0.9772 on the test dataset. Compared to existing studies, our proposed method has better classification performance and robustness for unbalanced training samples. Our work also provides a new way to detect depression on other OSN platforms.

* 23 pages, 32 figures 

  Access Paper or Ask Questions

Improving Readability for Automatic Speech Recognition Transcription

Apr 09, 2020
Junwei Liao, Sefik Emre Eskimez, Liyang Lu, Yu Shi, Ming Gong, Linjun Shou, Hong Qu, Michael Zeng

Modern Automatic Speech Recognition (ASR) systems can achieve high performance in terms of recognition accuracy. However, a perfectly accurate transcript still can be challenging to read due to grammatical errors, disfluency, and other errata common in spoken communication. Many downstream tasks and human readers rely on the output of the ASR system; therefore, errors introduced by the speaker and ASR system alike will be propagated to the next task in the pipeline. In this work, we propose a novel NLP task called ASR post-processing for readability (APR) that aims to transform the noisy ASR output into a readable text for humans and downstream tasks while maintaining the semantic meaning of the speaker. In addition, we describe a method to address the lack of task-specific data by synthesizing examples for the APR task using the datasets collected for Grammatical Error Correction (GEC) followed by text-to-speech (TTS) and ASR. Furthermore, we propose metrics borrowed from similar tasks to evaluate performance on the APR task. We compare fine-tuned models based on several open-sourced and adapted pre-trained models with the traditional pipeline method. Our results suggest that finetuned models improve the performance on the APR task significantly, hinting at the potential benefits of using APR systems. We hope that the read, understand, and rewrite approach of our work can serve as a basis that many NLP tasks and human readers can benefit from.

  Access Paper or Ask Questions