"Text": models, code, and papers

MACRONYM: A Large-Scale Dataset for Multilingual and Multi-Domain Acronym Extraction

Feb 19, 2022
Amir Pouran Ben Veyseh, Nicole Meister, Seunghyun Yoon, Rajiv Jain, Franck Dernoncourt, Thien Huu Nguyen

Acronym extraction (AE) is the task of identifying acronyms and their expanded forms in text, a step necessary for various NLP applications. Despite major progress on this task in recent years, existing AE research is limited to the English language and certain domains (i.e., scientific and biomedical). As such, the challenges of AE in other languages and domains are largely unexplored. The lack of annotated datasets in multiple languages and domains has been a major obstacle to research in this area. To address this limitation, we propose a new dataset for multilingual multi-domain AE. Specifically, 27,200 sentences in 6 typologically different languages and 2 domains, i.e., Legal and Scientific, are manually annotated for AE. Our extensive experiments on the proposed dataset show that AE in different languages and different learning settings has unique challenges, emphasizing the necessity of further research on multilingual and multi-domain AE.
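To make the task concrete, here is a minimal rule-based sketch of acronym extraction. It is not the method or baseline from the paper: it assumes the common English pattern "long form (ACRONYM)" and that each acronym letter matches the initial of one preceding word. The function name `extract_acronyms` is hypothetical.

```python
import re


def extract_acronyms(sentence):
    """Return (acronym, long_form) pairs found via a simple initial-letter heuristic.

    Assumption: the acronym appears in parentheses right after its long form,
    and each acronym letter is the first letter of one preceding word.
    """
    pairs = []
    for match in re.finditer(r"\(([A-Z]{2,})\)", sentence):
        acronym = match.group(1)
        # Words that appear before the opening parenthesis.
        words = sentence[:match.start()].split()
        # Take as many trailing words as the acronym has letters.
        candidate = words[-len(acronym):]
        if len(candidate) == len(acronym) and all(
            word[0].upper() == letter for word, letter in zip(candidate, acronym)
        ):
            pairs.append((acronym, " ".join(candidate)))
    return pairs


print(extract_acronyms("We study acronym extraction (AE) in legal text."))
```

A heuristic like this depends on English capitalization and word order, which is one reason a multilingual, multi-domain dataset is useful for evaluating AE systems.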

Cross-Modal Common Representation Learning with Triplet Loss Functions

Feb 16, 2022
Felix Ott, David Rügamer, Lucas Heublein, Bernd Bischl, Christopher Mutschler

Common representation learning (CRL) learns a shared embedding between two or more modalities to improve performance on a given task compared with using only one of the modalities. CRL from different data types such as images and time-series data (e.g., audio or text data) requires a deep metric learning loss that minimizes the distance between the modality embeddings. In this paper, we propose to use the triplet loss, which uses positive and negative identities to create sample pairs with different labels, for CRL between image and time-series modalities. By adapting the triplet loss for CRL, higher accuracy in the main (time-series classification) task can be achieved by exploiting additional information from the auxiliary (image classification) task. Our experiments on synthetic data and handwriting recognition data from sensor-enhanced pens show improved classification accuracy, faster convergence, and better generalizability.
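For readers unfamiliar with the triplet loss, the following is a minimal sketch of the standard margin-based form, not the adapted variant proposed in the paper. Pairing a time-series anchor with image embeddings as positive and negative is an illustrative assumption, as are the function names and the margin value.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    Assumption for illustration: `anchor` is a time-series embedding, while
    `positive` (same label) and `negative` (different label) are image embeddings.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# The loss is zero once the negative is farther from the anchor than the
# positive by at least the margin.
print(triplet_loss([0.0, 0.0], [0.0, 0.0], [3.0, 4.0]))
```

Minimizing this quantity pulls same-label embeddings from the two modalities together and pushes different-label embeddings apart.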

On the Context-Free Ambiguity of Emoji: A Data-Driven Study of 1,289 Emojis

Jan 17, 2022
Justyna Czestochowska, Kristina Gligoric, Maxime Peyrard, Yann Mentha, Michal Bien, Andrea Grutter, Anita Auer, Aris Xanthos, Robert West

Emojis come with prepacked semantics, making them great candidates for creating new forms of more accessible communication. Yet, little is known about how much of these emojis' semantics is agreed upon by humans outside of textual contexts. Thus, we collected a crowdsourced dataset of one-word emoji descriptions for 1,289 emojis presented to participants with no surrounding text. The emojis and their interpretations were then examined for ambiguity. We find that with 30 annotations per emoji, 16 emojis (1.2%) are completely unambiguous, whereas 55 emojis (4.3%) are so ambiguous that their descriptions are indistinguishable from randomly chosen descriptions. Most of the studied emojis are spread out between the two extremes. Furthermore, investigating the ambiguity of different types of emojis, we find that an important factor is the extent to which an emoji has an embedded symbolic meaning drawn from an established code-book of symbols. We conclude by discussing design implications.

Integrating Artificial Intelligence and Augmented Reality in Robotic Surgery: An Initial dVRK Study Using a Surgical Education Scenario

Jan 02, 2022
Yonghao Long, Jianfeng Cao, Anton Deguet, Russell H. Taylor, Qi Dou

The demand for competent robot-assisted surgeons is growing as robot-assisted surgery becomes increasingly popular due to its clinical advantages. To meet this demand and provide better surgical education for surgeons, we develop a novel robotic surgery education system by integrating an artificial intelligence surgical module and augmented reality visualization. The artificial intelligence module incorporates reinforcement learning to learn from expert demonstrations and then generates a 3D guidance trajectory, providing surgical context awareness of the complete surgical procedure. The trajectory information is further visualized in the stereo viewer of the dVRK along with other information such as text hints, so that the user can perceive the 3D guidance and learn the procedure. The proposed system is evaluated through a preliminary experiment on the surgical education task of peg transfer, which demonstrates its feasibility and potential as the next generation of robot-assisted surgery education solutions.

Point spread function estimation for blind image deblurring problems based on framelet transform

Dec 21, 2021
Reza Parvaz

One of the most important issues in image processing is the approximation of an image that has been degraded by blurring. Such problems are divided into non-blind and blind problems. The second type is computationally more complex than the first because both the original image and the point spread function are unknown. In the present paper, a coarse-to-fine iterative algorithm based on $l_0-\alpha l_1$ regularization and the framelet transform is introduced to estimate the point spread function. The framelet transform improves the restored kernel by decomposing the kernel into different frequencies. In addition, the proposed model uses a fractional gradient operator instead of the ordinary gradient operator. The proposed method is evaluated on different kinds of images, such as text, face, and natural images. The results reflect the effectiveness of the proposed algorithm in restoring images in blind problems.

Hateful Memes Challenge: An Enhanced Multimodal Framework

Dec 20, 2021
Aijing Gao, Bingjun Wang, Jiaqi Yin, Yating Tian

The Hateful Memes Challenge proposed by Facebook AI has attracted contestants around the world. The challenge focuses on detecting hateful speech in multimodal memes. Various state-of-the-art deep learning models have been applied to this problem, and performance on the challenge's leaderboard has steadily improved. In this paper, we enhance the hateful meme detection framework by utilizing Detectron for feature extraction, exploring different setups of VisualBERT and UNITER models with different loss functions, researching the association between hateful memes and sensitive text features, and finally building an ensemble method to boost model performance. The AUROC of our fine-tuned VisualBERT, UNITER, and ensemble method reaches 0.765, 0.790, and 0.803 on the challenge's test set, respectively, which beats the baseline models. Our code is available at

A Contextual Latent Space Model: Subsequence Modulation in Melodic Sequence

Nov 23, 2021
Taketo Akama

Some generative models for sequences such as music and text allow us to edit only subsequences, given surrounding context sequences, which plays an important part in steering generation interactively. However, editing subsequences mainly involves randomly resampling subsequences from a possible generation space. We propose a contextual latent space model (CLSM) so that users can explore subsequence generation with a sense of direction in the generation space, e.g., interpolation, as well as explore variations, i.e., semantically similar possible subsequences. A context-informed prior and decoder constitute the generative model of CLSM, and a context position-informed encoder is the inference model. In experiments, we use a monophonic symbolic music dataset, demonstrating that our contextual latent space is smoother in interpolation than baselines, and that the quality of generated samples is superior to baseline models. The generation examples are available online.

* 22nd International Society for Music Information Retrieval Conference (ISMIR), 2021; 8 pages 

Self-Supervised Representation Learning: Introduction, Advances and Challenges

Oct 18, 2021
Linus Ericsson, Henry Gouk, Chen Change Loy, Timothy M. Hospedales

Self-supervised representation learning methods aim to provide powerful deep feature learning without the requirement of large annotated datasets, thus alleviating the annotation bottleneck that is one of the main barriers to practical deployment of deep learning today. These methods have advanced rapidly in recent years, with their efficacy approaching and sometimes surpassing fully supervised pre-training alternatives across a variety of data modalities including image, video, sound, text and graphs. This article introduces this vibrant area including key concepts, the four main families of approach and associated state of the art, and how self-supervised methods are applied to diverse modalities of data. We further discuss practical considerations including workflows, representation transferability, and compute cost. Finally, we survey the major open challenges in the field that provide fertile ground for future work.

Awakening Latent Grounding from Pretrained Language Models for Semantic Parsing

Sep 22, 2021
Qian Liu, Dejian Yang, Jiahui Zhang, Jiaqi Guo, Bin Zhou, Jian-Guang Lou

In recent years, pretrained language models (PLMs) have achieved success on several downstream tasks, showing their power in modeling language. To better understand and leverage what PLMs have learned, several techniques have emerged to explore syntactic structures entailed by PLMs. However, few efforts have been made to explore the grounding capabilities of PLMs, which are also essential. In this paper, we highlight the ability of PLMs to discover which token should be grounded to which concept, if combined with our proposed erasing-then-awakening approach. Empirical studies on four datasets demonstrate that our approach can awaken latent grounding which is understandable to human experts, even if it is not exposed to such labels during training. More importantly, our approach shows great potential to benefit downstream semantic parsing models. Taking text-to-SQL as a case study, we successfully couple our approach with two off-the-shelf parsers, obtaining an absolute improvement of up to 9.8%.

* Accepted by ACL 2021 Findings. The first three authors contributed equally 

VPN: Video Provenance Network for Robust Content Attribution

Sep 21, 2021
Alexander Black, Tu Bui, Simon Jenni, Vishy Swaminathan, John Collomosse

We present VPN, a content attribution method for recovering provenance information from videos shared online. Platforms and users often transform video into different qualities, codecs, sizes, and shapes, or slightly edit its content, for example by adding text or emoji, as it is redistributed online. We learn a robust search embedding for matching such video, invariant to these transformations, using full-length or truncated video queries. Once matched against a trusted database of video clips, associated information on the provenance of the clip is presented to the user. We use an inverted index to match temporal chunks of video using late-fusion to combine both visual and audio features. In both cases, features are extracted via a deep neural network trained using contrastive learning on a dataset of original and augmented video clips. We demonstrate high accuracy recall over a corpus of 100,000 videos.
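The inverted-index matching step can be illustrated with a small sketch. This is not the VPN implementation: it assumes each temporal chunk has already been quantized to an integer codeword (a hypothetical preprocessing step), and it ranks videos by simple vote counting rather than the paper's learned embeddings and late fusion. The names `build_index` and `query` are hypothetical.

```python
from collections import Counter, defaultdict


def build_index(videos):
    """Map each codeword to the set of video ids that contain it.

    `videos` is a dict: video id -> list of integer codewords, one per
    temporal chunk (assumed to come from some upstream feature quantizer).
    """
    index = defaultdict(set)
    for video_id, codewords in videos.items():
        for code in codewords:
            index[code].add(video_id)
    return index


def query(index, query_codewords):
    """Rank database videos by how many query chunks they share."""
    votes = Counter()
    for code in query_codewords:
        for video_id in index.get(code, ()):
            votes[video_id] += 1
    return votes.most_common()


index = build_index({"a": [1, 2, 3], "b": [3, 4]})
print(query(index, [2, 3]))
```

Because lookups touch only the codewords present in the query, a truncated clip can still retrieve its source video from the chunks it does contain.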

* CVMP2021 camera-ready version 
