Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Text": models, code, and papers

A Novel Context-Aware Multimodal Framework for Persian Sentiment Analysis

Mar 03, 2021
Kia Dashtipour, Mandar Gogate, Erik Cambria, Amir Hussain

Most recent works on sentiment analysis have exploited the text modality. However, millions of hours of video recordings posted on social media platforms everyday hold vital unstructured information that can be exploited to more effectively gauge public perception. Multimodal sentiment analysis offers an innovative solution to computationally understand and harvest sentiments from videos by contextually exploiting audio, visual and textual cues. In this paper, we, firstly, present a first of its kind Persian multimodal dataset comprising more than 800 utterances, as a benchmark resource for researchers to evaluate multimodal sentiment analysis approaches in Persian language. Secondly, we present a novel context-aware multimodal sentiment analysis framework, that simultaneously exploits acoustic, visual and textual cues to more accurately determine the expressed sentiment. We employ both decision-level (late) and feature-level (early) fusion methods to integrate affective cross-modal information. Experimental results demonstrate that the contextual integration of multimodal features such as textual, acoustic and visual features deliver better performance (91.39%) compared to unimodal features (89.24%).

* Accepted in Neurocomputing 

  Access Paper or Ask Questions

Interpretable Multi-Modal Hate Speech Detection

Mar 02, 2021
Prashanth Vijayaraghavan, Hugo Larochelle, Deb Roy

With growing role of social media in shaping public opinions and beliefs across the world, there has been an increased attention to identify and counter the problem of hate speech on social media. Hate speech on online spaces has serious manifestations, including social polarization and hate crimes. While prior works have proposed automated techniques to detect hate speech online, these techniques primarily fail to look beyond the textual content. Moreover, few attempts have been made to focus on the aspects of interpretability of such models given the social and legal implications of incorrect predictions. In this work, we propose a deep neural multi-modal model that can: (a) detect hate speech by effectively capturing the semantics of the text along with socio-cultural context in which a particular hate expression is made, and (b) provide interpretable insights into decisions of our model. By performing a thorough evaluation of different modeling techniques, we demonstrate that our model is able to outperform the existing state-of-the-art hate speech classification approaches. Finally, we show the importance of social and cultural context features towards unearthing clusters associated with different categories of hate.

* ICML Workshop on AI for Social Good, 2019 
* 5 pages, Accepted at the International Conference on Machine Learning AI for Social Good Workshop, Long Beach, United States, 2019 

  Access Paper or Ask Questions

Tensors Fitting Perfectly

Feb 26, 2021
Adam Paszke, Brennan Saeta

Multidimensional arrays (NDArrays) are a central abstraction in modern scientific computing environments. Unfortunately, they can make reasoning about programs harder as the number of different array shapes used in an execution of a program is usually very large, and they rarely appear explicitly in program text. To make things worse, many operators make implicit assumptions about the shapes of their inputs: array addition is commonly enriched with broadcasting semantics, while matrix multiplication assumes that the lengths of contracted dimensions are equal. Because precise reasoning about shapes is crucial to write correct programs using NDArrays, and because shapes are often hard to infer from a quick glance at the program, we developed Tensors Fitting Perfectly, a static analysis tool that reasons about NDArray shapes in Swift for TensorFlow programs by synthesizing a set of shape constraints from an abstract interpretation of the program. It can both (1) check for possible inconsistencies, and (2) provide direct insights about the shapes of intermediate values appearing in the program, including via a mechanism called shape holes. The static analysis works in concert with optional runtime assertions to improve the productivity of program authors.

  Access Paper or Ask Questions

Automated Quality Assessment of Cognitive Behavioral Therapy Sessions Through Highly Contextualized Language Representations

Feb 23, 2021
Nikolaos Flemotomos, Victor R. Martinez, Zhuohao Chen, Torrey A. Creed, David C. Atkins, Shrikanth Narayanan

During a psychotherapy session, the counselor typically adopts techniques which are codified along specific dimensions (e.g., 'displays warmth and confidence', or 'attempts to set up collaboration') to facilitate the evaluation of the session. Those constructs, traditionally scored by trained human raters, reflect the complex nature of psychotherapy and highly depend on the context of the interaction. Recent advances in deep contextualized language models offer an avenue for accurate in-domain linguistic representations which can lead to robust recognition and scoring of such psychotherapy-relevant behavioral constructs, and support quality assurance and supervision. In this work, a BERT-based model is proposed for automatic behavioral scoring of a specific type of psychotherapy, called Cognitive Behavioral Therapy (CBT), where prior work is limited to frequency-based language features and/or short text excerpts which do not capture the unique elements involved in a spontaneous long conversational interaction. The model is trained in a multi-task manner in order to achieve higher interpretability. BERT-based representations are further augmented with available therapy metadata, providing relevant non-linguistic context and leading to consistent performance improvements.

  Access Paper or Ask Questions

Better Robustness by More Coverage: Adversarial Training with Mixup Augmentation for Robust Fine-tuning

Dec 31, 2020
Chenglei Si, Zhengyan Zhang, Fanchao Qi, Zhiyuan Liu, Yasheng Wang, Qun Liu, Maosong Sun

Pre-trained language models (PLMs) fail miserably on adversarial attacks. To improve the robustness, adversarial data augmentation (ADA) has been widely adopted, which attempts to cover more search space of adversarial attacks by adding the adversarial examples during training. However, the number of adversarial examples added by ADA is extremely insufficient due to the enormously large search space. In this work, we propose a simple and effective method to cover much larger proportion of the attack search space, called Adversarial Data Augmentation with Mixup (MixADA). Specifically, MixADA linearly interpolates the representations of pairs of training examples to form new virtual samples, which are more abundant and diverse than the discrete adversarial examples used in conventional ADA. Moreover, to evaluate the robustness of different models fairly, we adopt a challenging setup, which dynamically generates new adversarial examples for each model. In the text classification experiments of BERT and RoBERTa, MixADA achieves significant robustness gains under two strong adversarial attacks and alleviates the performance degradation of ADA on the original data. Our source codes will be released to support further explorations.

* 9 pages 

  Access Paper or Ask Questions

FAWA: Fast Adversarial Watermark Attack on Optical Character Recognition (OCR) Systems

Dec 15, 2020
Lu Chen, Jiao Sun, Wei Xu

Deep neural networks (DNNs) significantly improved the accuracy of optical character recognition (OCR) and inspired many important applications. Unfortunately, OCRs also inherit the vulnerabilities of DNNs under adversarial examples. Different from colorful vanilla images, text images usually have clear backgrounds. Adversarial examples generated by most existing adversarial attacks are unnatural and pollute the background severely. To address this issue, we propose the Fast Adversarial Watermark Attack (FAWA) against sequence-based OCR models in the white-box manner. By disguising the perturbations as watermarks, we can make the resulting adversarial images appear natural to human eyes and achieve a perfect attack success rate. FAWA works with either gradient-based or optimization-based perturbation generation. In both letter-level and word-level attacks, our experiments show that in addition to natural appearance, FAWA achieves a 100% attack success rate with 60% less perturbations and 78% fewer iterations on average. In addition, we further extend FAWA to support full-color watermarks, other languages, and even the OCR accuracy-enhancing mechanism.

* 16 pages, ECML/PKDD 2020 research trace 

  Access Paper or Ask Questions

Multimodal Privacy-preserving Mood Prediction from Mobile Data: A Preliminary Study

Dec 04, 2020
Terrance Liu, Paul Pu Liang, Michal Muszynski, Ryo Ishii, David Brent, Randy Auerbach, Nicholas Allen, Louis-Philippe Morency

Mental health conditions remain under-diagnosed even in countries with common access to advanced medical care. The ability to accurately and efficiently predict mood from easily collectible data has several important implications towards the early detection and intervention of mental health disorders. One promising data source to help monitor human behavior is from daily smartphone usage. However, care must be taken to summarize behaviors without identifying the user through personal (e.g., personally identifiable information) or protected attributes (e.g., race, gender). In this paper, we study behavioral markers or daily mood using a recent dataset of mobile behaviors from high-risk adolescent populations. Using computational models, we find that multimodal modeling of both text and app usage features is highly predictive of daily mood over each modality alone. Furthermore, we evaluate approaches that reliably obfuscate user identity while remaining predictive of daily mood. By combining multimodal representations with privacy-preserving learning, we are able to push forward the performance-privacy frontier as compared to unimodal approaches.

  Access Paper or Ask Questions

On Extending NLP Techniques from the Categorical to the Latent Space: KL Divergence, Zipf's Law, and Similarity Search

Dec 02, 2020
Adam Hare, Yu Chen, Yinan Liu, Zhenming Liu, Christopher G. Brinton

Despite the recent successes of deep learning in natural language processing (NLP), there remains widespread usage of and demand for techniques that do not rely on machine learning. The advantage of these techniques is their interpretability and low cost when compared to frequently opaque and expensive machine learning models. Although they may not be be as performant in all cases, they are often sufficient for common and relatively simple problems. In this paper, we aim to modernize these older methods while retaining their advantages by extending approaches from categorical or bag-of-words representations to word embeddings representations in the latent space. First, we show that entropy and Kullback-Leibler divergence can be efficiently estimated using word embeddings and use this estimation to compare text across several categories. Next, we recast the heavy-tailed distribution known as Zipf's law that is frequently observed in the categorical space to the latent space. Finally, we look to improve the Jaccard similarity measure for sentence suggestion by introducing a new method of identifying similar sentences based on the set cover problem. We compare the performance of this algorithm against several baselines including Word Mover's Distance and the Levenshtein distance.

* 13 pages, 6 figures 

  Access Paper or Ask Questions

Deep Shallow Fusion for RNN-T Personalization

Nov 16, 2020
Duc Le, Gil Keren, Julian Chan, Jay Mahadeokar, Christian Fuegen, Michael L. Seltzer

End-to-end models in general, and Recurrent Neural Network Transducer (RNN-T) in particular, have gained significant traction in the automatic speech recognition community in the last few years due to their simplicity, compactness, and excellent performance on generic transcription tasks. However, these models are more challenging to personalize compared to traditional hybrid systems due to the lack of external language models and difficulties in recognizing rare long-tail words, specifically entity names. In this work, we present novel techniques to improve RNN-T's ability to model rare WordPieces, infuse extra information into the encoder, enable the use of alternative graphemic pronunciations, and perform deep fusion with personalized language models for more robust biasing. We show that these combined techniques result in 15.4%-34.5% relative Word Error Rate improvement compared to a strong RNN-T baseline which uses shallow fusion and text-to-speech augmentation. Our work helps push the boundary of RNN-T personalization and close the gap with hybrid systems on use cases where biasing and entity recognition are crucial.

* To appear at SLT 2021 

  Access Paper or Ask Questions

It's all in the (Sub-)title? Expanding Signal Evaluation in Crowdfunding Research

Oct 27, 2020
Constantin von Selasinsky, Andrew Jay Isaak

Research on crowdfunding success that incorporates CATA (computer-aided text analysis) is quickly advancing to the big leagues (e.g., Parhankangas and Renko, 2017; Anglin et al., 2018; Moss et al., 2018) and is often theoretically based on information asymmetry, social capital, signaling or a combination thereof. Yet, current papers that explore crowdfunding success criteria fail to take advantage of the full breadth of signals available and only very few such papers examine technology projects. In this paper, we compare and contrast the strength of the entrepreneur's textual success signals to project backers within this category. Based on a random sample of 1,049 technology projects collected from Kickstarter, we evaluate textual information not only from project titles and descriptions but also from video subtitles. We find that incorporating subtitle information increases the variance explained by the respective models and therefore their predictive capability for funding success. By expanding the information landscape, our work advances the field and paves the way for more fine-grained studies of success signals in crowdfunding and therefore for an improved understanding of investor decision-making in the crowd.

* Proceedings of the Twenty-Eighth European Conference on Information Systems (ECIS2020) 

  Access Paper or Ask Questions