Rada Mihalcea

Deception Detection from Linguistic and Physiological Data Streams Using Bimodal Convolutional Neural Networks

Nov 18, 2023
Panfeng Li, Mohamed Abouelenien, Rada Mihalcea

Deception detection is gaining increasing interest due to ethical and security concerns. This paper explores the application of convolutional neural networks for the purpose of multimodal deception detection. We use a dataset built by interviewing 104 subjects about two topics, with one truthful and one falsified response from each subject about each topic. In particular, we make three main contributions. First, we extract linguistic and physiological features from this data to train and construct the neural network models. Second, we propose a fused convolutional neural network model using both modalities in order to achieve an improved overall performance. Third, we compare our new approach with earlier methods designed for multimodal deception detection. We find that our system outperforms regular classification methods; our results indicate the feasibility of using neural networks for deception detection even in the presence of limited amounts of data.
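The fused architecture described above can be illustrated with a minimal sketch: two 1-D convolutional branches, one over linguistic features and one over physiological signals, whose pooled outputs are concatenated before a shared classifier. The layer sizes and feature dimensions below are hypothetical choices for illustration, not the configuration reported in the paper.

```python
import torch
import torch.nn as nn

class BimodalCNN(nn.Module):
    """Illustrative two-branch CNN with feature-level fusion."""
    def __init__(self, ling_dim=300, phys_dim=8, hidden=64):
        super().__init__()
        # Branch over linguistic features (e.g., a word-embedding sequence).
        self.ling_branch = nn.Sequential(
            nn.Conv1d(ling_dim, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),
        )
        # Branch over physiological channels (e.g., heart rate, skin conductance).
        self.phys_branch = nn.Sequential(
            nn.Conv1d(phys_dim, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),
        )
        # Fused classifier: concatenated branch outputs -> truthful vs. deceptive.
        self.classifier = nn.Linear(2 * hidden, 2)

    def forward(self, ling, phys):
        # ling: (batch, ling_dim, n_tokens); phys: (batch, phys_dim, n_samples)
        l = self.ling_branch(ling).squeeze(-1)
        p = self.phys_branch(phys).squeeze(-1)
        return self.classifier(torch.cat([l, p], dim=-1))

model = BimodalCNN()
logits = model(torch.randn(4, 300, 50), torch.randn(4, 8, 200))  # shape (4, 2)
```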

* Submitted to NAACL HLT 2018 

VERVE: Template-based ReflectiVE Rewriting for MotiVational IntErviewing

Nov 14, 2023
Do June Min, Verónica Pérez-Rosas, Kenneth Resnicow, Rada Mihalcea

Reflective listening is a fundamental skill that counselors must acquire to achieve proficiency in motivational interviewing (MI). It involves responding in a manner that acknowledges and explores the meaning of what the client has expressed in the conversation. In this work, we introduce the task of counseling response rewriting, which transforms non-reflective statements into reflective responses. We introduce VERVE, a template-based rewriting system with paraphrase-augmented training and adaptive template updating. VERVE first creates a template by identifying and filtering out tokens that are not relevant to reflections, then constructs a reflective response from the template. Paraphrase-augmented training allows the model to learn less strict fillings of masked spans, and adaptive template updating helps discover effective templates for rewriting without significantly removing the original content. Using both automatic and human evaluations, we compare our method against text rewriting baselines and show that our framework is effective at turning non-reflective statements into more reflective responses while achieving a good trade-off between content preservation and reflection style.
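A toy sketch of the template step described above: tokens judged irrelevant to a reflective style are replaced with mask slots, and the resulting template is what a fill-in model would complete. The relevance scorer below is a stand-in heuristic for illustration, not the paper's learned filter.

```python
from typing import Callable, List

MASK = "<mask>"

def build_template(tokens: List[str],
                   relevance: Callable[[str], float],
                   threshold: float = 0.5) -> List[str]:
    """Keep tokens relevant to a reflection; replace the rest with mask slots."""
    template = []
    for tok in tokens:
        if relevance(tok) >= threshold:
            template.append(tok)
        elif not template or template[-1] != MASK:
            template.append(MASK)  # collapse consecutive masks into one slot
    return template

# Stand-in heuristic: treat client-content words as relevant, fillers as not.
filler = {"well", "just", "so", "um", "okay"}
toy_relevance = lambda tok: 0.0 if tok.lower() in filler else 1.0

statement = "Well okay so you just want to quit smoking".split()
print(build_template(statement, toy_relevance))
# ['<mask>', 'you', '<mask>', 'want', 'to', 'quit', 'smoking']
```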


Bridging the Digital Divide: Performance Variation across Socio-Economic Factors in Vision-Language Models

Nov 09, 2023
Joan Nwatu, Oana Ignat, Rada Mihalcea

Despite the impressive performance of current AI models reported across various tasks, performance reports often do not include evaluations of how these models perform on the specific groups that will be impacted by these technologies. Among the minority groups under-represented in AI, data from low-income households are often overlooked in data collection and model evaluation. We evaluate the performance of a state-of-the-art vision-language model (CLIP) on a geo-diverse dataset containing household images associated with different income values (Dollar Street) and show that performance inequality exists among households of different income levels. Our results indicate that performance for the poorer groups is consistently lower than for the wealthier groups across various topics and countries. We highlight insights that can help mitigate these issues and propose actionable steps for economic-level inclusive AI development. Code is available at https://github.com/MichiganNLP/Bridging_the_Digital_Divide.
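The disaggregated evaluation described above amounts to scoring the model per income bucket rather than reporting a single aggregate number. A minimal sketch with pandas, assuming a hypothetical per-image results table with columns `income` (monthly USD) and `correct` (whether CLIP's top prediction matched the topic label):

```python
import pandas as pd

# Hypothetical per-image results; in practice these would come from running
# CLIP zero-shot classification over the Dollar Street images.
results = pd.DataFrame({
    "income":  [45, 120, 800, 2500, 60, 1500, 300, 5000],
    "correct": [0,   0,   1,    1,   0,    1,   1,    1],
})

# Bucket households by monthly income and compare accuracy across buckets.
buckets = pd.cut(results["income"],
                 bins=[0, 200, 1000, float("inf")],
                 labels=["low", "middle", "high"])
by_income = results.groupby(buckets, observed=True)["correct"].mean()
print(by_income)                           # accuracy per income bucket
print(by_income.max() - by_income.min())   # performance gap between buckets
```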

* EMNLP 2023  

Language Guided Visual Question Answering: Elevate Your Multimodal Language Model Using Knowledge-Enriched Prompts

Oct 31, 2023
Deepanway Ghosal, Navonil Majumder, Roy Ka-Wei Lee, Rada Mihalcea, Soujanya Poria

Visual question answering (VQA) is the task of answering questions about an image. The task assumes an understanding of both the image and the question to provide a natural language answer. VQA has gained popularity in recent years due to its potential applications in a wide range of fields, including robotics, education, and healthcare. In this paper, we focus on knowledge-augmented VQA, where answering the question requires commonsense knowledge, world knowledge, and reasoning about ideas and concepts not present in the image. We propose a multimodal framework that uses language guidance (LG) in the form of rationales, image captions, scene graphs, etc., to answer questions more accurately. We benchmark our method on the multiple-choice question-answering task of the A-OKVQA, Science-QA, VSR, and IconQA datasets using CLIP and BLIP models. We show that the use of language guidance is a simple but powerful and effective strategy for visual question answering. Our language guidance improves the performance of CLIP by 7.6% and BLIP-2 by 4.8% on the challenging A-OKVQA dataset. We also observe consistent improvements in performance on the Science-QA, VSR, and IconQA datasets when using the proposed language guidance. The implementation of LG-VQA is publicly available at https://github.com/declare-lab/LG-VQA.
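A rough sketch of how language guidance can be folded into a CLIP-style multiple-choice setup: each answer choice is scored against the image after the question is enriched with a caption or rationale. The prompt format, helper function, and model checkpoint are illustrative assumptions, not the exact configuration of LG-VQA.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def answer_with_guidance(image: Image.Image, question: str,
                         guidance: str, choices: list) -> str:
    # Fold the language guidance (caption, rationale, etc.) into each candidate.
    texts = [f"{guidance} question: {question} answer: {c}" for c in choices]
    inputs = processor(text=texts, images=image, return_tensors="pt",
                       padding=True, truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape (1, num_choices)
    return choices[int(logits.argmax())]

# Example call (image loading omitted):
# answer_with_guidance(img, "What season is it?",
#                      "a snowy street with bare trees",
#                      ["summer", "winter", "spring", "autumn"])
```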


HI-TOM: A Benchmark for Evaluating Higher-Order Theory of Mind Reasoning in Large Language Models

Oct 25, 2023
Yinghui He, Yufan Wu, Yilin Jia, Rada Mihalcea, Yulong Chen, Naihao Deng

Theory of Mind (ToM) is the ability to reason about one's own and others' mental states. ToM plays a critical role in the development of intelligence, language understanding, and cognitive processes. While previous work has primarily focused on first and second-order ToM, we explore higher-order ToM, which involves recursive reasoning on others' beliefs. We introduce HI-TOM, a Higher Order Theory of Mind benchmark. Our experimental evaluation using various Large Language Models (LLMs) indicates a decline in performance on higher-order ToM tasks, demonstrating the limitations of current LLMs. We conduct a thorough analysis of different failure cases of LLMs, and share our thoughts on the implications of our findings on the future of NLP.
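The "higher-order" part of the benchmark refers to nesting beliefs about beliefs. A small sketch of how such a question can be composed for an arbitrary order; the agent names and object are illustrative, not items from HI-TOM.

```python
def higher_order_question(agents, target, order):
    """Compose a nested belief question of the given order, e.g. order 3:
    'Where does Anna think Bob thinks Carol thinks the apple is?'"""
    assert 1 <= order <= len(agents)
    first, rest = agents[0], agents[1:order]
    nested = "".join(f" {a} thinks" for a in rest)
    return f"Where does {first} think{nested} the {target} is?"

print(higher_order_question(["Anna", "Bob", "Carol"], "apple", 1))
# Where does Anna think the apple is?          (first-order ToM)
print(higher_order_question(["Anna", "Bob", "Carol"], "apple", 3))
# Where does Anna think Bob thinks Carol thinks the apple is?   (third-order ToM)
```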

* Findings of EMNLP 2023 

Enhancing Long-form Text Generation Efficacy with Task-adaptive Tokenization

Oct 23, 2023
Siyang Liu, Naihao Deng, Sahand Sabour, Yilin Jia, Minlie Huang, Rada Mihalcea

We propose task-adaptive tokenization as a way to adapt the generation pipeline to the specifics of a downstream task and enhance long-form generation in mental health. Inspired by insights from cognitive science, our task-adaptive tokenizer samples variable segmentations from multiple outcomes, with sampling probabilities optimized based on task-specific data. We introduce a strategy for building a specialized vocabulary and a vocabulary merging protocol that allows for the integration of task-specific tokens into the pre-trained model's tokenization step. Through extensive experiments on psychological question-answering tasks in both Chinese and English, we find that our task-adaptive tokenization approach brings a significant improvement in generation performance while using up to 60% fewer tokens. Preliminary experiments point to promising results when using our tokenization approach with very large language models.
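The vocabulary-merging step can be approximated with standard Hugging Face machinery: task-specific segments learned from domain data are registered as new tokens and the model's embedding matrix is extended to match. This is a simplified stand-in for the paper's merging protocol, and the token list below is hypothetical.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Hypothetical task-specific segments mined from mental-health QA data,
# e.g. frequent multi-word expressions worth treating as single tokens.
task_tokens = ["cognitive restructuring", "panic attack", "self-compassion"]

sentence = "She practiced cognitive restructuring every day"
n_before = len(tokenizer.tokenize(sentence))

num_added = tokenizer.add_tokens(task_tokens)       # merge into the vocabulary
model.resize_token_embeddings(len(tokenizer))       # extend the embedding matrix

n_after = len(tokenizer.tokenize(sentence))
print(n_before, n_after)  # the merged phrase now encodes with fewer tokens,
                          # which is the efficiency gain the paper targets
```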

* The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023)  
* Accepted at the main conference; 8 pages 

Human Action Co-occurrence in Lifestyle Vlogs using Graph Link Prediction

Sep 22, 2023
Oana Ignat, Santiago Castro, Weiji Li, Rada Mihalcea

We introduce the task of automatic human action co-occurrence identification, i.e., determining whether two human actions can co-occur in the same interval of time. We create and make publicly available the ACE (Action Co-occurrencE) dataset, consisting of a large graph of ~12k co-occurring pairs of visual actions and their corresponding video clips. We describe graph link prediction models that leverage visual and textual information to automatically infer if two actions are co-occurring. We show that graphs are particularly well suited to capture relations between human actions, and that the learned graph representations are effective for our task and capture novel and relevant information across different data domains. The ACE dataset and the code introduced in this paper are publicly available at https://github.com/MichiganNLP/vlog_action_co-occurrence.
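The link-prediction framing can be illustrated with a tiny example: actions are nodes, observed co-occurrences are edges, and an unseen pair is scored by a neighborhood-overlap heuristic. This simple Jaccard baseline stands in for the learned graph models and the visual/textual features used in the paper; the action graph below is made up.

```python
import networkx as nx

# Toy action co-occurrence graph: an edge means the two actions were observed
# in the same interval of time in some vlog.
G = nx.Graph([
    ("chop vegetables", "boil water"),
    ("chop vegetables", "stir pot"),
    ("boil water", "stir pot"),
    ("boil water", "wash dishes"),
    ("fold laundry", "watch tv"),
])

# Score unobserved pairs: shared neighbours suggest likely co-occurrence.
pairs = [("stir pot", "wash dishes"), ("chop vegetables", "watch tv")]
for u, v, score in nx.jaccard_coefficient(G, pairs):
    print(f"{u} + {v}: {score:.2f}")
# 'stir pot + wash dishes' shares the 'boil water' neighbour and scores higher
# than the unrelated 'chop vegetables + watch tv' pair.
```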
