Mihai Burzo

WildQA: In-the-Wild Video Question Answering

Sep 14, 2022

Santiago Castro, Naihao Deng, Pingxuan Huang, Mihai Burzo, Rada Mihalcea

Abstract: Existing video understanding datasets mostly focus on human interactions, with little attention paid to "in the wild" settings, where videos are recorded outdoors. We propose WildQA, a video understanding dataset of videos recorded in outdoor settings. In addition to video question answering (Video QA), we also introduce the new task of identifying visual support for a given question and answer (Video Evidence Selection). Through evaluations using a wide range of baseline models, we show that WildQA poses new challenges to the vision and language research communities. The dataset is available at https://lit.eecs.umich.edu/wildqa/.

* *: Equal contribution; COLING 2022 oral; project webpage: https://lit.eecs.umich.edu/wildqa/

MUSER: MUltimodal Stress Detection using Emotion Recognition as an Auxiliary Task

May 17, 2021

Yiqun Yao, Michalis Papakostas, Mihai Burzo, Mohamed Abouelenien, Rada Mihalcea

Abstract: The ability to automatically detect human stress can benefit artificial intelligence agents involved in affective computing and human-computer interaction. Stress and emotion are both human affective states, and stress has been shown to have important implications for the regulation and expression of emotion. Although a number of methods have been established for multimodal stress detection, little work has explored the underlying interdependence between stress and emotion. In this work, we investigate the value of emotion recognition as an auxiliary task to improve stress detection. We propose MUSER, a transformer-based model architecture and a novel multi-task learning algorithm with a speed-based dynamic sampling strategy. Evaluations on the Multimodal Stressed Emotion (MuSE) dataset show that our model is effective for stress detection with both internal and external auxiliary tasks, and achieves state-of-the-art results.

* Accepted at NAACL 2021
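The abstract does not spell out MUSER's speed-based dynamic sampling rule, so the following is only a hypothetical illustration of the general idea, not the paper's algorithm. It assumes that a task's "speed" is the ratio of its current training loss to its previous one, and converts those ratios into task-sampling probabilities with a softmax so that tasks whose loss is falling more slowly are drawn more often. The function names, the `temperature` parameter, and the task names `"stress"` and `"emotion"` are all illustrative assumptions.

```python
import math
import random
from typing import Dict


def task_sampling_probs(prev_losses: Dict[str, float],
                        curr_losses: Dict[str, float],
                        temperature: float = 1.0) -> Dict[str, float]:
    """Hypothetical speed-based weighting (assumption, not MUSER's rule).

    A ratio curr/prev near or above 1 means the task is learning slowly,
    so it receives a larger softmax weight and is sampled more often.
    Losses are assumed to be positive.
    """
    ratios = {t: curr_losses[t] / prev_losses[t] for t in curr_losses}
    exps = {t: math.exp(r / temperature) for t, r in ratios.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}


def sample_task(probs: Dict[str, float], rng: random.Random) -> str:
    """Draw the task whose batch is used for the next training step."""
    tasks = sorted(probs)  # fixed order so results are reproducible per seed
    return rng.choices(tasks, weights=[probs[t] for t in tasks], k=1)[0]


if __name__ == "__main__":
    # Illustrative losses: the auxiliary emotion task is improving faster.
    probs = task_sampling_probs({"stress": 1.0, "emotion": 1.0},
                                {"stress": 0.9, "emotion": 0.5})
    print(probs)
    print(sample_task(probs, random.Random(0)))
```

With these made-up losses the main stress task improves more slowly (ratio 0.9 versus 0.5), so it receives the larger sampling probability; a real implementation would recompute the probabilities periodically during training.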

MuSE-ing on the Impact of Utterance Ordering On Crowdsourced Emotion Annotations

Mar 27, 2019

Mimansa Jaiswal, Zakaria Aldeneh, Cristian-Paul Bara, Yuanhang Luo, Mihai Burzo, Rada Mihalcea, Emily Mower Provost

Abstract: Emotion recognition algorithms rely on data annotated with high-quality labels. However, emotion expression and perception are inherently subjective. There is generally no single annotation that can be unambiguously declared "correct". As a result, annotations are colored by the manner in which they were collected. In this paper, we conduct crowdsourcing experiments to investigate the impact of the collection method both on the annotations themselves and on the performance of emotion recognition algorithms. We focus on one critical question: the effect of context. We present a new emotion dataset, Multimodal Stressed Emotion (MuSE), and annotate the dataset under two conditions: randomized, in which annotators are presented with clips in random order, and contextualized, in which annotators are presented with clips in their original order. We find that contextualized labeling schemes produce annotations that are more similar to a speaker's own self-reported labels, and that labels generated from randomized schemes are the most easily predicted by automated systems.

* 5 pages, ICASSP 2019
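The two annotation conditions differ only in the order in which clips reach annotators. A minimal sketch of that difference, assuming each clip is identified by a string ID listed in its original order within the recording; the function name `presentation_order`, the condition strings, and the clip IDs are hypothetical and not taken from the MuSE release.

```python
import random
from typing import List, Optional


def presentation_order(clip_ids: List[str], condition: str,
                       seed: Optional[int] = None) -> List[str]:
    """Return the order in which clips are shown to annotators.

    "contextualized": clips keep their original order, so annotators
    see the surrounding utterances of the conversation.
    "randomized": clips are shuffled, removing conversational context.
    """
    if condition == "contextualized":
        return list(clip_ids)  # copy, original order preserved
    if condition == "randomized":
        order = list(clip_ids)  # copy so the caller's list is not mutated
        random.Random(seed).shuffle(order)  # seeded for reproducibility
        return order
    raise ValueError(f"unknown condition: {condition!r}")


if __name__ == "__main__":
    clips = ["c1", "c2", "c3", "c4", "c5"]  # hypothetical clip IDs
    print(presentation_order(clips, "contextualized"))
    print(presentation_order(clips, "randomized", seed=7))
```

Seeding the shuffle makes a randomized batch reproducible, which helps when the same ordering must be shown to several annotators.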
