Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Shahin Amiriparian

EIHW -- Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Germany

The ACM Multimedia 2023 Computational Paralinguistics Challenge: Emotion Share & Requests

May 01, 2023

Björn W. Schuller, Anton Batliner, Shahin Amiriparian, Alexander Barnhill, Maurice Gerczuk, Andreas Triantafyllopoulos, Alice Baird, Panagiotis Tzirakis, Chris Gagne, Alan S. Cowen(+3 more)

Figure 1 for The ACM Multimedia 2023 Computational Paralinguistics Challenge: Emotion Share & Requests

Figure 2 for The ACM Multimedia 2023 Computational Paralinguistics Challenge: Emotion Share & Requests

Figure 3 for The ACM Multimedia 2023 Computational Paralinguistics Challenge: Emotion Share & Requests

Figure 4 for The ACM Multimedia 2023 Computational Paralinguistics Challenge: Emotion Share & Requests

Abstract:The ACM Multimedia 2023 Computational Paralinguistics Challenge addresses two different problems for the first time in a research competition under well-defined conditions: In the Emotion Share Sub-Challenge, a regression on speech has to be made; and in the Requests Sub-Challenges, requests and complaints need to be detected. We describe the Sub-Challenges, baseline feature extraction, and classifiers based on the usual ComPaRE features, the auDeep toolkit, and deep feature extraction from pre-trained CNNs using the DeepSpectRum toolkit; in addition, wav2vec2 models are used.

* 5 pages, part of the ACM Multimedia 2023 Grand Challenge "The ACM Multimedia 2023 Computational Paralinguistics Challenge (ComParE 2023). arXiv admin note: text overlap with arXiv:2205.06799

Via

Access Paper or Ask Questions

HEAR4Health: A blueprint for making computer audition a staple of modern healthcare

Jan 25, 2023

Andreas Triantafyllopoulos, Alexander Kathan, Alice Baird, Lukas Christ, Alexander Gebhard, Maurice Gerczuk, Vincent Karas, Tobias Hübner, Xin Jing, Shuo Liu(+13 more)

Figure 1 for HEAR4Health: A blueprint for making computer audition a staple of modern healthcare

Figure 2 for HEAR4Health: A blueprint for making computer audition a staple of modern healthcare

Abstract:Recent years have seen a rapid increase in digital medicine research in an attempt to transform traditional healthcare systems to their modern, intelligent, and versatile equivalents that are adequately equipped to tackle contemporary challenges. This has led to a wave of applications that utilise AI technologies; first and foremost in the fields of medical imaging, but also in the use of wearables and other intelligent sensors. In comparison, computer audition can be seen to be lagging behind, at least in terms of commercial interest. Yet, audition has long been a staple assistant for medical practitioners, with the stethoscope being the quintessential sign of doctors around the world. Transforming this traditional technology with the use of AI entails a set of unique challenges. We categorise the advances needed in four key pillars: Hear, corresponding to the cornerstone technologies needed to analyse auditory signals in real-life conditions; Earlier, for the advances needed in computational and data efficiency; Attentively, for accounting to individual differences and handling the longitudinal nature of medical data; and, finally, Responsibly, for ensuring compliance to the ethical standards accorded to the field of medicine.

Via

Access Paper or Ask Questions

Computational Charisma -- A Brick by Brick Blueprint for Building Charismatic Artificial Intelligence

Dec 31, 2022

Björn W. Schuller, Shahin Amiriparian, Anton Batliner, Alexander Gebhard, Maurice Gerzcuk, Vincent Karas, Alexander Kathan, Lennart Seizer, Johanna Löchner

Figure 1 for Computational Charisma -- A Brick by Brick Blueprint for Building Charismatic Artificial Intelligence

Figure 2 for Computational Charisma -- A Brick by Brick Blueprint for Building Charismatic Artificial Intelligence

Figure 3 for Computational Charisma -- A Brick by Brick Blueprint for Building Charismatic Artificial Intelligence

Abstract:Charisma is considered as one's ability to attract and potentially also influence others. Clearly, there can be considerable interest from an artificial intelligence's (AI) perspective to provide it with such skill. Beyond, a plethora of use cases opens up for computational measurement of human charisma, such as for tutoring humans in the acquisition of charisma, mediating human-to-human conversation, or identifying charismatic individuals in big social data. A number of models exist that base charisma on various dimensions, often following the idea that charisma is given if someone could and would help others. Examples include influence (could help) and affability (would help) in scientific studies or power (could help), presence, and warmth (both would help) as a popular concept. Modelling high levels in these dimensions for humanoid robots or virtual agents, seems accomplishable. Beyond, also automatic measurement appears quite feasible with the recent advances in the related fields of Affective Computing and Social Signal Processing. Here, we, thereforem present a blueprint for building machines that can appear charismatic, but also analyse the charisma of others. To this end, we first provide the psychological perspective including different models of charisma and behavioural cues of it. We then switch to conversational charisma in spoken language as an exemplary modality that is essential for human-human and human-computer conversations. The computational perspective then deals with the recognition and generation of charismatic behaviour by AI. This includes an overview of the state of play in the field and the aforementioned blueprint. We then name exemplary use cases of computational charismatic skills before switching to ethical aspects and concluding this overview and perspective on building charisma-enabled AI.

Via

Access Paper or Ask Questions

Automatic Emotion Modelling in Written Stories

Dec 21, 2022

Lukas Christ, Shahin Amiriparian, Manuel Milling, Ilhan Aslan, Björn W. Schuller

Abstract:Telling stories is an integral part of human communication which can evoke emotions and influence the affective states of the audience. Automatically modelling emotional trajectories in stories has thus attracted considerable scholarly interest. However, as most existing works have been limited to unsupervised dictionary-based approaches, there is no labelled benchmark for this task. We address this gap by introducing continuous valence and arousal annotations for an existing dataset of children's stories annotated with discrete emotion categories. We collect additional annotations for this data and map the originally categorical labels to the valence and arousal space. Leveraging recent advances in Natural Language Processing, we propose a set of novel Transformer-based methods for predicting valence and arousal signals over the course of written stories. We explore several strategies for fine-tuning a pretrained ELECTRA model and study the benefits of considering a sentence's context when inferring its emotionality. Moreover, we experiment with additional LSTM and Transformer layers. The best configuration achieves a Concordance Correlation Coefficient (CCC) of .7338 for valence and .6302 for arousal on the test set, demonstrating the suitability of our proposed approach. Our code and additional annotations are made available at https://github.com/lc0197/emotion_modelling_stories.

* This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Via

Access Paper or Ask Questions

METEOR Guided Divergence for Video Captioning

Dec 20, 2022

Daniel Lukas Rothenpieler, Shahin Amiriparian

Abstract:Automatic video captioning aims for a holistic visual scene understanding. It requires a mechanism for capturing temporal context in video frames and the ability to comprehend the actions and associations of objects in a given timeframe. Such a system should additionally learn to abstract video sequences into sensible representations as well as to generate natural written language. While the majority of captioning models focus solely on the visual inputs, little attention has been paid to the audiovisual modality. To tackle this issue, we propose a novel two-fold approach. First, we implement a reward-guided KL Divergence to train a video captioning model which is resilient towards token permutations. Second, we utilise a Bi-Modal Hierarchical Reinforcement Learning (BMHRL) Transformer architecture to capture long-term temporal dependencies of the input data as a foundation for our hierarchical captioning module. Using our BMHRL, we show the suitability of the HRL agent in the generation of content-complete and grammatically sound sentences by achieving $4.91$, $2.23$, and $10.80$ in BLEU3, BLEU4, and METEOR scores, respectively on the ActivityNet Captions dataset. Finally, we make our BMHRL framework and trained models publicly available for users and developers at https://github.com/d-rothen/bmhrl.

Via

Access Paper or Ask Questions

Multimodal Prediction of Spontaneous Humour: A Novel Dataset and First Results

Sep 28, 2022

Lukas Christ, Shahin Amiriparian, Alexander Kathan, Niklas Müller, Andreas König, Björn W. Schuller

Figure 1 for Multimodal Prediction of Spontaneous Humour: A Novel Dataset and First Results

Figure 2 for Multimodal Prediction of Spontaneous Humour: A Novel Dataset and First Results

Figure 3 for Multimodal Prediction of Spontaneous Humour: A Novel Dataset and First Results

Figure 4 for Multimodal Prediction of Spontaneous Humour: A Novel Dataset and First Results

Abstract:Humour is a substantial element of human affect and cognition. Its automatic understanding can facilitate a more naturalistic human-device interaction and the humanisation of artificial intelligence. Current methods of humour detection are solely based on staged data making them inadequate for 'real-world' applications. We address this deficiency by introducing the novel Passau-Spontaneous Football Coach Humour (Passau-SFCH) dataset, comprising of about 11 hours of recordings. The Passau-SFCH dataset is annotated for the presence of humour and its dimensions (sentiment and direction) as proposed in Martin's Humor Style Questionnaire. We conduct a series of experiments, employing pretrained Transformers, convolutional neural networks, and expert-designed features. The performance of each modality (text, audio, video) for spontaneous humour recognition is analysed and their complementarity is investigated. Our findings suggest that for the automatic analysis of humour and its sentiment, facial expressions are most promising, while humour direction can be best modelled via text-based features. The results reveal considerable differences among various subjects, highlighting the individuality of humour usage and style. Further, we observe that a decision-level fusion yields the best recognition result. Finally, we make our code publicly available at https://www.github.com/EIHW/passau-sfch. The Passau-SFCH dataset is available upon request.

* This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Via

Access Paper or Ask Questions

Distinguishing between pre- and post-treatment in the speech of patients with chronic obstructive pulmonary disease

Jul 26, 2022

Andreas Triantafyllopoulos, Markus Fendler, Anton Batliner, Maurice Gerczuk, Shahin Amiriparian, Thomas M. Berghaus, Björn W. Schuller

Figure 1 for Distinguishing between pre- and post-treatment in the speech of patients with chronic obstructive pulmonary disease

Figure 2 for Distinguishing between pre- and post-treatment in the speech of patients with chronic obstructive pulmonary disease

Abstract:Chronic obstructive pulmonary disease (COPD) causes lung inflammation and airflow blockage leading to a variety of respiratory symptoms; it is also a leading cause of death and affects millions of individuals around the world. Patients often require treatment and hospitalisation, while no cure is currently available. As COPD predominantly affects the respiratory system, speech and non-linguistic vocalisations present a major avenue for measuring the effect of treatment. In this work, we present results on a new COPD dataset of 20 patients, showing that, by employing personalisation through speaker-level feature normalisation, we can distinguish between pre- and post-treatment speech with an unweighted average recall (UAR) of up to 82\,\% in (nested) leave-one-speaker-out cross-validation. We further identify the most important features and link them to pathological voice properties, thus enabling an auditory interpretation of treatment effects. Monitoring tools based on such approaches may help objectivise the clinical status of COPD patients and facilitate personalised treatment plans.

* Accepted in INTERSPEECH 2022

Via

Access Paper or Ask Questions

COLD Fusion: Calibrated and Ordinal Latent Distribution Fusion for Uncertainty-Aware Multimodal Emotion Recognition

Jun 12, 2022

Mani Kumar Tellamekala, Shahin Amiriparian, Björn W. Schuller, Elisabeth André, Timo Giesbrecht, Michel Valstar

Figure 1 for COLD Fusion: Calibrated and Ordinal Latent Distribution Fusion for Uncertainty-Aware Multimodal Emotion Recognition

Figure 2 for COLD Fusion: Calibrated and Ordinal Latent Distribution Fusion for Uncertainty-Aware Multimodal Emotion Recognition

Figure 3 for COLD Fusion: Calibrated and Ordinal Latent Distribution Fusion for Uncertainty-Aware Multimodal Emotion Recognition

Figure 4 for COLD Fusion: Calibrated and Ordinal Latent Distribution Fusion for Uncertainty-Aware Multimodal Emotion Recognition

Abstract:Automatically recognising apparent emotions from face and voice is hard, in part because of various sources of uncertainty, including in the input data and the labels used in a machine learning framework. This paper introduces an uncertainty-aware audiovisual fusion approach that quantifies modality-wise uncertainty towards emotion prediction. To this end, we propose a novel fusion framework in which we first learn latent distributions over audiovisual temporal context vectors separately, and then constrain the variance vectors of unimodal latent distributions so that they represent the amount of information each modality provides w.r.t. emotion recognition. In particular, we impose Calibration and Ordinal Ranking constraints on the variance vectors of audiovisual latent distributions. When well-calibrated, modality-wise uncertainty scores indicate how much their corresponding predictions may differ from the ground truth labels. Well-ranked uncertainty scores allow the ordinal ranking of different frames across the modalities. To jointly impose both these constraints, we propose a softmax distributional matching loss. In both classification and regression settings, we compare our uncertainty-aware fusion model with standard model-agnostic fusion baselines. Our evaluation on two emotion recognition corpora, AVEC 2019 CES and IEMOCAP, shows that audiovisual emotion recognition can considerably benefit from well-calibrated and well-ranked latent uncertainty measures.

Via

Access Paper or Ask Questions

The ACM Multimedia 2022 Computational Paralinguistics Challenge: Vocalisations, Stuttering, Activity, & Mosquitoes

May 13, 2022

Björn W. Schuller, Anton Batliner, Shahin Amiriparian, Christian Bergler, Maurice Gerczuk, Natalie Holz, Pauline Larrouy-Maestri, Sebastian P. Bayerl, Korbinian Riedhammer, Adria Mallol-Ragolta(+5 more)

Figure 1 for The ACM Multimedia 2022 Computational Paralinguistics Challenge: Vocalisations, Stuttering, Activity, & Mosquitoes

Figure 2 for The ACM Multimedia 2022 Computational Paralinguistics Challenge: Vocalisations, Stuttering, Activity, & Mosquitoes

Figure 3 for The ACM Multimedia 2022 Computational Paralinguistics Challenge: Vocalisations, Stuttering, Activity, & Mosquitoes

Abstract:The ACM Multimedia 2022 Computational Paralinguistics Challenge addresses four different problems for the first time in a research competition under well-defined conditions: In the Vocalisations and Stuttering Sub-Challenges, a classification on human non-verbal vocalisations and speech has to be made; the Activity Sub-Challenge aims at beyond-audio human activity recognition from smartwatch sensor data; and in the Mosquitoes Sub-Challenge, mosquitoes need to be detected. We describe the Sub-Challenges, baseline feature extraction, and classifiers based on the usual ComPaRE and BoAW features, the auDeep toolkit, and deep feature extraction from pre-trained CNNs using the DeepSpectRum toolkit; in addition, we add end-to-end sequential modelling, and a log-mel-128-BNN.

* 5 pages, part of the ACM Multimedia 2022 Grand Challenge "The ACM Multimedia 2022 Computational Paralinguistics Challenge (ComParE 2022)"

Via

Access Paper or Ask Questions

Fatigue Prediction in Outdoor Running Conditions using Audio Data

May 09, 2022

Andreas Triantafyllopoulos, Sandra Ottl, Alexander Gebhard, Esther Rituerto-González, Mirko Jaumann, Steffen Hüttner, Valerie Dieter, Patrick Schneeweiß, Inga Krauß, Maurice Gerczuk(+2 more)

Figure 1 for Fatigue Prediction in Outdoor Running Conditions using Audio Data

Figure 2 for Fatigue Prediction in Outdoor Running Conditions using Audio Data

Figure 3 for Fatigue Prediction in Outdoor Running Conditions using Audio Data

Figure 4 for Fatigue Prediction in Outdoor Running Conditions using Audio Data

Abstract:Although running is a common leisure activity and a core training regiment for several athletes, between $29\%$ and $79\%$ of runners sustain an overuse injury each year. These injuries are linked to excessive fatigue, which alters how someone runs. In this work, we explore the feasibility of modelling the Borg received perception of exertion (RPE) scale (range: $[6-20]$), a well-validated subjective measure of fatigue, using audio data captured in realistic outdoor environments via smartphones attached to the runners' arms. Using convolutional neural networks (CNNs) on log-Mel spectrograms, we obtain a mean absolute error of $2.35$ in subject-dependent experiments, demonstrating that audio can be effectively used to model fatigue, while being more easily and non-invasively acquired than by signals from other sensors.

* Paper accepted at IEEE EMBC 2022. Rights remain with IEEE

Via

Access Paper or Ask Questions