Cheul Young Park

Separating Content from Speaker Identity in Speech for the Assessment of Cognitive Impairments

Mar 21, 2022

Dongseok Heo, Cheul Young Park, Jaemin Cheun, Myung Jin Ko

Abstract: Deep speaker embeddings have been shown to be effective for assessing cognitive impairments, beyond their original purpose of speaker verification. However, research has found that speaker embeddings encode not only speaker identity but also an array of other information, including speaker demographics such as sex and age and, to an extent, speech content, which are known confounders in the assessment of cognitive impairments. In this paper, we hypothesize that content information separated from speaker identity using a voice conversion framework is more effective for assessing cognitive impairments, and we train simple classifiers for a comparative analysis on the DementiaBank Pitt Corpus. Our results show that content embeddings have an advantage over speaker embeddings for the defined problem; however, further experiments show that their effectiveness depends on the information encoded in speaker embeddings, owing to the inherent design of the architecture used for extracting content.

* 5 pages, submitted to INTERSPEECH 2022

K-EmoCon, a multimodal sensor dataset for continuous emotion recognition in naturalistic conversations

May 19, 2020

Cheul Young Park, Narae Cha, Soowon Kang, Auk Kim, Ahsan Habib Khandoker, Leontios Hadjileontiadis, Alice Oh, Yong Jeong, Uichin Lee

Abstract: Recognizing emotions during social interactions has many potential applications with the popularization of low-cost mobile sensors, but the lack of naturalistic affective interaction data remains a challenge. Most existing emotion datasets do not support studying idiosyncratic emotions arising in the wild because they were collected in constrained environments. Studying emotions in the context of social interactions therefore requires a novel dataset, and K-EmoCon is such a dataset: a multimodal dataset with comprehensive annotations of continuous emotions during naturalistic conversations. The dataset contains multimodal measurements, including audiovisual recordings, EEG, and peripheral physiological signals, acquired with off-the-shelf devices from 16 sessions of paired debates on a social issue, each approximately 10 minutes long. Distinct from previous datasets, it includes emotion annotations from all three available perspectives: self, debate partner, and external observers. Raters annotated emotional displays every 5 seconds while viewing the debate footage, in terms of arousal-valence and 18 additional categorical emotions. The resulting K-EmoCon is the first publicly available emotion dataset accommodating the multiperspective assessment of emotions during social interactions.

* 20 pages, 4 figures, for associated dataset, see https://doi.org/10.5281/zenodo.3814370
