Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Elham J. Barezi

Modality-based Factorization for Multimodal Fusion

Nov 30, 2018

Elham J. Barezi, Peyman Momeni, Ian wood, Pascale Fung

Figure 1 for Modality-based Factorization for Multimodal Fusion

Figure 2 for Modality-based Factorization for Multimodal Fusion

Figure 3 for Modality-based Factorization for Multimodal Fusion

Figure 4 for Modality-based Factorization for Multimodal Fusion

Abstract:We propose a multimodal data fusion method by obtaining a $M+1$ dimensional tensor to consider the high-order relationship between $M$ modalities and the output layer of a neural network model. Applying a modality-based tensor factorization method, which adopts different factors for different modalities, results in removing the redundant information with respect to model outputs and leads to fewer model parameters with minimal loss of performance. This factorization method works as a regularizer which leads to a less complicated model and avoids overfitting. In addition, a modality-based factorization approach helps to understand the amount of useful information in each modality. We have applied this method to three different multimodal datasets in sentiment analysis, personality trait recognition, and emotion recognition. The results demonstrate that the approach yields a 1\% to 4\% improvement on several evaluation measures compared to the state-of-the-art for all three tasks.

Via

Access Paper or Ask Questions

Investigating Audio, Visual, and Text Fusion Methods for End-to-End Automatic Personality Prediction

May 16, 2018

Onno Kampman, Elham J. Barezi, Dario Bertero, Pascale Fung

Figure 1 for Investigating Audio, Visual, and Text Fusion Methods for End-to-End Automatic Personality Prediction

Figure 2 for Investigating Audio, Visual, and Text Fusion Methods for End-to-End Automatic Personality Prediction

Figure 3 for Investigating Audio, Visual, and Text Fusion Methods for End-to-End Automatic Personality Prediction

Figure 4 for Investigating Audio, Visual, and Text Fusion Methods for End-to-End Automatic Personality Prediction

Abstract:We propose a tri-modal architecture to predict Big Five personality trait scores from video clips with different channels for audio, text, and video data. For each channel, stacked Convolutional Neural Networks are employed. The channels are fused both on decision-level and by concatenating their respective fully connected layers. It is shown that a multimodal fusion approach outperforms each single modality channel, with an improvement of 9.4\% over the best individual modality (video). Full backpropagation is also shown to be better than a linear combination of modalities, meaning complex interactions between modalities can be leveraged to build better models. Furthermore, we can see the prediction relevance of each modality for each trait. The described model can be used to increase the emotional intelligence of virtual agents.

* Accepted at ACL2018 short paper

Via

Access Paper or Ask Questions