
Fotios Lygerakis

CR-VAE: Contrastive Regularization on Variational Autoencoders for Preventing Posterior Collapse

Sep 09, 2023
Fotios Lygerakis, Elmar Rueckert

Figures 1–4 for CR-VAE: Contrastive Regularization on Variational Autoencoders for Preventing Posterior Collapse

The Variational Autoencoder (VAE) is known to suffer from posterior collapse, a phenomenon in which the latent representations generated by the model become independent of the inputs. This leads to degenerate representations of the input, which is attributed to limitations of the VAE's objective function. In this work, we propose a novel solution to this issue: the Contrastive Regularization for Variational Autoencoders (CR-VAE). The core of our approach is to augment the original VAE with a contrastive objective that maximizes the mutual information between the representations of similar visual inputs. This strategy ensures that the information flow between an input and its latent representation is maximized, effectively avoiding posterior collapse. We evaluate our method on a series of visual datasets and demonstrate that CR-VAE outperforms state-of-the-art approaches in preventing posterior collapse.
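The contrastive objective described above is typically an InfoNCE-style loss over paired views. The sketch below is a minimal illustration of the idea, not the paper's implementation: the function names, the reversed-batch negatives in the comments, and the `gamma` weighting are all assumptions made for the example.

```python
import numpy as np

def info_nce(z1, z2, temperature=0.5):
    """InfoNCE loss between two batches of latent codes.

    z1, z2: (batch, dim) arrays holding representations of two augmented
    views of the same inputs. Row i of z1 and row i of z2 form the
    positive pair; every other row in the batch acts as a negative.
    """
    # Cosine similarity: L2-normalize, then take pairwise dot products.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                # (batch, batch)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the matching pairs on the diagonal as targets.
    return -np.mean(np.diag(log_probs))

def cr_vae_loss(recon_loss, kl_loss, z1, z2, gamma=1.0):
    """Standard ELBO terms plus a contrastive regularizer (illustrative)."""
    return recon_loss + kl_loss + gamma * info_nce(z1, z2)
```

Maximizing agreement between views of the same input (the diagonal of the similarity matrix) forces the encoder to keep input-specific information in the latent code, which is what counteracts posterior collapse.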


Sequential Late Fusion Technique for Multi-modal Sentiment Analysis

Jun 22, 2021
Debapriya Banerjee, Fotios Lygerakis, Fillia Makedon

Figures 1–2 for Sequential Late Fusion Technique for Multi-modal Sentiment Analysis

Multi-modal sentiment analysis plays an important role in providing better interactive experiences to users. Each modality in multi-modal data can offer a different viewpoint or reveal unique aspects of a user's emotional state. In this work, we use the text, audio, and visual modalities from the MOSI dataset and propose a novel fusion technique using a multi-head attention LSTM network. Finally, we perform a classification task and evaluate its performance.
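The attention-based fusion step can be illustrated with a single-head simplification: each modality's encoder (e.g. an LSTM's final hidden state) yields one feature vector, and scaled dot-product attention lets each modality weigh the others before fusion. This is a hedged sketch, not the paper's architecture; the function name, the single head, and the random projection matrices in the usage example are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_fusion(modality_feats, Wq, Wk, Wv):
    """Fuse per-modality feature vectors with scaled dot-product attention.

    modality_feats: (n_modalities, d) array, e.g. one row each for the
    text, audio, and visual encoder outputs.
    Returns a (n_modalities, d) array of attended features; averaging the
    rows gives a single fused vector to feed a sentiment classifier.
    """
    Q, K, V = modality_feats @ Wq, modality_feats @ Wk, modality_feats @ Wv
    # Each modality attends over all modalities; rows of `weights` sum to 1.
    weights = softmax(Q @ K.T / np.sqrt(K.shape[1]), axis=-1)
    return weights @ V
```

A multi-head version would split the feature dimension into several such attention computations and concatenate the results; the single-head form above keeps the mechanism visible.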

* 2 pages, 1 figure, 1 table 