The Affective Behavior Analysis in-the-wild (ABAW) 2022 Competition gives a strong boost to Affective Computing research. In this paper, we present our method for the AU challenge of this competition. We use an improved IResnet100 as the backbone. We then train on the AU dataset in Aff-Wild2, starting from three models pretrained on our private AU dataset, our private expression dataset, and Glint360K, respectively. Finally, we ensemble the results of our models. We achieve a macro F1 score of 0.731 on the AU validation set.
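As an illustration of the evaluation metric and the final ensembling step, the following is a minimal sketch, not the authors' implementation. It assumes that the ensemble is a plain average of per-AU probabilities and that a fixed threshold of 0.5 binarizes the result; the abstract specifies neither. The function names `ensemble_average`, `binarize`, and `macro_f1` are hypothetical.

```python
from typing import List

Probs = List[List[float]]   # probs[i][k]: probability that AU k is active in sample i
Labels = List[List[int]]    # labels[i][k]: 1 if AU k is active in sample i, else 0


def ensemble_average(prob_sets: List[Probs]) -> Probs:
    """Average per-AU probabilities over models (assumed ensembling rule)."""
    n_models = len(prob_sets)
    n_samples = len(prob_sets[0])
    n_aus = len(prob_sets[0][0])
    return [
        [sum(p[i][k] for p in prob_sets) / n_models for k in range(n_aus)]
        for i in range(n_samples)
    ]


def binarize(probs: Probs, thr: float = 0.5) -> Labels:
    """Turn probabilities into 0/1 decisions (threshold is an assumption)."""
    return [[1 if p >= thr else 0 for p in row] for row in probs]


def macro_f1(y_true: Labels, y_pred: Labels) -> float:
    """Unweighted mean of the binary F1 scores computed per AU."""
    n_aus = len(y_true[0])
    scores = []
    for k in range(n_aus):
        pairs = [(t[k], p[k]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        denom = 2 * tp + fp + fn
        # An AU with no positives and no predicted positives scores 0.0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_aus
```

Because the macro average weights every AU equally, rarely occurring AUs influence the score as much as frequent ones.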
Since the Transformer architecture was introduced in 2017, there have been many attempts to bring the self-attention paradigm into the field of computer vision. In this paper we propose a novel self-attention module that can be easily integrated into virtually every convolutional neural network and that is specifically designed for computer vision: the LHC, Local (multi) Head Channel (self-attention). LHC is based on two main ideas. First, we think that in computer vision the best way to leverage the self-attention paradigm is to apply it channel-wise rather than through the more explored spatial attention, and that convolution will not be replaced by attention modules in the way recurrent networks were in NLP. Second, a local approach has the potential to overcome the limitations of convolution better than global attention. With LHC-Net we achieve a new state of the art on the well-known FER2013 dataset, with significantly lower complexity and a smaller impact on the "host" architecture in terms of computational cost than the previous SOTA.
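To make the distinction between channel-wise and spatial attention concrete, here is a minimal sketch of generic channel-wise self-attention. It is not the LHC module: it is global rather than local, has a single head, and uses no learned projections, all of which are simplifications assumed for illustration. The function name `channel_self_attention` is hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # x[c][n]: channel c at flattened spatial position n


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def channel_self_attention(x: Matrix) -> Matrix:
    """Attend over channels: the attention map is C x C, not N x N."""
    n_channels = len(x)
    n_positions = len(x[0])
    scale = math.sqrt(n_positions)
    # scores[i][j]: scaled similarity between channel i and channel j.
    scores = [
        [sum(a * b for a, b in zip(x[i], x[j])) / scale for j in range(n_channels)]
        for i in range(n_channels)
    ]
    weights = [softmax(row) for row in scores]
    # Each output channel is a convex combination of the input channels.
    return [
        [sum(weights[i][j] * x[j][n] for j in range(n_channels))
         for n in range(n_positions)]
        for i in range(n_channels)
    ]
```

In this formulation the attention map grows with the number of channels rather than with the number of spatial positions, which is what separates it from spatial self-attention.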
Current state-of-the-art models for automatic FER are based on very deep neural networks that are difficult to train. This makes it challenging to adapt these models to changing conditions, a requirement for FER models given the subjective nature of affect perception and understanding. In this paper, we address this problem by formalizing the FaceChannel, a light-weight neural network that has far fewer parameters than common deep neural networks. We perform a series of experiments on different benchmark datasets to demonstrate that the FaceChannel achieves performance comparable to, if not better than, the current state of the art in FER.
The Local Binary Pattern (LBP) is a local descriptor proposed by Ojala et al. for texture discrimination. However, LBP is sensitive to noise and illumination changes. Consequently, several extensions of LBP, such as the Median Binary Pattern (MBP), and related methods, such as the Local Directional Pattern (LDP), have been proposed to address its drawbacks. Studies by Zhou et al., however, suggest that LDP performs poorly in the presence of random noise. More recently, convolutional neural networks (ConvNets) have become increasingly popular for feature extraction owing to their discriminative power. This study evaluates the sensitivity to noise of ResNet50, a pre-trained ConvNet model, and of local descriptors (LBP and LDP), using the Extended Yale B face dataset with five different levels of added noise. We found that ResNet50 was more robust than the local descriptors (LBP and LDP) across the different noise levels.
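For readers unfamiliar with the descriptor, the basic 3x3 LBP can be sketched as follows. This is a minimal illustration of the original operator only, not the study's pipeline. The neighbour ordering (clockwise from the top-left pixel), the `>=` comparison, and the function names are assumptions of this sketch; MBP and LDP are not shown.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of intensities

# Neighbour offsets (row, col), clockwise from the top-left pixel.
# The starting point and direction are a convention chosen for this sketch.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image: Image, r: int, c: int) -> int:
    """8-bit LBP code of the interior pixel (r, c)."""
    center = image[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        # A neighbour at least as bright as the centre sets its bit.
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_image(image: Image) -> List[List[int]]:
    """LBP codes for all interior pixels (the one-pixel border is skipped)."""
    h, w = len(image), len(image[0])
    return [[lbp_code(image, r, c) for c in range(1, w - 1)] for r in range(1, h - 1)]


def lbp_histogram(image: Image) -> List[int]:
    """256-bin histogram of LBP codes, the usual texture feature vector."""
    hist = [0] * 256
    for row in lbp_image(image):
        for code in row:
            hist[code] += 1
    return hist
```

Since each bit depends on a hard comparison with the centre pixel, noise that pushes a neighbour across that threshold flips the bit and changes the code, which is one way to see the noise sensitivity mentioned above.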
This paper presents a neural-network-based method, Multi-Task Affect Net (MTANet), submitted to the Affective Behavior Analysis in-the-Wild Challenge at FG2020. The method is a multi-task network based on SE-ResNet modules. Through multi-task learning, the network simultaneously estimates and recognizes three quantified affective models: valence and arousal, action units, and seven basic emotions. MTANet achieves Concordance Correlation Coefficients (CCC) of 0.28 and 0.34 for valence and arousal, respectively, and F1-scores of 0.427 and 0.32 for AU detection and categorical emotion classification, respectively.
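The Concordance Correlation Coefficient reported for valence and arousal can be computed as in the minimal sketch below, which uses the standard definition with population (1/n) variances. The function name `concordance_cc` and the choice to return 0.0 when the denominator is zero are assumptions of this sketch, not details taken from the paper.

```python
from typing import Sequence


def concordance_cc(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """CCC = 2 * cov / (var_true + var_pred + (mean_true - mean_pred)^2)."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    var_t = sum((t - mean_t) ** 2 for t in y_true) / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n
    denom = var_t + var_p + (mean_t - mean_p) ** 2
    # The coefficient is undefined for two identical constant sequences;
    # this sketch returns 0.0 in that case.
    return 2 * cov / denom if denom else 0.0
```

Unlike the Pearson correlation, the CCC penalizes a constant offset between predictions and labels: predictions [2, 3, 4] against labels [1, 2, 3] are perfectly correlated, yet their CCC is 4/7.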