Automatic detection of facial Action Units (AUs) allows for objective facial expression analysis. Due to the high cost of AU labeling and the limited size of existing benchmarks, previous AU detection methods tend to overfit the dataset, resulting in a significant performance loss when evaluated across corpora. To address this problem, we propose FG-Net for generalizable facial action unit detection. Specifically, FG-Net extracts feature maps from a StyleGAN2 model pre-trained on a large and diverse face image dataset. Then, these features are used to detect AUs with a Pyramid CNN Interpreter, making the training efficient and capturing essential local features. The proposed FG-Net achieves a strong generalization ability for heatmap-based AU detection thanks to the generalizable and semantic-rich features extracted from the pre-trained generative model. Extensive experiments are conducted to evaluate within- and cross-corpus AU detection with the widely used DISFA and BP4D datasets. Compared with the state of the art, the proposed method achieves superior cross-domain performance while maintaining competitive within-domain performance. In addition, FG-Net is data-efficient and achieves competitive performance even when trained on 1000 samples. Our code will be released at \url{https://github.com/ihp-lab/FG-Net}.
Facial expression analysis is an important tool for human-computer interaction. In this paper, we introduce LibreFace, an open-source toolkit for facial expression analysis. The toolkit offers real-time and offline analysis of facial behavior through deep learning models, including facial action unit (AU) detection, AU intensity estimation, and facial expression recognition. To accomplish this, we employ several techniques, including a large-scale pre-trained network, feature-wise knowledge distillation, and task-specific fine-tuning. These approaches are designed to analyze facial expressions effectively and accurately from visual information, thereby facilitating the implementation of real-time interactive applications. For AU intensity estimation, we achieve a Pearson Correlation Coefficient (PCC) of 0.63 on DISFA, which is 7% higher than the performance of OpenFace 2.0, while maintaining highly efficient inference that runs two times faster than OpenFace 2.0. Despite being compact, our model also demonstrates performance competitive with state-of-the-art facial expression analysis methods on AffectNet, FFHQ, and RAF-DB. Our code will be released at https://github.com/ihp-lab/LibreFace.
This paper proposes schemes to improve the spectral efficiency of a multiple-input multiple-output (MIMO) broadcast channel (BC) with I/Q imbalance (IQI) at transceivers by employing a combination of improper Gaussian signaling (IGS), non-orthogonal multiple access (NOMA) and a simultaneously transmitting and reflecting (STAR) reconfigurable intelligent surface (RIS). In the presence of IQI, the output RF signal is a widely linear transformation of the input signal, which may make the output signal improper. To compensate for IQI, we employ IGS, i.e., we transmit an improper signal. We show that IGS combined with NOMA can substantially increase the minimum rate of the users. Moreover, we propose schemes for different operational modes of STAR-RIS and show that STAR-RIS can significantly improve the system performance. Additionally, we show that IQI can severely degrade the performance, especially if it is overlooked in the design.
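The claim that a widely linear transformation can make a proper signal improper can be checked numerically. The sketch below is illustrative only and is not taken from the paper: it assumes a scalar model y = g1*x + g2*conj(x), the coefficients `g1` and `g2` are hypothetical values, and the circularity coefficient |E[y^2]| / E[|y|^2] is used as the measure of improperness.

```python
import numpy as np

def widely_linear(x, g1, g2):
    """Scalar widely linear map y = g1*x + g2*conj(x) (illustrative IQI model)."""
    return g1 * x + g2 * np.conj(x)

def circularity_coefficient(z):
    """|E[z^2]| / E[|z|^2]: 0 for a proper signal, 1 for a maximally improper one."""
    return np.abs(np.mean(z ** 2)) / np.mean(np.abs(z) ** 2)

rng = np.random.default_rng(0)
n = 200_000
# Proper (circularly symmetric) complex Gaussian input with unit variance.
x = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)

# Hypothetical imbalance coefficients; g2 = 0 would correspond to no IQI.
g1, g2 = 1.0 + 0.0j, 0.1 + 0.05j
y = widely_linear(x, g1, g2)

kappa_in = circularity_coefficient(x)    # sample estimate, close to 0
kappa_out = circularity_coefficient(y)   # sample estimate, clearly above 0
# Closed form for a proper unit-variance input: E[y^2] = 2*g1*g2 and
# E[|y|^2] = |g1|^2 + |g2|^2.
kappa_theory = abs(2 * g1 * g2) / (abs(g1) ** 2 + abs(g2) ** 2)

print(f"input: {kappa_in:.4f}  output: {kappa_out:.4f}  theory: {kappa_theory:.4f}")
```

Under this model the output is improper whenever both `g1` and `g2` are nonzero, which is the motivation the abstract gives for designing the transmit signal as improper as well.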
This paper proposes a general optimization framework for rate-splitting multiple access (RSMA) in beyond-diagonal (BD) reconfigurable intelligent surface (RIS)-assisted ultra-reliable low-latency communications (URLLC) systems. This framework can solve a large family of optimization problems in which the objective and/or constraints are linear functions of the rates and/or energy efficiency (EE) of users. Using this framework, we show that RSMA and RIS can be mutually beneficial tools when the system is overloaded, i.e., when the number of users per cell is higher than the number of base station (BS) antennas. Additionally, we show that the benefits of RSMA increase when the packets are shorter and/or the reliability constraint is more stringent. Furthermore, we show that the RSMA benefits increase with the number of users per cell and decrease with the number of BS antennas. Finally, we show that RIS (either diagonal or BD) can substantially improve the system performance, and that BD-RIS outperforms regular RIS.
This paper proposes an energy-efficient scheme for multicell multiple-input multiple-output (MIMO) broadcast channels assisted by simultaneously transmitting and reflecting (STAR) reconfigurable intelligent surfaces (RIS), employing rate splitting (RS) and improper Gaussian signaling (IGS). Regular RISs can only reflect signals. Thus, a regular RIS can assist only when the transmitter and receiver are both in the reflection space of the RIS. However, a STAR-RIS can simultaneously transmit and reflect, thus providing 360-degree coverage. In this paper, we assume that transceivers may suffer from I/Q imbalance (IQI). To compensate for IQI, we employ IGS. Moreover, we employ RS to manage intracell interference. We show that RIS can significantly improve the energy efficiency (EE) of the system when RIS components are carefully optimized. Additionally, we show that STAR-RIS can significantly outperform a regular RIS when the regular RIS cannot cover all the users. We also show that RS can substantially increase the EE compared to treating interference as noise.
Facial action unit detection has emerged as an important task within facial expression analysis, aimed at detecting specific pre-defined, objective facial expressions, such as lip tightening and cheek raising. This paper presents our submission to the Affective Behavior Analysis in-the-wild (ABAW) 2023 Competition for AU detection. We propose a multi-modal method for facial action unit detection with visual, acoustic, and lexical features extracted from large pre-trained models. To provide high-quality details for visual feature extraction, we apply super-resolution and face alignment to the training data and observe a potential performance gain. Our approach achieves an F1 score of 52.3% on the official validation set of the 5th ABAW Challenge.
We address the problem of interference leakage (IL) minimization in the $K$-user multiple-input multiple-output (MIMO) interference channel (IC) assisted by a reconfigurable intelligent surface (RIS). We describe an iterative algorithm based on block coordinate descent to minimize the IL cost function. A reformulation of the problem provides a geometric interpretation and shows interesting connections with envelope precoding and phase-only zero-forcing beamforming problems. As a result of this analysis, we derive a set of necessary (but not sufficient) conditions for a phase-optimized RIS to be able to perfectly cancel the interference on the $K$-user MIMO IC.
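The RIS part of such a block coordinate descent can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm from the paper: the equivalent channel is assumed to be H_kl + F_k diag(theta) G_l, the names `leakage` and `phase_sweep` are hypothetical, the precoders V_l and decoders U_k are held fixed (their update blocks are omitted), and each unit-modulus RIS element is updated in closed form while the others stay fixed.

```python
import numpy as np

def leakage(theta, U, V, H, F, G):
    """Sum over k != l of ||U_k^H (H_kl + F_k diag(theta) G_l) V_l||_F^2."""
    K = len(U)
    total = 0.0
    for k in range(K):
        for l in range(K):
            if k != l:
                H_eq = H[k][l] + F[k] @ np.diag(theta) @ G[l]
                total += np.linalg.norm(U[k].conj().T @ H_eq @ V[l]) ** 2
    return total

def phase_sweep(theta, U, V, H, F, G):
    """One coordinate-descent pass over the unit-modulus RIS elements."""
    theta = theta.copy()
    K, M = len(U), len(theta)
    for m in range(M):
        s = 0.0 + 0.0j
        for k in range(K):
            for l in range(K):
                if k == l:
                    continue
                UF = U[k].conj().T @ F[k]          # d x M
                GV = G[l] @ V[l]                   # M x d
                E = U[k].conj().T @ H[k][l] @ V[l] + UF @ np.diag(theta) @ GV
                C = np.outer(UF[:, m], GV[m, :])   # contribution of element m
                R = E - theta[m] * C               # leakage without element m
                s += np.sum(R.conj() * C)
        # The cost depends on theta[m] only through 2*Re(theta[m]*s),
        # which is minimized over the unit circle by theta[m] = -conj(s)/|s|.
        if abs(s) > 1e-12:
            theta[m] = -np.conj(s) / abs(s)
    return theta

def cgauss(rng, *shape):
    """Complex Gaussian matrix with unit-variance entries."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

rng = np.random.default_rng(1)
K, Nt, Nr, d, M = 3, 2, 2, 1, 8                    # hypothetical dimensions
H = [[cgauss(rng, Nr, Nt) for _ in range(K)] for _ in range(K)]
F = [cgauss(rng, Nr, M) for _ in range(K)]         # RIS -> receiver k
G = [cgauss(rng, M, Nt) for _ in range(K)]         # transmitter l -> RIS
U = [np.linalg.qr(cgauss(rng, Nr, d))[0] for _ in range(K)]
V = [np.linalg.qr(cgauss(rng, Nt, d))[0] for _ in range(K)]

theta0 = np.exp(1j * rng.uniform(0, 2 * np.pi, M))
cost_before = leakage(theta0, U, V, H, F, G)
theta_new = phase_sweep(theta0, U, V, H, F, G)
cost_after = leakage(theta_new, U, V, H, F, G)
print(cost_before, cost_after)
```

Because each element update is the exact minimizer over the unit circle with everything else fixed, a sweep cannot increase the leakage. Whether the leakage can be driven to zero depends on the channel and RIS dimensions, which is the question the necessary conditions in the abstract address.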
In this paper, we study the achievable rate region of 1-layer rate splitting (RS) in the presence of hardware impairment (HWI) and improper Gaussian signaling (IGS) for a single-cell reconfigurable intelligent surface (RIS)-assisted broadcast channel (BC). We assume that the transceivers may suffer from an imbalance between the in-phase and quadrature signals, which is known as I/Q imbalance (IQI). The received signal and noise can be improper in the presence of IQI. Therefore, we employ IGS to compensate for IQI as well as to manage interference. Our results show that RS and RIS can significantly enlarge the rate region, where the role of RS is to manage interference while RIS mainly improves the coverage.
This paper proposes a general optimization framework to improve the spectral and energy efficiency (EE) of ultra-reliable low-latency communication (URLLC) reconfigurable intelligent surface (RIS)-assisted interference-limited systems with finite block length (FBL). This framework can be applied to any interference-limited system in which the receivers treat interference as noise as their decoding strategy. Additionally, the framework can solve a large variety of optimization problems in which the objective and/or constraints are linear functions of the rates and/or EE of users. We consider a multi-cell broadcast channel as an example and show how this framework can be specialized to optimize the minimum weighted rate, weighted sum rate, global EE and weighted EE of the system. In addition to regular RIS, we consider simultaneously transmitting and reflecting (STAR)-RIS, in which each passive RIS component can simultaneously reflect and transmit signals. We make realistic assumptions regarding the (STAR-)RIS by considering three different feasibility sets for the components of either regular RIS or STAR-RIS. We show that RIS can substantially increase the spectral efficiency and EE of RIS-assisted URLLC systems if the reflecting coefficients are properly optimized. Moreover, we show that STAR-RIS can outperform a regular RIS when the regular RIS cannot cover all the users.
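To illustrate how finite block length and a reliability target reduce the achievable rate, the sketch below evaluates the widely used normal approximation for a scalar Gaussian channel with interference treated as noise. It is an assumption that this particular expression matches the rate model used in the paper; the function name `fbl_rate` and its parameters are hypothetical.

```python
import math
from statistics import NormalDist

def fbl_rate(sinr, blocklength, error_prob):
    """Normal-approximation rate in bits per channel use.

    rate ~ log2(1 + sinr) - Q^{-1}(error_prob) * sqrt(V / blocklength),
    with channel dispersion V = (1 - (1 + sinr)^-2) * (log2 e)^2.
    """
    shannon = math.log2(1.0 + sinr)
    dispersion = 1.0 - 1.0 / (1.0 + sinr) ** 2
    # Q^{-1}(eps) is the (1 - eps) quantile of the standard normal distribution.
    q_inv = NormalDist().inv_cdf(1.0 - error_prob)
    penalty = q_inv * math.sqrt(dispersion / blocklength) * math.log2(math.e)
    return max(shannon - penalty, 0.0)

if __name__ == "__main__":
    sinr = 10.0  # linear scale, hypothetical operating point
    for n in (100, 200, 1000):
        for eps in (1e-3, 1e-5):
            print(f"n={n:5d} eps={eps:.0e} rate={fbl_rate(sinr, n, eps):.3f}")
```

The penalty term grows as the block length shrinks or the target error probability tightens, so the rate falls further below the Shannon rate in exactly the short-packet, high-reliability regime that URLLC targets.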