Abstract: To improve the generalization of Multi-Task Networks (MTNs) for Face Attribute Recognition (FAR), relevant information must be shared effectively across multiple related prediction tasks. Traditional MTN methods pair a shared low-level module with a distinct high-level module for each task, so model parameters grow rapidly as tasks are added. This design also limits feature interaction at the high level, hindering the exploration of semantic relations among attributes and thus harming generalization. In response, this study introduces FAR-AMTN, a novel Attention Multi-Task Network for FAR. It incorporates a Weight-Shared Group-Specific Attention (WSGSA) module whose parameters are shared across attribute groups, reducing model complexity while improving group feature representation. A Cross-Group Feature Fusion (CGFF) module further promotes interaction between attribute groups to strengthen feature learning, and a Dynamic Weighting Strategy (DWS) synchronizes task convergence. Experiments on the CelebA and LFWA datasets show that FAR-AMTN achieves superior accuracy with significantly fewer parameters than existing models.
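The abstract names the WSGSA module but gives no implementation detail. Below is a minimal PyTorch sketch of the weight-sharing idea it describes: a single channel-attention block reused across all attribute groups, so the attention parameter count stays constant as groups are added. The class name `SharedGroupAttention`, the squeeze-and-excitation-style internals, and all tensor shapes are our own illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class SharedGroupAttention(nn.Module):
    """Sketch of a WSGSA-style block: one channel-attention module whose
    weights are shared across every attribute group."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # x: (batch, channels, H, W) features from one group's branch
        w = self.fc(self.pool(x).flatten(1))       # (batch, channels)
        return x * w.unsqueeze(-1).unsqueeze(-1)   # re-weight channels

# The same instance is applied to every group's branch features, so
# adding attribute groups adds no attention parameters.
attn = SharedGroupAttention(256)
group_branches = [torch.randn(8, 256, 14, 14) for _ in range(4)]
group_feats = [attn(f) for f in group_branches]  # 4 attribute groups
```

In a real multi-branch network each group branch would feed the shared block its own features; the shared weights are what keep complexity flat while group-specific inputs keep the outputs distinct.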
Abstract: Face Attribute Recognition (FAR) plays a crucial role in applications such as person re-identification, face retrieval, and face editing. Conventional multi-task attribute recognition methods process the entire feature map for feature extraction and attribute classification, and this reliance on global regions can produce redundant features. To address this limitation, we propose the Mask-Guided Multi-Task Network (MGMTN), which selects specific feature regions for efficient feature learning by integrating Adaptive Mask Learning (AML) and Group-Global Feature Fusion (G2FF). Leveraging a pre-trained keypoint annotation model and a fully convolutional network, AML localizes critical facial parts (e.g., eye and mouth groups) and generates group masks that delineate meaningful feature regions, mitigating the negative transfer caused by global region usage. G2FF then combines group and global features to enhance FAR learning, enabling more precise attribute identification. Extensive experiments on two challenging facial attribute recognition datasets demonstrate the effectiveness of MGMTN in improving FAR performance.
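As a rough illustration of the mask-guided pooling and group-global fusion described above, the sketch below applies a soft spatial mask to a feature map and concatenates the resulting group feature with a globally pooled one before classification. The mask source (a landmark-guided FCN in the paper's AML) is replaced here by a placeholder tensor, and the names `masked_group_pool` and `GroupGlobalFusion` are hypothetical, assumed for this sketch only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def masked_group_pool(feat, mask):
    """Pool features inside a soft group mask (e.g. an eye or mouth region).
    feat: (B, C, H, W) backbone features; mask: (B, 1, H, W) in [0, 1]."""
    masked = feat * mask
    # Normalize by mask area so the pooled vector is region-size invariant
    return masked.sum(dim=(2, 3)) / mask.sum(dim=(2, 3)).clamp(min=1e-6)

class GroupGlobalFusion(nn.Module):
    """Sketch of the G2FF idea: concatenate a group feature with the
    global feature, then classify that group's attributes."""
    def __init__(self, channels, num_attrs):
        super().__init__()
        self.head = nn.Linear(2 * channels, num_attrs)

    def forward(self, feat, mask):
        group_vec = masked_group_pool(feat, mask)                 # (B, C)
        global_vec = F.adaptive_avg_pool2d(feat, 1).flatten(1)    # (B, C)
        return self.head(torch.cat([group_vec, global_vec], 1))  # logits

# Placeholder mask standing in for the AML output (landmark-guided FCN).
feat = torch.randn(8, 256, 14, 14)
mask = torch.rand(8, 1, 14, 14)
logits = GroupGlobalFusion(256, num_attrs=5)(feat, mask)
```

Normalizing the masked sum by the mask area keeps the group vector comparable across regions of different sizes, which is one plausible way to make group and global features fuse on an even footing.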