Stefanos Zafeiriou

A Survey of Deep Face Restoration: Denoise, Super-Resolution, Deblur, Artifact Removal

Nov 05, 2022

Tao Wang, Kaihao Zhang, Xuanxi Chen, Wenhan Luo, Jiankang Deng, Tong Lu, Xiaochun Cao, Wei Liu, Hongdong Li, Stefanos Zafeiriou

Abstract: Face Restoration (FR) aims to restore High-Quality (HQ) faces from Low-Quality (LQ) input images, which is a domain-specific image restoration problem in the low-level computer vision area. Early face restoration methods mainly use statistical priors and degradation models, which struggle to meet the requirements of real-world applications. In recent years, face restoration has witnessed great progress after stepping into the deep learning era. However, few works study deep learning-based face restoration methods systematically. Thus, this paper comprehensively surveys recent advances in deep learning techniques for face restoration. Specifically, we first summarize different problem formulations and analyze the characteristics of face images. Second, we discuss the challenges of face restoration. Concerning these challenges, we present a comprehensive review of existing FR methods, including prior-based methods and deep learning-based methods. Then, we explore developed techniques in the task of FR covering network architectures, loss functions, and benchmark datasets. We also conduct a systematic benchmark evaluation on representative methods. Finally, we discuss future directions, including network designs, metrics, benchmark datasets, applications, etc. We also provide an open-source repository for all the discussed methods, which is available at https://github.com/TaoWangzj/Awesome-Face-Restoration.

* 21 pages, 19 figures

3DMM-RF: Convolutional Radiance Fields for 3D Face Modeling

Sep 15, 2022

Stathis Galanakis, Baris Gecer, Alexandros Lattas, Stefanos Zafeiriou

Abstract: Facial 3D Morphable Models are a major computer vision subject with countless applications and have been highly optimized in the last two decades. The tremendous improvements of deep generative networks have created various possibilities for improving such models and have attracted wide interest. Moreover, recent advances in neural radiance fields are revolutionising novel-view synthesis of known scenes. In this work, we present a facial 3D Morphable Model, which exploits both of the above, and can accurately model a subject's identity, pose and expression and render it in arbitrary illumination. This is achieved by utilizing a powerful deep style-based generator to overcome two main weaknesses of neural radiance fields: their rigidity and rendering speed. We introduce a style-based generative network that synthesizes in one pass all and only the required rendering samples of a neural radiance field. We create a vast labelled synthetic dataset of facial renders, and train the network on these data, so that it can accurately model and generalize on facial identity, pose and appearance. Finally, we show that this model can accurately be fit to "in-the-wild" facial images of arbitrary pose and illumination, extract the facial characteristics, and be used to re-render the face in controllable conditions.

Inverse Image Frequency for Long-tailed Image Recognition

Sep 11, 2022

Konstantinos Panagiotis Alexandridis, Shan Luo, Anh Nguyen, Jiankang Deng, Stefanos Zafeiriou

Abstract: The long-tailed distribution is a common phenomenon in the real world. Extracted large-scale image datasets inevitably demonstrate the long-tailed property, and models trained with imbalanced data can obtain high performance for the over-represented categories but struggle for the under-represented categories, leading to biased predictions and performance degradation. To address this challenge, we propose a novel de-biasing method named Inverse Image Frequency (IIF). IIF is a multiplicative margin adjustment transformation of the logits in the classification layer of a convolutional neural network. Our method achieves stronger performance than similar works and it is especially useful for downstream tasks such as long-tailed instance segmentation as it produces fewer false positive detections. Our extensive experiments show that IIF surpasses the state of the art on many long-tailed benchmarks such as ImageNet-LT, CIFAR-LT, Places-LT and LVIS, reaching 55.8% top-1 accuracy with ResNet50 on ImageNet-LT and 26.2% segmentation AP with MaskRCNN on LVIS. Code is available at https://github.com/kostas1515/iif
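
To make the idea of a multiplicative logit adjustment concrete, here is a minimal sketch. The abstract does not give the exact weighting, so the log-inverse-frequency form below (by analogy with inverse document frequency) and the names `inverse_frequency_weights` and `adjust_logits` are assumptions for illustration, not the authors' implementation.

```python
import math


def inverse_frequency_weights(class_counts):
    """Assumed weighting: log(total / count), so rare classes get larger weights."""
    total = sum(class_counts)
    return [math.log(total / n) for n in class_counts]


def adjust_logits(logits, weights):
    """Multiply each class logit by its per-class weight (multiplicative adjustment)."""
    return [z * w for z, w in zip(logits, weights)]


# Hypothetical long-tailed class counts: one head class, two tail classes.
counts = [900, 90, 10]
weights = inverse_frequency_weights(counts)

# Raw logits favour the head class (index 0) before adjustment.
raw_logits = [2.0, 1.0, 0.5]
adjusted = adjust_logits(raw_logits, weights)

predicted_before = max(range(len(raw_logits)), key=raw_logits.__getitem__)
predicted_after = max(range(len(adjusted)), key=adjusted.__getitem__)
```

In this toy case the adjustment shifts the prediction away from the head class, which is the kind of de-biasing effect the abstract describes; the actual margin formulation should be taken from the linked repository.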

Redesigning Multi-Scale Neural Network for Crowd Counting

Aug 04, 2022

Zhipeng Du, Miaojing Shi, Jiankang Deng, Stefanos Zafeiriou

Abstract: Perspective distortions and crowd variations make crowd counting a challenging task in computer vision. To tackle it, many previous works have used multi-scale architectures in deep neural networks (DNNs). Multi-scale branches can be either directly merged (e.g. by concatenation) or merged through the guidance of proxies (e.g. attentions) in the DNNs. Despite their prevalence, these combination methods are not sophisticated enough to deal with the per-pixel performance discrepancy over multi-scale density maps. In this work, we redesign the multi-scale neural network by introducing a hierarchical mixture of density experts, which hierarchically merges multi-scale density maps for crowd counting. Within the hierarchical structure, an expert competition and collaboration scheme is presented to encourage contributions from all scales; pixel-wise soft gating nets are introduced to provide pixel-wise soft weights for scale combinations in different hierarchies. The network is optimized using both the crowd density map and the local counting map, where the latter is obtained by local integration on the former. Optimizing both can be problematic because of their potential conflicts. We introduce a new relative local counting loss based on relative count differences among hard-predicted local regions in an image, which proves to be complementary to the conventional absolute error loss on the density map. Experiments show that our method achieves state-of-the-art performance on five public datasets, i.e. ShanghaiTech, UCF_CC_50, JHU-CROWD++, NWPU-Crowd and Trancos.
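
The relation between the density map and the local counting map can be sketched as follows. This assumes non-overlapping square patches and a hypothetical helper `local_counting_map`; the patch size and windowing used in the paper are not stated in the abstract.

```python
def local_counting_map(density, patch):
    """Sum the density map over non-overlapping patch x patch windows.

    `density` is a list of rows; each output cell holds the (fractional)
    number of people inside the corresponding window.
    """
    rows, cols = len(density), len(density[0])
    counts = []
    for r in range(0, rows, patch):
        out_row = []
        for c in range(0, cols, patch):
            window_sum = sum(
                density[i][j]
                for i in range(r, min(r + patch, rows))
                for j in range(c, min(c + patch, cols))
            )
            out_row.append(window_sum)
        counts.append(out_row)
    return counts


# Toy 4x4 density map (values chosen to be exact in floating point).
density_map = [
    [0.25, 0.25, 0.0, 0.0],
    [0.25, 0.25, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.5],
    [0.0, 1.0, 0.0, 0.0],
]
local_counts = local_counting_map(density_map, patch=2)

# Integration preserves the total count of the image.
total_from_density = sum(sum(row) for row in density_map)
total_from_local = sum(sum(row) for row in local_counts)
```

Because the local counts are sums of density values, both maps agree on the total count, yet a loss on one can pull predictions differently from a loss on the other, which is the potential conflict the abstract mentions.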

Free-HeadGAN: Neural Talking Head Synthesis with Explicit Gaze Control

Aug 03, 2022

Michail Christos Doukas, Evangelos Ververas, Viktoriia Sharmanska, Stefanos Zafeiriou

Abstract: We present Free-HeadGAN, a person-generic neural talking head synthesis system. We show that modeling faces with sparse 3D facial landmarks is sufficient for achieving state-of-the-art generative performance, without relying on strong statistical priors of the face, such as 3D Morphable Models. Apart from 3D pose and facial expressions, our method is capable of fully transferring the eye gaze from a driving actor to a source identity. Our complete pipeline consists of three components: a canonical 3D key-point estimator that regresses 3D pose and expression-related deformations, a gaze estimation network and a generator that is built upon the architecture of HeadGAN. We further experiment with an extension of our generator to accommodate few-shot learning using an attention mechanism, in case more than one source image is available. Compared to the latest models for reenactment and motion transfer, our system achieves higher photo-realism combined with superior identity preservation, while offering explicit gaze control.

GraphWalks: Efficient Shape Agnostic Geodesic Shortest Path Estimation

May 30, 2022

Rolandos Alexandros Potamias, Alexandros Neofytou, Kyriaki-Margarita Bintsi, Stefanos Zafeiriou

Abstract: Geodesic paths and distances are among the most popular intrinsic properties of 3D surfaces. Traditionally, geodesic paths on discrete polygon surfaces were computed using shortest path algorithms, such as Dijkstra. However, such algorithms have two major limitations. They are non-differentiable, which limits their direct usage in learnable pipelines, and they are considerably time demanding. To address such limitations and alleviate the computational burden, we propose a learnable network to approximate geodesic paths. The proposed method comprises three major components: a graph neural network that encodes node positions in a high dimensional space, a path embedding that describes previously visited nodes and a point classifier that selects the next point in the path. The proposed method provides efficient approximations of the shortest paths and geodesic distance estimates. Given that all of the components of our method are fully differentiable, it can be directly plugged into any learnable pipeline as well as customized under any differentiable constraint. We extensively evaluate the proposed method with several qualitative and quantitative experiments.
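
For reference, the classical baseline named in the abstract can be sketched as below. This is a plain Dijkstra search over a weighted graph, not the proposed learnable method; the adjacency format and the name `dijkstra_path` are illustrative assumptions, with edge weights standing in for mesh edge lengths.

```python
import heapq


def dijkstra_path(adjacency, source, target):
    """Return (path, distance) for the shortest path from source to target.

    `adjacency` maps a node to a list of (neighbour, edge_length) pairs.
    Returns (None, inf) when the target is unreachable.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue  # stale queue entry
        visited.add(u)
        if u == target:
            break
        for v, w in adjacency.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None, float("inf")
    # Walk predecessors back from the target to recover the path.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path, dist[target]


# Toy undirected graph; weights play the role of mesh edge lengths.
graph = {
    0: [(1, 1.0), (2, 4.0)],
    1: [(0, 1.0), (2, 1.5), (3, 5.0)],
    2: [(0, 4.0), (1, 1.5), (3, 1.0)],
    3: [(1, 5.0), (2, 1.0)],
}
path, distance = dijkstra_path(graph, 0, 3)
```

The discrete priority-queue selection and the hard "choose the minimum" steps are what make this procedure awkward to embed in a gradient-based pipeline, which is the limitation the abstract sets out to address.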

* CVPRw 2022

Decoupled Multi-task Learning with Cyclical Self-Regulation for Face Parsing

Mar 28, 2022

Qingping Zheng, Jiankang Deng, Zheng Zhu, Ying Li, Stefanos Zafeiriou

Abstract: This paper probes intrinsic factors behind typical failure cases (e.g. spatial inconsistency and boundary confusion) produced by the existing state-of-the-art method in face parsing. To tackle these problems, we propose a novel Decoupled Multi-task Learning with Cyclical Self-Regulation (DML-CSR) for face parsing. Specifically, DML-CSR designs a multi-task model which comprises face parsing, binary edge, and category edge detection. These tasks only share low-level encoder weights without high-level interactions between each other, making it possible to decouple auxiliary modules from the whole network at the inference stage. To address spatial inconsistency, we develop a dynamic dual graph convolutional network to capture global contextual information without using any extra pooling operation. To handle boundary confusion in both single and multiple face scenarios, we exploit binary and category edge detection to jointly obtain generic geometric structure and fine-grained semantic clues of human faces. Besides, to prevent noisy labels from degrading model generalization during training, cyclical self-regulation is proposed to self-ensemble several model instances to get a new model, and the resulting model is then used to self-distill subsequent models, through alternating iterations. Experiments show that our method achieves new state-of-the-art performance on the Helen, CelebAMask-HQ, and Lapa datasets. The source code is available at https://github.com/deepinsight/insightface/tree/master/parsing/dml_csr.
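
One common way to self-ensemble several model instances into a single model is to average their parameters. The sketch below assumes uniform averaging over checkpoints stored as plain dictionaries; the abstract does not specify the aggregation rule, so both the rule and the name `average_weights` are hypothetical.

```python
def average_weights(checkpoints):
    """Uniformly average parameters across checkpoints.

    Each checkpoint maps a parameter name to a flat list of floats;
    all checkpoints are assumed to share the same names and shapes.
    """
    n = len(checkpoints)
    averaged = {}
    for name in checkpoints[0]:
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        averaged[name] = [sum(values) / n for values in columns]
    return averaged


# Two hypothetical model instances saved at different training cycles.
instance_a = {"w": [1.0, 2.0], "b": [0.0]}
instance_b = {"w": [3.0, 4.0], "b": [1.0]}
ensembled = average_weights([instance_a, instance_b])
```

In a cyclical scheme like the one described, the ensembled model would then serve as the teacher whose predictions guide the next round of training; that distillation step is omitted here.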

Facial Geometric Detail Recovery via Implicit Representation

Mar 18, 2022

Xingyu Ren, Alexandros Lattas, Baris Gecer, Jiankang Deng, Chao Ma, Xiaokang Yang, Stefanos Zafeiriou

Abstract: Learning a dense 3D model with fine-scale details from a single facial image is highly challenging and ill-posed. To address this problem, many approaches fit smooth geometries through facial priors while learning details as additional displacement maps or personalized bases. However, these techniques typically require vast datasets of paired multi-view data or 3D scans, whereas such datasets are scarce and expensive. To alleviate heavy data dependency, we present a robust texture-guided geometric detail recovery approach using only a single in-the-wild facial image. More specifically, our method combines high-quality texture completion with the powerful expressiveness of implicit surfaces. Initially, we inpaint occluded facial parts, generate complete textures, and build an accurate multi-view dataset of the same subject. In order to estimate the detailed geometry, we define an implicit signed distance function and employ a physically-based implicit renderer to reconstruct fine geometric details from the generated multi-view images. Our method not only recovers accurate facial details but also decomposes normals, albedos, and shading parts in a self-supervised way. Finally, we register the implicit shape details to a 3D Morphable Model template, which can be used in traditional modeling and rendering pipelines. Extensive experiments demonstrate that the proposed approach can reconstruct impressive facial details from a single image, especially when compared with state-of-the-art methods trained on large datasets.

Embedding Earth: Self-supervised contrastive pre-training for dense land cover classification

Mar 11, 2022

Michail Tarasiou, Stefanos Zafeiriou

Abstract: In training machine learning models for land cover semantic segmentation, there is a stark contrast between the availability of satellite imagery to be used as inputs and ground truth data to enable supervised learning. While thousands of new satellite images become freely available on a daily basis, getting ground truth data is still very challenging, time consuming and costly. In this paper we present Embedding Earth, a self-supervised contrastive pre-training method for leveraging the large availability of satellite imagery to improve performance on downstream dense land cover classification tasks. Performing an extensive experimental evaluation spanning four countries and two continents, we use models pre-trained with our proposed method as initialization points for supervised land cover semantic segmentation and observe significant improvements of up to 25% absolute mIoU. In every case tested we outperform random initialization, especially when ground truth data are scarce. Through a series of ablation studies we explore the qualities of the proposed approach and find that learnt features can generalize between disparate regions, opening up the possibility of using the proposed pre-training scheme as a replacement for random initialization in Earth observation tasks. Code will be uploaded soon at https://github.com/michaeltrs/DeepSatModels.

* Self-supervised pre-training for semantic segmentation. Replacement to random initialization

2021 BEETL Competition: Advancing Transfer Learning for Subject Independence & Heterogenous EEG Data Sets

Feb 14, 2022

Xiaoxi Wei, A. Aldo Faisal, Moritz Grosse-Wentrup, Alexandre Gramfort, Sylvain Chevallier, Vinay Jayaram, Camille Jeunet, Stylianos Bakas, Siegfried Ludwig, Konstantinos Barmpas (+11 more)

Abstract: Transfer learning and meta-learning offer some of the most promising avenues to unlock the scalability of healthcare and consumer technologies driven by biosignal data. This is because current methods cannot generalise well across human subjects' data or handle learning from different heterogeneously collected data sets, thus limiting the scale of training data. On the other hand, developments in transfer learning would benefit significantly from a real-world benchmark with immediate practical application. Therefore, we pick electroencephalography (EEG) as an exemplar for what makes biosignal machine learning hard. We design two transfer learning challenges around diagnostics and Brain-Computer Interfacing (BCI) that have to be solved in the face of low signal-to-noise ratios, major variability among subjects, differences in the data recording sessions and techniques, and even differences between the specific BCI tasks recorded in the dataset. Task 1 is centred on the field of medical diagnostics, addressing automatic sleep stage annotation across subjects. Task 2 is centred on BCI, addressing motor imagery decoding across both subjects and data sets. The BEETL competition, with over 30 competing teams and 3 winning entries, brought attention to the potential of deep transfer learning and combinations of set theory and conventional machine learning techniques to overcome the challenges. The results set a new state of the art for the real-world BEETL benchmark.

* PrePrint of the NeurIPS2021 BEETL Competition Submitted to Proceedings of Machine Learning Research (PMLR)
