Anomaly detection-based spoof attack detection is a recent development in face Presentation Attack Detection (fPAD), where a spoof detector is learned using only non-attacked images of users. These detectors are of practical importance as they are shown to generalize well to new attack types. In this paper, we present a deep-learning solution for anomaly detection-based spoof attack detection where both classifier and feature representations are learned together end-to-end. First, we introduce a pseudo-negative class during training in the absence of attacked images. The pseudo-negative class is modeled using a Gaussian distribution whose mean is calculated by a weighted running mean. Secondly, we use pairwise confusion loss to further regularize the training process. The proposed approach benefits from the representation learning power of the CNNs and learns better features for fPAD task as shown in our ablation study. We perform extensive experiments on four publicly available datasets: Replay-Attack, Rose-Youtu, OULU-NPU and Spoof in Wild to show the effectiveness of the proposed approach over the previous methods. Code is available at: \url{https://github.com/yashasvi97/IJCB2020_anomaly}
Recent crowd counting approaches have achieved excellent performance. However, they are essentially based on fully supervised paradigm and require large number of annotated samples. Obtaining annotations is an expensive and labour-intensive process. In this work, we focus on reducing the annotation efforts by learning to count in the crowd from limited number of labeled samples while leveraging a large pool of unlabeled data. Specifically, we propose a Gaussian Process-based iterative learning mechanism that involves estimation of pseudo-ground truth for the unlabeled data, which is then used as supervision for training the network. The proposed method is shown to be effective under the reduced data (semi-supervised) settings for several datasets like ShanghaiTech, UCF-QNRF, WorldExpo, UCSD, etc. Furthermore, we demonstrate that the proposed method can be leveraged to enable the network in learning to count from synthetic dataset while being able to generalize better to real-world datasets (synthetic-to-real transfer).
In this paper, we investigate how to detect intruders with low latency for Active Authentication (AA) systems with multiple-users. We extend the Quickest Change Detection (QCD) framework to the multiple-user case and formulate the Multiple-user Quickest Intruder Detection (MQID) algorithm. Furthermore, we extend the algorithm to the data-efficient scenario where intruder detection is carried out with fewer observation samples. We evaluate the effectiveness of the proposed method on two publicly available AA datasets on the face modality.
Due to its excellent performance, U-Net is the most widely used backbone architecture for biomedical image segmentation in the recent years. However, in our studies, we observe that there is a considerable performance drop in the case of detecting smaller anatomical landmarks with blurred noisy boundaries. We analyze this issue in detail, and address it by proposing an over-complete architecture (Ki-Net) which involves projecting the data onto higher dimensions (in the spatial sense). This network, when augmented with U-Net, results in significant improvements in the case of segmenting small anatomical landmarks and blurred noisy boundaries while obtaining better overall performance. Furthermore, the proposed network has additional benefits like faster convergence and fewer number of parameters. We evaluate the proposed method on the task of brain anatomy segmentation from 2D Ultrasound (US) of preterm neonates, and achieve an improvement of around 4% in terms of the DICE accuracy and Jaccard index as compared to the standard-U-Net, while outperforming the recent best methods by 2%. Code: https://github.com/jeya-maria-jose/KiU-Net-pytorch .
Face presentation attack detection plays a critical role in the modern face recognition pipeline. A face anti-spoofing (FAS) model with good generalization can be obtained when it is trained with face images from different input distributions and different types of spoof attacks. In reality, training data (both real face images and spoof images) are not directly shared between data owners due to legal and privacy issues. In this paper, with the motivation of circumventing this challenge, we propose Federated Face Anti-spoofing (FedFAS) framework. FedFAS simultaneously takes advantage of rich FAS information available at different data owners while preserving data privacy. In the proposed framework, each data owner (referred to as \textit{data centers}) locally trains its own FAS model. A server learns a global FAS model by iteratively aggregating model updates from all data centers without accessing private data in each of them. Once the learned global model converges, it is used for FAS inference. We introduce the experimental setting to evaluate the proposed FedFAS framework and carry out extensive experiments to provide various insights about federated learning for FAS.
Thermal-to-visible face verification is a challenging problem due to the large domain discrepancy between the modalities. Existing approaches either attempt to synthesize visible faces from thermal faces or extract robust features from these modalities for cross-modal matching. In this paper, we use attributes extracted from visible images to synthesize the attributepreserved visible images from thermal imagery for cross-modal matching. A pre-trained VGG-Face network is used to extract the attributes from the visible image. Then, a novel multi-scale generator is proposed to synthesize the visible image from the thermal image guided by the extracted attributes. Finally, a pretrained VGG-Face network is leveraged to extract features from the synthesized image and the input visible image for verification. An extended dataset consisting of polarimetric thermal faces of 121 subjects is also introduced. Extensive experiments evaluated on various datasets and protocols demonstrate that the proposed method achieves state-of-the-art per-formance.
Due to its variety of applications in the real-world, the task of single image-based crowd counting has received a lot of interest in the recent years. Recently, several approaches have been proposed to address various problems encountered in crowd counting. These approaches are essentially based on convolutional neural networks that require large amounts of data to train the network parameters. Considering this, we introduce a new large scale unconstrained crowd counting dataset (JHU-CROWD++) that contains "4,372" images with "1.51 million" annotations. In comparison to existing datasets, the proposed dataset is collected under a variety of diverse scenarios and environmental conditions. Specifically, the dataset includes several images with weather-based degradations and illumination variations, making it a very challenging dataset. Additionally, the dataset consists of a rich set of annotations at both image-level and head-level. Several recent methods are evaluated and compared on this dataset. The dataset can be downloaded from http://www.crowd-counting.com . Furthermore, we propose a novel crowd counting network that progressively generates crowd density maps via residual error estimation. The proposed method uses VGG16 as the backbone network and employs density map generated by the final layer as a coarse prediction to refine and generate finer density maps in a progressive fashion using residual learning. Additionally, the residual learning is guided by an uncertainty-based confidence weighting mechanism that permits the flow of only high-confidence residuals in the refinement path. The proposed Confidence Guided Deep Residual Counting Network (CG-DRCN) is evaluated on recent complex datasets, and it achieves significant improvements in errors.
Automatic segmentation of anatomical landmarks from ultrasound (US) plays an important role in the management of preterm neonates with a very low birth weight due to the increased risk of developing intraventricular hemorrhage (IVH) or other complications. One major problem in developing an automatic segmentation method for this task is the limited availability of annotated data. To tackle this issue, we propose a novel image synthesis method using multi-scale self attention generator to synthesize US images from various segmentation masks. We show that our method can synthesize high-quality US images for every manipulated segmentation label with qualitative and quantitative improvements over the recent state-of-the-art synthesis methods. Furthermore, for the segmentation task, we propose a novel method, called Confidence-guided Brain Anatomy Segmentation (CBAS) network, where segmentation and corresponding confidence maps are estimated at different scales. In addition, we introduce a technique which guides CBAS to learn the weights based on the confidence measure about the estimate. Extensive experiments demonstrate that the proposed method for both synthesis and segmentation tasks achieve significant improvements over the recent state-of-the-art methods. In particular, we show that the new synthesis framework can be used to generate realistic US images which can be used to improve the performance of a segmentation algorithm.
Automatic synthesis of faces from visual attributes is an important problem in computer vision and has wide applications in law enforcement and entertainment. With the advent of deep generative convolutional neural networks (CNNs), attempts have been made to synthesize face images from attributes and text descriptions. In this paper, we take a different approach, where we formulate the original problem as a stage-wise learning problem. We first synthesize the facial sketch corresponding to the visual attributes and then we generate the face image based on the synthesized sketch. The proposed framework, is based on a combination of two different Generative Adversarial Networks (GANs) - (1) a sketch generator network which synthesizes realistic sketch from the input attributes, and (2) a face generator network which synthesizes facial images from the synthesized sketch images with the help of facial attributes. Extensive experiments and comparison with recent methods are performed to verify the effectiveness of the proposed attribute-based two-stage face synthesis method.
Fast and accurate MRI image reconstruction from undersampled data is critically important in clinical practice. Compressed sensing based methods are widely used in image reconstruction but the speed is slow due to the iterative algorithms. Deep learning based methods have shown promising advances in recent years. However, recovering the fine details from highly undersampled data is still challenging. In this paper, we introduce a novel deep learning-based method, Pyramid Convolutional RNN (PC-RNN), to reconstruct the image from multiple scales. We evaluated our model on the fastMRI dataset and the results show that the proposed model achieves significant improvements than other methods and can recover more fine details.