Profile images on social networks are users' opportunity to present themselves and to affect how others judge them. We examine what Facebook images say about users' perceived and measured intelligence. 1,122 Facebook users completed a matrices intelligence test and shared their current Facebook profile image. Strangers also rated the images for perceived intelligence. We use automatically extracted image features to predict both measured and perceived intelligence. Intelligence estimation from images is a difficult task even for humans, but experimental results show that human accuracy can be equalled using computing methods. We report the image features that predict both measured and perceived intelligence, and highlight misleading features such as "smiling" and "wearing glasses" that are correlated with perceived but not measured intelligence. Our results give insights into inaccurate stereotyping from profile images and also have implications for privacy, especially since in most social networks profile images are public by default.
Research in scene graph generation has quickly gained traction in the past few years because of its potential to help in downstream tasks like visual question answering, image captioning, etc. Many interesting approaches have been proposed to tackle this problem. Most of these works have a pre-trained object detection model as a preliminary feature extractor. Therefore, getting object bounding box proposals from the object detection model is relatively cheaper. We take advantage of this ready availability of bounding box annotations produced by the pre-trained detector. We propose a set of three novel yet simple self-supervision tasks and train them as auxiliary multi-tasks to the main model. While comparing, we train the base-model from scratch with these self-supervision tasks, we achieve state-of-the-art results in all the metrics and recall settings. We also resolve some of the confusion between two types of relationships: geometric and possessive, by training the model with the proposed self-supervision losses. We use the benchmark dataset, Visual Genome to conduct our experiments and show our results.
Facial expressions are important cues to observe human emotions. Facial expression recognition has attracted many researchers for years, but it is still a challenging topic since expression features vary greatly with the head poses, environments, and variations in the different persons involved. In this work, three major steps are involved to improve the performance of micro-facial expression recognition. First, an Adaptive Homomorphic Filtering is used for face detection and rotation rectification processes. Secondly, Micro-facial features were used to extract the appearance variations of a testing image-spatial analysis. The features of motion information are used for expression recognition in a sequence of facial images. An effective Micro-Facial Expression Based Deep-Rooted Learning (MFEDRL) classifier is proposed in this paper to better recognize spontaneous micro-expressions by learning parameters on the optimal features. This proposed method includes two loss functions such as cross entropy loss function and centre loss function. Then the performance of the algorithm will be evaluated using recognition rate and false measures. Simulation results show that the predictive performance of the proposed method outperforms that of the existing classifiers such as Convolutional Neural Network (CNN), Deep Neural Network (DNN), Artificial Neural Network (ANN), Support Vector Machine (SVM), and k-Nearest Neighbours (KNN) in terms of accuracy and Mean Absolute Error (MAE).
In this paper we propose a new augmentation technique, called patch augmentation, that, in our experiments, improves model accuracy and makes networks more robust to adversarial attacks. In brief, this data-independent approach creates new image data based on image/label pairs, where a patch from one of the two images in the pair is superimposed on to the other image, creating a new augmented sample. The new image's label is a linear combination of the image pair's corresponding labels. Initial experiments show a several percentage point increase in accuracy on CIFAR-10, from a baseline of 80.6% to 86.8%. CIFAR-100 sees better improvements still. Networks trained using patch augmentation are also more robust to adversarial attacks, which we demonstrate using the Fast Gradient Sign Method. An adversarial misclassification, or adversarial attack, occurs when an image that should seemingly be easily classified correctly by the network is suddenly classified as belonging to a completely different class-and with high confidence. Such occurrences are difficult to diagnose and are a cause of much concern in artificial intelligence research, as any model trained with empirical risk minimisation seems to be vulnerable to such attacks. The ease at which neural networks are susceptible to adversarial perturbations are partially the result of images lying close to the decision boundaries that are typically learned by neural networks during their training. Patch augmentation is an attempt to train more efficient decision boundaries.
In projector-camera systems, light transport models the propagation from projector emitted radiance to camera-captured irradiance. In this paper, we propose the first end-to-end trainable solution named Deep Light Transport (DeLTra) that estimates radiometrically uncalibrated projector-camera light transport. DeLTra is designed to have two modules: DepthToAtrribute and ShadingNet. DepthToAtrribute explicitly learns rays, depth and normal, and then estimates rough Phong illuminations. Afterwards, the CNN-based ShadingNet renders photorealistic camera-captured image using estimated shading attributes and rough Phong illuminations. A particular challenge addressed by DeLTra is occlusion, for which we exploit epipolar constraint and propose a novel differentiable direct light mask. Thus, it can be learned end-to-end along with the other DeLTra modules. Once trained, DeLTra can be applied simultaneously to three projector-camera tasks: image-based relighting, projector compensation and depth/normal reconstruction. In our experiments, DeLTra shows clear advantages over previous arts with promising quality and meanwhile being practically convenient.
Neural architecture search (NAS) has emerged as a promising avenue for automatically designing task-specific neural networks. Most existing NAS approaches require one complete search for each deployment specification of hardware or objective. This is a computationally impractical endeavor given the potentially large number of application scenarios. In this paper, we propose Neural Architecture Transfer (NAT) to overcome this limitation. NAT is designed to efficiently generate task-specific custom models that are competitive even under multiple conflicting objectives. To realize this goal we learn task-specific supernets from which specialized subnets can be sampled without any additional training. The key to our approach is an integrated online transfer learning and many-objective evolutionary search procedure. A pre-trained supernet is iteratively adapted while simultaneously searching for task-specific subnets. We demonstrate the efficacy of NAT on 11 benchmark image classification tasks ranging from large-scale multi-class to small-scale fine-grained datasets. In all cases, including ImageNet, NATNets improve upon the state-of-the-art under mobile settings ($\leq$ 600M Multiply-Adds). Surprisingly, small-scale fine-grained datasets benefit the most from NAT. At the same time, the architecture search and transfer is orders of magnitude more efficient than existing NAS methods. Overall, experimental evaluation indicates that across diverse image classification tasks and computational objectives, NAT is an appreciably more effective alternative to fine-tuning based transfer learning. Code is available at https://github.com/human-analysis/neural-architecture-transfer
Deep neural network architectures have attained remarkable improvements in scene understanding tasks. Utilizing an efficient model is one of the most important constraints for limited-resource devices. Recently, several compression methods have been proposed to diminish the heavy computational burden and memory consumption. Among them, the pruning and quantizing methods exhibit a critical drop in performances by compressing the model parameters. While the knowledge distillation methods improve the performance of compact models by focusing on training lightweight networks with the supervision of cumbersome networks. In the proposed method, the knowledge distillation has been performed within the network by constructing multiple branches over the primary stream of the model, known as the self-distillation method. Therefore, the ensemble of sub-neural network models has been proposed to transfer the knowledge among themselves with the knowledge distillation policies as well as an adversarial learning strategy. Hence, The proposed ensemble of sub-models is trained against a discriminator model adversarially. Besides, their knowledge is transferred within the ensemble by four different loss functions. The proposed method has been devoted to both lightweight image classification and encoder-decoder architectures to boost the performance of small and compact models without incurring extra computational overhead at the inference process. Extensive experimental results on the main challenging datasets show that the proposed network outperforms the primary model in terms of accuracy at the same number of parameters and computational cost. The obtained results show that the proposed model has achieved significant improvement over earlier ideas of self-distillation methods. The effectiveness of the proposed models has also been illustrated in the encoder-decoder model.
Unsupervised extraction of objects from low-level visual data is an important goal for further progress in machine learning. Existing approaches for representing objects without labels use structured generative models with static images. These methods focus a large amount of their capacity on reconstructing unimportant background pixels, missing low contrast or small objects. Conversely, we present a new method that avoids losses in pixel space and over-reliance on the limited signal a static image provides. Our approach takes advantage of objects' motion by learning a discriminative, time-contrastive loss in the space of slot representations, attempting to force each slot to not only capture entities that move, but capture distinct objects from the other slots. Moreover, we introduce a new quantitative evaluation metric to measure how "diverse" a set of slot vectors are, and use it to evaluate our model on 20 Atari games.
Fusing multi-modality medical images, such as MR and PET, can provide various anatomical or functional information about human body. But PET data is always unavailable due to different reasons such as cost, radiation, or other limitations. In this paper, we propose a 3D end-to-end synthesis network, called Bidirectional Mapping Generative Adversarial Networks (BMGAN), where image contexts and latent vector are effectively used and jointly optimized for brain MR-to-PET synthesis. Concretely, a bidirectional mapping mechanism is designed to embed the semantic information of PET images into the high dimensional latent space. And the 3D DenseU-Net generator architecture and the extensive objective functions are further utilized to improve the visual quality of synthetic results. The most appealing part is that the proposed method can synthesize the perceptually realistic PET images while preserving the diverse brain structures of different subjects. Experimental results demonstrate that the performance of the proposed method outperforms other competitive cross-modality synthesis methods in terms of quantitative measures, qualitative displays, and classification evaluation.
Deep neural networks have revolutionized medical image analysis and disease diagnosis. Despite their impressive performance, it is difficult to generate well-calibrated probabilistic outputs for such networks, which makes them uninterpretable black boxes. Bayesian neural networks provide a principled approach for modelling uncertainty and increasing patient safety, but they have a large computational overhead and provide limited improvement in calibration. In this work, by taking skin lesion classification as an example task, we show that by shifting Bayesian inference to the functional space we can craft meaningful priors that give better calibrated uncertainty estimates at a much lower computational cost.