"photo": models, code, and papers

HARRISON: A Benchmark on HAshtag Recommendation for Real-world Images in Social Networks

May 17, 2016
Minseok Park, Hanxiang Li, Junmo Kim

Simple, short, and compact hashtags cover a wide range of information on social networks. Although many works in the field of natural language processing (NLP) have demonstrated the importance of hashtag recommendation, hashtag recommendation for images has barely been studied. In this paper, we introduce the HARRISON dataset, a benchmark for hashtag recommendation on real-world images in social networks. HARRISON is a realistic dataset composed of 57,383 photos from Instagram, with an average of 4.5 associated hashtags per photo. To evaluate the dataset, we design a baseline framework consisting of a visual feature extractor based on a convolutional neural network (CNN) and a multi-label classifier based on a neural network. Within this framework, two single-feature models, one object-based and one scene-based, as well as an integration of the two, are evaluated on the HARRISON dataset. Our dataset shows that the hashtag recommendation task requires a broad and contextual understanding of the situation conveyed in the image. To the best of our knowledge, this work is the first vision-only attempt at hashtag recommendation for real-world images in social networks. We expect this benchmark to accelerate progress on hashtag recommendation.
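
As a rough illustration of the baseline the abstract describes (a CNN feature extractor feeding a neural-network multi-label classifier over the hashtag vocabulary), here is a minimal PyTorch sketch; the backbone choice, layer sizes, and vocabulary size are assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_HASHTAGS = 1000  # hypothetical hashtag vocabulary size

class HashtagBaseline(nn.Module):
    def __init__(self, num_hashtags=NUM_HASHTAGS):
        super().__init__()
        backbone = models.resnet50(weights=None)   # CNN visual feature extractor
        backbone.fc = nn.Identity()                # keep the 2048-d pooled features
        self.backbone = backbone
        self.classifier = nn.Sequential(           # neural-network multi-label head
            nn.Linear(2048, 512), nn.ReLU(),
            nn.Linear(512, num_hashtags),
        )

    def forward(self, images):
        return self.classifier(self.backbone(images))  # raw logits per hashtag

model = HashtagBaseline()
criterion = nn.BCEWithLogitsLoss()                 # one binary target per hashtag
images = torch.randn(4, 3, 224, 224)
targets = torch.zeros(4, NUM_HASHTAGS)
targets[:, :5] = 1.0                               # e.g. a handful of hashtags per photo
loss = criterion(model(images), targets)
loss.backward()
```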

  

LeRoP: A Learning-Based Modular Robot Photography Framework

Nov 28, 2019
Hao Kang, Jianming Zhang, Haoxiang Li, Zhe Lin, TJ Rhodes, Bedrich Benes

We introduce a novel framework for automatically capturing human portraits. The framework allows the robot to follow a person to the desired location using a person re-identification model. When composing is activated, the robot adjusts its position to form the view that best matches a given template image, and finally takes a photograph. The template image can be predicted dynamically by the framework using an off-the-shelf photo evaluation model, or selected manually by the user from a pre-defined set. The template-matching-based view adjustment is driven by a deep reinforcement learning network. Our framework is built on top of the Robot Operating System (ROS) and is designed to be modular, so that each model can be flexibly replaced as needed. We demonstrate the framework on a variety of examples. In particular, we tested it in three indoor scenes and used it to take 20 photos in each scene: ten with the pre-defined template and ten with dynamically generated ones. The average number of adjustments was $11.20$ for pre-defined templates and $12.76$ for dynamically generated ones; the average time spent was $22.11$ and $24.10$ seconds, respectively.
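
A loose Python sketch of the view-adjustment loop described above: a policy network compares features of the current camera view with the template image and emits a discrete movement command until it decides to shoot. The capture function, feature dimensions, and action set are hypothetical placeholders, not LeRoP's ROS modules.

```python
import torch
import torch.nn as nn

ACTIONS = ["forward", "back", "left", "right", "rotate_l", "rotate_r", "shoot"]

policy = nn.Sequential(                 # deep RL policy (assumed architecture)
    nn.Linear(2 * 512, 256), nn.ReLU(),
    nn.Linear(256, len(ACTIONS)),
)

def capture_view_features():
    """Placeholder for grabbing the current camera frame and embedding it."""
    return torch.randn(512)

template_feat = torch.randn(512)        # embedding of the chosen template photo

for step in range(30):                  # cap on the number of adjustment steps
    view_feat = capture_view_features()
    logits = policy(torch.cat([view_feat, template_feat]))
    action = ACTIONS[int(torch.argmax(logits))]
    if action == "shoot":
        print(f"take photo after {step} adjustments")
        break
    print("adjustment:", action)        # would be sent to the robot via ROS
```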

  

Large-scale Bisample Learning on ID vs. Spot Face Recognition

Jun 11, 2018
Xiangyu Zhu, Hao Liu, Zhen Lei, Hailin Shi, Fan Yang, Dong Yi, Stan Z. Li

In many face recognition applications, there is a large amount of face data with two images per person: an ID photo for face enrollment and a probe photo captured on the spot. Most existing methods are designed for training data with limited breadth (a relatively small number of classes) and sufficient depth (many samples per class), and they face great challenges when applied to this ID vs. Spot (IvS) data, including under-represented intra-class variations and excessive demands on computing devices. In this paper, we propose a deep-learning-based large-scale bisample learning (LBL) method for IvS face recognition. To tackle the bisample problem, in which each class has only two samples, a classification-verification-classification (CVC) training strategy is proposed to progressively enhance IvS performance. In addition, a dominant prototype softmax (DP-softmax) is incorporated to make deep learning feasible on large-scale classes. We conduct LBL on an IvS face dataset with more than two million identities. Experimental results show that the proposed method achieves superior performance to previous ones, validating the effectiveness of LBL for IvS face recognition.
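
An illustrative sketch of the dominant-prototype-softmax idea mentioned in the abstract: with millions of bisample classes a full softmax is impractical, so each step scores features only against the ground-truth prototypes plus their most competitive ("dominant") rivals. The sizes, scaling, and selection rule below are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

num_classes, feat_dim, batch = 100_000, 256, 8
prototypes = F.normalize(torch.randn(num_classes, feat_dim), dim=1)  # one per identity

features = F.normalize(torch.randn(batch, feat_dim), dim=1)
labels = torch.randint(0, num_classes, (batch,))

with torch.no_grad():
    sims = features @ prototypes.t()               # similarity to all prototypes
    dominant = sims.topk(512, dim=1).indices       # hardest competitors per sample

active = torch.unique(torch.cat([labels, dominant.flatten()]))  # sub-vocabulary
remap = {int(c): i for i, c in enumerate(active)}
sub_labels = torch.tensor([remap[int(l)] for l in labels])

logits = 30.0 * features @ prototypes[active].t()  # scaled cosine logits, small softmax
loss = F.cross_entropy(logits, sub_labels)
print(loss.item())
```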

  

Self-supervised Deformation Modeling for Facial Expression Editing

Nov 05, 2019
ShahRukh Athar, Zhixin Shu, Dimitris Samaras

Recent advances in deep generative models have demonstrated impressive results in photo-realistic facial image synthesis and editing. Facial expressions are inherently the result of muscle movement; however, existing neural-network-based approaches usually rely only on texture generation to edit expressions and largely neglect motion information. In this work, we propose a novel end-to-end network that disentangles facial editing into two steps: a "motion-editing" step and a "texture-editing" step. In the "motion-editing" step, we explicitly model facial movement through image deformation, warping the image into the desired expression. In the "texture-editing" step, we generate the necessary textures, such as teeth and shading effects, for a photo-realistic result. Our physically-based task-disentanglement design allows each step to learn a focused task, removing the need to hallucinate motion through texture generation. Our system is trained in a self-supervised manner, requiring no ground-truth deformation annotations. Using Action Units [8] as the representation of facial expression, our method improves on the state of the art in facial expression editing in both qualitative and quantitative evaluations.
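
A minimal sketch of the two-step split described above: a "motion-editing" network predicts a dense deformation that warps the face toward the target expression, and a "texture-editing" network adds residual detail on top of the warped image. The architectures, the Action Unit count, and the flow scaling are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MotionEditor(nn.Module):
    def __init__(self, num_aus=17):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + num_aus, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1),          # per-pixel (dx, dy) offsets
        )

    def forward(self, img, aus):
        b, _, h, w = img.shape
        au_map = aus.view(b, -1, 1, 1).expand(-1, -1, h, w)   # broadcast AU intensities
        flow = self.net(torch.cat([img, au_map], dim=1)).permute(0, 2, 3, 1)
        identity = torch.eye(2, 3).unsqueeze(0).repeat(b, 1, 1)
        base = F.affine_grid(identity, img.shape, align_corners=False)
        return F.grid_sample(img, base + 0.1 * flow, align_corners=False)  # warp

texture_editor = nn.Sequential(          # refines the warped image (teeth, shading)
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1),
)

img = torch.randn(2, 3, 128, 128)
aus = torch.rand(2, 17)                  # Action Unit intensities as conditioning
warped = MotionEditor()(img, aus)        # motion-editing step
edited = warped + texture_editor(warped) # texture-editing step (residual)
```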

  

Synthesizing Coupled 3D Face Modalities by Trunk-Branch Generative Adversarial Networks

Sep 05, 2019
Baris Gecer, Alexander Lattas, Stylianos Ploumpis, Jiankang Deng, Athanasios Papaioannou, Stylianos Moschoglou, Stefanos Zafeiriou

Generating realistic 3D faces is of high importance for computer graphics and computer vision applications. Research on 3D face generation generally revolves around linear statistical models of the facial surface. Nevertheless, these models cannot faithfully represent either the facial texture or the normals of the face, both of which are crucial for photo-realistic face synthesis. Recently, it was demonstrated that Generative Adversarial Networks (GANs) can be used to generate high-quality facial textures. However, the generation process either omits the geometry and normals, or independent processes are used to produce the 3D shape information. In this paper, we present the first methodology that jointly generates high-quality texture, shape, and normals, which can be used for photo-realistic synthesis. To do so, we propose a novel GAN that can generate data from different modalities while exploiting their correlations. Furthermore, we demonstrate how the generation can be conditioned on expression to create faces with various facial expressions. The qualitative results shown in this preprint are compressed due to size limitations; full-resolution results and the accompanying video can be found on the project page: https://github.com/barisgecer/TBGAN.
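
A hedged sketch of a trunk-branch generator in the spirit of the abstract: a shared trunk maps a latent code and an expression code to a common representation, and separate branches decode correlated texture, shape, and normal maps. All layer sizes here are invented for illustration.

```python
import torch
import torch.nn as nn

class TrunkBranchGenerator(nn.Module):
    def __init__(self, z_dim=128, expr_dim=16):
        super().__init__()
        self.trunk = nn.Sequential(                  # shared trunk
            nn.Linear(z_dim + expr_dim, 256), nn.ReLU(),
            nn.Linear(256, 4 * 4 * 64), nn.ReLU(),
        )
        def branch(out_ch):                          # small upsampling decoder per modality
            return nn.Sequential(
                nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(32, out_ch, 4, stride=2, padding=1), nn.Tanh(),
            )
        self.texture = branch(3)                     # RGB UV texture
        self.shape = branch(3)                       # XYZ position map
        self.normals = branch(3)                     # surface normal map

    def forward(self, z, expr):
        h = self.trunk(torch.cat([z, expr], dim=1)).view(-1, 64, 4, 4)
        return self.texture(h), self.shape(h), self.normals(h)

gen = TrunkBranchGenerator()
tex, shp, nrm = gen(torch.randn(2, 128), torch.randn(2, 16))
print(tex.shape, shp.shape, nrm.shape)               # each (2, 3, 16, 16) in this toy setup
```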

* Check the project page (https://github.com/barisgecer/TBGAN) for full-resolution results and the accompanying video 
  

Beautiful and damned. Combined effect of content quality and social ties on user engagement

Nov 01, 2017
Luca M. Aiello, Rossano Schifanella, Miriam Redi, Stacey Svetlichnaya, Frank Liu, Simon Osindero

User participation in online communities is driven by the intertwinement of the social network structure with the crowd-generated content that flows along its links. These aspects are rarely explored jointly and at scale. By looking at how users generate and access pictures of varying beauty on Flickr, we investigate how the production of quality impacts the dynamics of online social systems. We develop a deep learning computer vision model to score images according to their aesthetic value and we validate its output through crowdsourcing. By applying it to over 15B Flickr photos, we study for the first time how image beauty is distributed over a large-scale social system. Beautiful images are evenly distributed in the network, although only a small core of people get social recognition for them. To study the impact of exposure to quality on user engagement, we set up matching experiments aimed at detecting causality from observational data. Exposure to beauty is double-edged: following people who produce high-quality content increases one's probability of uploading better photos; however, an excessive imbalance between the quality generated by a user and the user's neighbors leads to a decline in engagement. Our analysis has practical implications for improving link recommender systems.
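
As a toy illustration of the kind of aesthetics model the abstract mentions, the sketch below scores a photo with a CNN regression head; the backbone and the (0, 1) score range are assumptions about the setup, not the authors' model.

```python
import torch
import torch.nn as nn
from torchvision import models

class AestheticScorer(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Linear(backbone.fc.in_features, 1)     # scalar beauty score
        self.backbone = backbone

    def forward(self, images):
        return torch.sigmoid(self.backbone(images)).squeeze(1)  # score in (0, 1)

scorer = AestheticScorer()
photos = torch.randn(4, 3, 224, 224)
print(scorer(photos))          # per-photo beauty estimates, to be aggregated per user
```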

* 13 pages, 12 figures, final version published in IEEE Transactions on Knowledge and Data Engineering (Volume: PP, Issue: 99) 
  

Auto-Embedding Generative Adversarial Networks for High Resolution Image Synthesis

Mar 29, 2019
Yong Guo, Qi Chen, Jian Chen, Qingyao Wu, Qinfeng Shi, Mingkui Tan

Generating images with generative adversarial networks (GANs) has attracted much attention recently. However, most existing GAN-based methods can only produce low-resolution images of limited quality. Directly generating high-resolution images with GANs is nontrivial and often produces problematic images with incomplete objects. To address this issue, we develop a novel GAN called the Auto-Embedding Generative Adversarial Network (AEGAN), which simultaneously encodes global structure features and captures fine-grained details. In our network, we use an autoencoder to learn the intrinsic high-level structure of real images and design a novel denoiser network to provide photo-realistic details for the generated images. In our experiments, we are able to produce 512x512 images of promising quality directly from the input noise. The resulting images exhibit better perceptual photo-realism, i.e., sharper structure and richer details, than other baselines on several datasets, including Oxford-102 Flowers, Caltech-UCSD Birds (CUB), High-Quality Large-scale CelebFaces Attributes (CelebA-HQ), Large-scale Scene Understanding (LSUN), and ImageNet.
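
A loose sketch of the two ingredients the abstract describes: an autoencoder that learns a global-structure representation of real images and a denoiser network that adds photo-realistic detail to the coarse output. The layers below are placeholders, not the AEGAN architecture.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(                  # image -> global-structure embedding
    nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
)
decoder = nn.Sequential(                  # embedding -> coarse image
    nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),
)
denoiser = nn.Sequential(                 # coarse image -> fine-grained detail (residual)
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1),
)

real = torch.randn(2, 3, 64, 64)
coarse = decoder(encoder(real))           # autoencoder reconstruction (structure)
refined = coarse + denoiser(coarse)       # detail added by the denoiser
recon_loss = nn.functional.l1_loss(refined, real)
```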

* Accepted by IEEE Transactions on Multimedia 
  

Attribute-Guided Sketch Generation

Jan 28, 2019
Hao Tang, Xinya Chen, Wei Wang, Dan Xu, Jason J. Corso, Nicu Sebe, Yan Yan

Facial attributes are important because they provide a detailed description and determine the visual appearance of human faces. In this paper, we aim to convert a face image to a sketch while simultaneously generating facial attributes. To this end, we propose a novel Attribute-Guided Sketch Generative Adversarial Network (ASGAN), an end-to-end framework containing two pairs of generators and discriminators, one of which generates faces with attributes while the other performs image-to-sketch translation. The two generators form a W-shaped network (W-net) and are trained jointly under a weight-sharing constraint. Additionally, we propose two novel discriminators: a residual discriminator focusing on attribute generation and a triplex discriminator that helps generate realistic-looking sketches. To validate our model, we have created a new large dataset of 8,804 images, named the Attribute Face Photo & Sketch (AFPS) dataset, which is the first dataset containing attributes associated with face sketch images. The experimental results demonstrate that the proposed network (i) generates more photo-realistic faces with sharper facial attributes than baselines and (ii) generalizes well to different generative tasks.
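
A very rough sketch of the weight-sharing idea behind the W-net described above: two encoder-decoder generators (attribute-guided face generation and face-to-sketch translation) share parameters in a tied bottleneck module. The concrete sharing scheme and layer sizes are assumptions.

```python
import torch
import torch.nn as nn

def encoder():
    return nn.Sequential(nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),
                         nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU())

def decoder(out_ch=3):
    return nn.Sequential(nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),
                         nn.ConvTranspose2d(32, out_ch, 4, 2, 1), nn.Tanh())

class WNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc_face, self.dec_face = encoder(), decoder()      # attribute branch
        self.enc_sketch, self.dec_sketch = encoder(), decoder()  # sketch branch
        self.shared = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
        # self.shared is used by both passes, so its weights are tied across them

    def forward(self, face):
        attr_face = self.dec_face(self.shared(self.enc_face(face)))
        sketch = self.dec_sketch(self.shared(self.enc_sketch(attr_face)))
        return attr_face, sketch

attr_face, sketch = WNet()(torch.randn(1, 3, 64, 64))
print(attr_face.shape, sketch.shape)
```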

* 9 pages, 9 figures, accepted to FG 2019 
  

Face Sketch Matching via Coupled Deep Transform Learning

Oct 09, 2017
Shruti Nagpal, Maneet Singh, Richa Singh, Mayank Vatsa, Afzel Noore, Angshul Majumdar

Face sketch to digital image matching is an important challenge of face recognition that involves matching across different domains. Current research efforts have primarily focused on extracting domain-invariant representations or learning a mapping from one domain to the other. In this research, we propose a novel transform-learning-based approach, termed DeepTransformer, which learns a transformation and mapping function between the features of two domains. The proposed formulation is independent of the input information and can be applied with any existing learned or hand-crafted feature. Since the mapping function is directional in nature, we propose two variants of DeepTransformer: (i) semi-coupled and (ii) symmetrically-coupled deep transform learning. This research also uses a novel IIIT-D Composite Sketch with Age (CSA) variations database, which contains sketch images of 150 subjects along with age-separated digital photos. The performance of the proposed models is evaluated on a novel application of sketch-to-sketch matching, along with sketch-to-digital-photo matching. Experimental results demonstrate the robustness of the proposed models in comparison to existing state-of-the-art sketch matching algorithms and a commercial face recognition system.
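
A schematic sketch of coupled transform learning as the abstract describes it: learn one transform per domain plus a mapping between the resulting coefficient spaces. Plain gradient descent is used here purely for illustration; the paper's optimization, regularizers, and variants differ.

```python
import torch

d_feat, d_code, n = 128, 64, 500
X_sketch = torch.randn(d_feat, n)            # columns are sketch features
X_photo = torch.randn(d_feat, n)             # columns are matching photo features

T_s = torch.randn(d_code, d_feat, requires_grad=True)   # sketch-domain transform
T_p = torch.randn(d_code, d_feat, requires_grad=True)   # photo-domain transform
Z_s = torch.randn(d_code, n, requires_grad=True)        # sketch coefficients
Z_p = torch.randn(d_code, n, requires_grad=True)        # photo coefficients
M = torch.randn(d_code, d_code, requires_grad=True)     # directional mapping

opt = torch.optim.Adam([T_s, T_p, Z_s, Z_p, M], lr=1e-3)
for it in range(200):
    opt.zero_grad()
    loss = ((T_s @ X_sketch - Z_s) ** 2).mean() \
         + ((T_p @ X_photo - Z_p) ** 2).mean() \
         + ((M @ Z_s - Z_p) ** 2).mean()       # semi-coupling between the two domains
    loss.backward()
    opt.step()
print(float(loss))
```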

* International Conference on Computer Vision, 2017 
  

A Multimodal Approach to Predict Social Media Popularity

Jul 16, 2018
Mayank Meghawat, Satyendra Yadav, Debanjan Mahata, Yifang Yin, Rajiv Ratn Shah, Roger Zimmermann

Multiple modalities represent the different aspects by which information is conveyed by a data source. Modern social media platforms are one of the primary sources of multimodal data, where users employ different modes of expression by posting textual as well as multimedia content, such as images and videos, to share information. The multimodal information embedded in such posts can be useful for predicting their popularity. To the best of our knowledge, no such multimodal dataset exists for popularity prediction of social media photos. In this work, we propose a multimodal dataset consisting of content, context, and social information for popularity prediction. Specifically, we augment the SMP-T1 dataset for social media prediction from the ACM Multimedia 2017 grand challenge with image content, titles, descriptions, and tags. We then propose a multimodal approach that exploits visual features (i.e., content information), textual features (i.e., contextual information), and social features (e.g., average views and group counts) to predict the popularity of social media photos in terms of view counts. Experimental results confirm that, despite using only half of the SMP-T1 training dataset, our multimodal approach achieves performance comparable to the state of the art.
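
A simple sketch of the fusion scheme outlined above: concatenate visual, textual, and social features per post and regress (log) view counts. The random features and the regressor are stand-ins, not the paper's pipeline.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

n_posts = 2000
visual = np.random.randn(n_posts, 256)       # e.g. CNN image embeddings
textual = np.random.randn(n_posts, 100)      # e.g. title/description/tag embeddings
social = np.random.randn(n_posts, 8)         # e.g. average views, group counts
X = np.hstack([visual, textual, social])     # early fusion by concatenation
y = np.random.rand(n_posts) * 10             # stand-in for log(view count)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = GradientBoostingRegressor().fit(X_tr, y_tr)
print("R^2 on held-out posts:", model.score(X_te, y_te))
```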

* Preprint version of a paper accepted in the Proceedings of the 1st IEEE International Conference on Multimedia Information Processing and Retrieval 
  