Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"photo": models, code, and papers

Social Style Characterization from Egocentric Photo-streams

Sep 18, 2017
Maedeh Aghaei, Mariella Dimiccoli, Cristian Canton Ferrer, Petia Radeva

This paper proposes a system for automatic social pattern characterization using a wearable photo-camera. The proposed pipeline consists of three major steps. First, detection of people with whom the camera wearer interacts and, second, categorization of the detected social interactions into formal and informal. These two steps act at event-level where each potential social event is modeled as a multi-dimensional time-series, whose dimensions correspond to a set of relevant features for each task, and a LSTM network is employed for time-series classification. In the last step, recurrences of the same person across the whole set of social interactions are clustered to achieve a comprehensive understanding of the diversity and frequency of the social relations of the user. Experiments over a dataset acquired by a user wearing a photo-camera during a month show promising results on the task of social pattern characterization from egocentric photo-streams.

* International Conference on Computer Vision (ICCV). Workshop on Egocentric Percetion, Interaction and Computing 

Pseudo Rehearsal using non photo-realistic images

Apr 28, 2020
Bhasker Sri Harsha Suri, Kalidas Yeturu

Deep Neural networks forget previously learnt tasks when they are faced with learning new tasks. This is called catastrophic forgetting. Rehearsing the neural network with the training data of the previous task can protect the network from catastrophic forgetting. Since rehearsing requires the storage of entire previous data, Pseudo rehearsal was proposed, where samples belonging to the previous data are generated synthetically for rehearsal. In an image classification setting, while current techniques try to generate synthetic data that is photo-realistic, we demonstrated that Neural networks can be rehearsed on data that is not photo-realistic and still achieve good retention of the previous task. We also demonstrated that forgoing the constraint of having photo realism in the generated data can result in a significant reduction in the consumption of computational and memory resources for pseudo rehearsal.


CNN-based Repetitive self-revised learning for photos' aesthetics imbalanced classification

Mar 27, 2020
Ying Dai

Aesthetic assessment is subjective, and the distribution of the aesthetic levels is imbalanced. In order to realize the auto-assessment of photo aesthetics, we focus on using repetitive self-revised learning (RSRL) to train the CNN-based aesthetics classification network by imbalanced data set. As RSRL, the network is trained repetitively by dropping out the low likelihood photo samples at the middle levels of aesthetics from the training data set based on the previously trained network. Further, the retained two networks are used in extracting highlight regions of the photos related with the aesthetic assessment. Experimental results show that the CNN-based repetitive self-revised learning is effective for improving the performances of the imbalanced classification.

* arXiv admin note: substantial text overlap with arXiv:1909.08213 

DocFace: Matching ID Document Photos to Selfies

May 06, 2018
Yichun Shi, Anil K. Jain

Numerous activities in our daily life, including transactions, access to services and transportation, require us to verify who we are by showing our ID documents containing face images, e.g. passports and driver licenses. An automatic system for matching ID document photos to live face images in real time with high accuracy would speedup the verification process and remove the burden on human operators. In this paper, by employing the transfer learning technique, we propose a new method, DocFace, to train a domain-specific network for ID document photo matching without a large dataset. Compared with the baseline of applying existing methods for general face recognition to this problem, our method achieves considerable improvement. A cross validation on an ID-Selfie dataset shows that DocFace improves the TAR from 61.14% to 92.77% at FAR=0.1%. Experimental results also indicate that given more training data, a viable system for automatic ID document photo matching can be developed and deployed.


Sample-specific repetitive learning for photo aesthetic assessment and highlight region extraction

Sep 18, 2019
Ying Dai

Aesthetic assessment is subjective, and the distribution of the aesthetic levels is imbalanced. In order to realize the auto-assessment of photo aesthetics, we focus on retraining the CNN-based aesthetic assessment model by dropping out the unavailable samples in the middle levels from the training data set repetitively to overcome the effect of imbalanced aesthetic data on classification. Further, the method of extracting aesthetics highlight region of the photo image by using the two repetitively trained models is presented. Therefore, the correlation of the extracted region with the aesthetic levels is analyzed to illustrate what aesthetics features influence the aesthetic quality of the photo. Moreover, the testing data set is from the different data source called 500px. Experimental results show that the proposed method is effective.

* 15 pages, 9 figures, 3 tables 

Legacy Photo Editing with Learned Noise Prior

Nov 24, 2020
Zhao Yuzhi, Po Lai-Man, Wang Xuehui, Liu Kangcheng, Zhang Yujia, Yu Wing-Yin, Xian Pengfei, Xiong Jingjing

There are quite a number of photographs captured under undesirable conditions in the last century. Thus, they are often noisy, regionally incomplete, and grayscale formatted. Conventional approaches mainly focus on one point so that those restoration results are not perceptually sharp or clean enough. To solve these problems, we propose a noise prior learner NEGAN to simulate the noise distribution of real legacy photos using unpaired images. It mainly focuses on matching high-frequency parts of noisy images through discrete wavelet transform (DWT) since they include most of noise statistics. We also create a large legacy photo dataset for learning noise prior. Using learned noise prior, we can easily build valid training pairs by degrading clean images. Then, we propose an IEGAN framework performing image editing including joint denoising, inpainting and colorization based on the estimated noise prior. We evaluate the proposed system and compare it with state-of-the-art image enhancement methods. The experimental results demonstrate that it achieves the best perceptual quality. for the codes and the proposed LP dataset.

* accepted by IEEE WACV 2021, 2nd round submission 

StreetStyle: Exploring world-wide clothing styles from millions of photos

Jun 06, 2017
Kevin Matzen, Kavita Bala, Noah Snavely

Each day billions of photographs are uploaded to photo-sharing services and social media platforms. These images are packed with information about how people live around the world. In this paper we exploit this rich trove of data to understand fashion and style trends worldwide. We present a framework for visual discovery at scale, analyzing clothing and fashion across millions of images of people around the world and spanning several years. We introduce a large-scale dataset of photos of people annotated with clothing attributes, and use this dataset to train attribute classifiers via deep learning. We also present a method for discovering visually consistent style clusters that capture useful visual correlations in this massive dataset. Using these tools, we analyze millions of photos to derive visual insight, producing a first-of-its-kind analysis of global and per-city fashion choices and spatio-temporal trends.


Person Recognition in Personal Photo Collections

Oct 20, 2018
Seong Joon Oh, Rodrigo Benenson, Mario Fritz, Bernt Schiele

People nowadays share large parts of their personal lives through social media. Being able to automatically recognise people in personal photos may greatly enhance user convenience by easing photo album organisation. For human identification task, however, traditional focus of computer vision has been face recognition and pedestrian re-identification. Person recognition in social media photos sets new challenges for computer vision, including non-cooperative subjects (e.g. backward viewpoints, unusual poses) and great changes in appearance. To tackle this problem, we build a simple person recognition framework that leverages convnet features from multiple image regions (head, body, etc.). We propose new recognition scenarios that focus on the time and appearance gap between training and testing samples. We present an in-depth analysis of the importance of different features according to time and viewpoint generalisability. In the process, we verify that our simple approach achieves the state of the art result on the PIPA benchmark, arguably the largest social media based benchmark for person recognition to date with diverse poses, viewpoints, social groups, and events. Compared the conference version of the paper, this paper additionally presents (1) analysis of a face recogniser (DeepID2+), (2) new method naeil2 that combines the conference version method naeil and DeepID2+ to achieve state of the art results even compared to post-conference works, (3) discussion of related work since the conference version, (4) additional analysis including the head viewpoint-wise breakdown of performance, and (5) results on the open-world setup.

* 18 pages, 20 figures; to appear in IEEE Transactions on Pattern Analysis and Machine Intelligence