The evolution of clothing styles and their migration across the world is intriguing, yet difficult to describe quantitatively. We propose to discover and quantify fashion influences from catalog and social media photos. We explore fashion influence along two channels: geolocation and fashion brands. We introduce an approach that detects which of these entities influence which other entities in terms of propagating their styles. We then leverage the discovered influence patterns to inform a novel forecasting model that predicts the future popularity of any given style within any given city or brand. To demonstrate our idea, we leverage public large-scale datasets of 7.7M Instagram photos from 44 major world cities (where styles are worn with variable frequency) as well as 41K Amazon product photos (where styles are purchased with variable frequency). Our model learns directly from the image data how styles move between locations and how certain brands affect each other's designs in a predictable way. The discovered influence relationships reveal how both cities and brands exert and receive fashion influence for an array of visual styles inferred from the images. Furthermore, the proposed forecasting model achieves state-of-the-art results for challenging style forecasting tasks. Our results indicate the advantage of grounding visual style evolution both spatially and temporally, and for the first time, they quantify the propagation of inter-brand and inter-city influences.
Given an unconstrained stream of images captured by a wearable photo-camera (2fpm), we propose an unsupervised bottom-up approach for automatic clustering appearing faces into the individual identities present in these data. The problem is challenging since images are acquired under real world conditions; hence the visible appearance of the people in the images undergoes intensive variations. Our proposed pipeline consists of first arranging the photo-stream into events, later, localizing the appearance of multiple people in them, and finally, grouping various appearances of the same person across different events. Experimental results performed on a dataset acquired by wearing a photo-camera during one month, demonstrate the effectiveness of the proposed approach for the considered purpose.
This paper proposes a system for automatic social pattern characterization using a wearable photo-camera. The proposed pipeline consists of three major steps. First, detection of people with whom the camera wearer interacts and, second, categorization of the detected social interactions into formal and informal. These two steps act at event-level where each potential social event is modeled as a multi-dimensional time-series, whose dimensions correspond to a set of relevant features for each task, and a LSTM network is employed for time-series classification. In the last step, recurrences of the same person across the whole set of social interactions are clustered to achieve a comprehensive understanding of the diversity and frequency of the social relations of the user. Experiments over a dataset acquired by a user wearing a photo-camera during a month show promising results on the task of social pattern characterization from egocentric photo-streams.
Deep Neural networks forget previously learnt tasks when they are faced with learning new tasks. This is called catastrophic forgetting. Rehearsing the neural network with the training data of the previous task can protect the network from catastrophic forgetting. Since rehearsing requires the storage of entire previous data, Pseudo rehearsal was proposed, where samples belonging to the previous data are generated synthetically for rehearsal. In an image classification setting, while current techniques try to generate synthetic data that is photo-realistic, we demonstrated that Neural networks can be rehearsed on data that is not photo-realistic and still achieve good retention of the previous task. We also demonstrated that forgoing the constraint of having photo realism in the generated data can result in a significant reduction in the consumption of computational and memory resources for pseudo rehearsal.
Aesthetic assessment is subjective, and the distribution of the aesthetic levels is imbalanced. In order to realize the auto-assessment of photo aesthetics, we focus on using repetitive self-revised learning (RSRL) to train the CNN-based aesthetics classification network by imbalanced data set. As RSRL, the network is trained repetitively by dropping out the low likelihood photo samples at the middle levels of aesthetics from the training data set based on the previously trained network. Further, the retained two networks are used in extracting highlight regions of the photos related with the aesthetic assessment. Experimental results show that the CNN-based repetitive self-revised learning is effective for improving the performances of the imbalanced classification.
Numerous activities in our daily life, including transactions, access to services and transportation, require us to verify who we are by showing our ID documents containing face images, e.g. passports and driver licenses. An automatic system for matching ID document photos to live face images in real time with high accuracy would speedup the verification process and remove the burden on human operators. In this paper, by employing the transfer learning technique, we propose a new method, DocFace, to train a domain-specific network for ID document photo matching without a large dataset. Compared with the baseline of applying existing methods for general face recognition to this problem, our method achieves considerable improvement. A cross validation on an ID-Selfie dataset shows that DocFace improves the TAR from 61.14% to 92.77% at FAR=0.1%. Experimental results also indicate that given more training data, a viable system for automatic ID document photo matching can be developed and deployed.
Aesthetic assessment is subjective, and the distribution of the aesthetic levels is imbalanced. In order to realize the auto-assessment of photo aesthetics, we focus on retraining the CNN-based aesthetic assessment model by dropping out the unavailable samples in the middle levels from the training data set repetitively to overcome the effect of imbalanced aesthetic data on classification. Further, the method of extracting aesthetics highlight region of the photo image by using the two repetitively trained models is presented. Therefore, the correlation of the extracted region with the aesthetic levels is analyzed to illustrate what aesthetics features influence the aesthetic quality of the photo. Moreover, the testing data set is from the different data source called 500px. Experimental results show that the proposed method is effective.
There are quite a number of photographs captured under undesirable conditions in the last century. Thus, they are often noisy, regionally incomplete, and grayscale formatted. Conventional approaches mainly focus on one point so that those restoration results are not perceptually sharp or clean enough. To solve these problems, we propose a noise prior learner NEGAN to simulate the noise distribution of real legacy photos using unpaired images. It mainly focuses on matching high-frequency parts of noisy images through discrete wavelet transform (DWT) since they include most of noise statistics. We also create a large legacy photo dataset for learning noise prior. Using learned noise prior, we can easily build valid training pairs by degrading clean images. Then, we propose an IEGAN framework performing image editing including joint denoising, inpainting and colorization based on the estimated noise prior. We evaluate the proposed system and compare it with state-of-the-art image enhancement methods. The experimental results demonstrate that it achieves the best perceptual quality. https://github.com/zhaoyuzhi/Legacy-Photo-Editing-with-Learned-Noise-Prior for the codes and the proposed LP dataset.