He Yan

ClothFormer: Taming Video Virtual Try-on in All Module

Apr 26, 2022
Jianbin Jiang, Tan Wang, He Yan, Junhui Liu

The task of video virtual try-on aims to fit target clothes onto a person in a video with spatio-temporal consistency. Despite tremendous progress in image-based virtual try-on, existing methods produce inconsistencies between frames when applied to videos. The limited work that has explored video-based virtual try-on has also failed to produce visually pleasing and temporally coherent results. Moreover, there are two other key challenges: 1) how to generate accurate warping when occlusions appear in the clothing region; 2) how to generate clothes and non-target body parts (e.g., arms, neck) in harmony with a complicated background. To address them, we propose a novel video virtual try-on framework, ClothFormer, which successfully synthesizes realistic, harmonious, and spatio-temporally consistent results in complicated environments. In particular, ClothFormer involves three major modules. First, a two-stage anti-occlusion warping module predicts an accurate dense flow mapping between the body regions and the clothing regions. Second, an appearance-flow tracking module utilizes ridge regression and optical flow correction to smooth the dense flow sequence and generate a temporally smooth warped clothing sequence. Third, a dual-stream transformer extracts and fuses clothing textures, person features, and environment information to generate realistic try-on videos. Through rigorous experiments, we demonstrate that our method substantially surpasses the baselines in terms of synthesized video quality, both qualitatively and quantitatively.

* CVPR2022 Oral, project page https://cloth-former.github.io 
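
The abstract does not detail how ridge regression is used to smooth the appearance-flow sequence, so the following is only a rough sketch under assumed specifics: it fits a ridge-regularized linear trend in time to every pixel's flow vector and returns the fitted, temporally smooth sequence. The function name `smooth_flow_sequence`, the linear-in-time model, and the `alpha` penalty are illustrative choices, not details taken from the paper.

```python
import numpy as np

def smooth_flow_sequence(flows, alpha=1.0):
    """Smooth a dense flow sequence over time with ridge regression (sketch).

    flows: array of shape (T, H, W, 2) -- per-frame dense flow fields.
    Fits a linear trend in time to each flow component at every pixel, with an
    L2 (ridge) penalty on the coefficients, and returns the fitted sequence.
    This is an assumed formulation, not ClothFormer's exact procedure.
    """
    T, H, W, C = flows.shape
    t = np.arange(T, dtype=np.float64)
    X = np.stack([t, np.ones_like(t)], axis=1)        # (T, 2): flow(t) ~ a*t + b
    A = X.T @ X + alpha * np.eye(2)                   # ridge normal equations
    Y = flows.reshape(T, -1)                          # (T, H*W*2)
    coeffs = np.linalg.solve(A, X.T @ Y)              # (2, H*W*2)
    return (X @ coeffs).reshape(T, H, W, C)

# Example: smooth a random 8-frame flow sequence at 64x48 resolution.
if __name__ == "__main__":
    flows = np.random.default_rng(0).normal(size=(8, 64, 48, 2))
    print(smooth_flow_sequence(flows, alpha=0.5).shape)  # (8, 64, 48, 2)
```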

Unknown Identity Rejection Loss: Utilizing Unlabeled Data for Face Recognition

Oct 24, 2019
Haiming Yu, Yin Fan, Keyu Chen, He Yan, Xiangju Lu, Junhui Liu, Danming Xie

Face recognition has advanced considerably with the availability of large-scale labeled datasets. However, how to further improve performance with easily accessible unlabeled data remains a challenge. In this paper, we propose the novel Unknown Identity Rejection (UIR) loss to utilize the unlabeled data. We categorize identities in unconstrained environments into a known set and an unknown set. The former corresponds to the identities that appear in the labeled training dataset, while the latter is its complement. Besides training the model to accurately classify the known identities, we also force the model to reject unknown identities provided by the unlabeled dataset via our proposed UIR loss. In order to 'reject' faces of unknown identities, the centers of the known identities are forced to keep a sufficient margin from the centers of the unknown identities, which are approximated by the features of their samples. In this way, the discriminativeness of the face representations is enhanced. Experimental results demonstrate that our approach provides a clear performance improvement by utilizing the unlabeled data.

* 8 pages, 2 figures, Workshop paper accepted by Lightweight Face Recognition Challenge & Workshop (ICCV 2019) 
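
The abstract states only that known-identity centers are kept at a margin from unknown-identity centers approximated by sample features; the exact loss is not given there. The sketch below is one plausible margin-based rejection term written in PyTorch; the cosine-similarity formulation, the `margin` value, and the name `uir_style_loss` are assumptions rather than the paper's definition.

```python
import torch
import torch.nn.functional as F

def uir_style_loss(known_centers, unlabeled_feats, margin=0.3):
    """Margin-style rejection loss (illustrative, not the paper's exact form).

    known_centers:   (K, D) class-center vectors of the known identities.
    unlabeled_feats: (B, D) features of unlabeled faces, used as proxies for
                     unknown-identity centers.
    Penalizes cosine similarity between an unlabeled feature and any known
    center that exceeds `margin`, pushing the two sets apart.
    """
    c = F.normalize(known_centers, dim=1)
    f = F.normalize(unlabeled_feats, dim=1)
    sim = f @ c.t()                        # (B, K) cosine similarities
    return F.relu(sim - margin).mean()

# Example with random tensors standing in for real centers and features.
centers = torch.randn(100, 512, requires_grad=True)   # 100 known identities
feats = torch.randn(32, 512)                          # a batch of unlabeled faces
uir_style_loss(centers, feats).backward()
```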

iQIYI-VID: A Large Dataset for Multi-modal Person Identification

Nov 19, 2018
Yuanliu Liu, Peipei Shi, Bo Peng, He Yan, Yong Zhou, Bing Han, Yi Zheng, Chao Lin, Jianbin Jiang, Yin Fan, Tingwei Gao, Ganwen Wang, Jian Liu, Xiangju Lu, Danming Xie

Person identification in the wild is very challenging due to great variation in poses, face quality, clothes, makeup, and so on. Traditional research, such as face recognition, person re-identification, and speaker recognition, often focuses on a single modality of information, which is inadequate to handle all the situations encountered in practice. Multi-modal person identification is a more promising approach, as it can jointly utilize face, head, body, and audio features. In this paper, we introduce iQIYI-VID, the largest video dataset for multi-modal person identification. It is composed of 600K video clips of 5,000 celebrities. These video clips are extracted from 400K hours of online videos of various types, ranging from movies and variety shows to TV series and news broadcasting. All video clips pass through a careful human annotation process, and the label error rate is lower than 0.2%. We evaluated state-of-the-art models for face recognition, person re-identification, and speaker recognition on the iQIYI-VID dataset. Experimental results show that these models are still far from perfect for the task of person identification in the wild. We further demonstrate that a simple fusion of multi-modal features can improve person identification considerably. We have released the dataset online to promote multi-modal person identification research.
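
The abstract mentions a "simple fusion of multi-modal features" without specifying the scheme. As a rough baseline in that spirit, the sketch below concatenates L2-normalized per-modality features and feeds them to a linear identity classifier; the feature dimensions, the concatenation-plus-linear design, and the class name are assumptions, not the paper's method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConcatFusionClassifier(nn.Module):
    """Concatenate L2-normalized per-modality features and classify (sketch)."""

    def __init__(self, dims=(512, 256, 128), num_ids=5000):
        super().__init__()
        self.fc = nn.Linear(sum(dims), num_ids)

    def forward(self, face, body, audio):
        feats = [F.normalize(x, dim=1) for x in (face, body, audio)]
        return self.fc(torch.cat(feats, dim=1))   # (B, num_ids) logits

# Example: a batch of 4 clips with random stand-in features per modality.
model = ConcatFusionClassifier()
logits = model(torch.randn(4, 512), torch.randn(4, 256), torch.randn(4, 128))
print(logits.shape)  # torch.Size([4, 5000])
```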


Learning Latent Events from Network Message Logs: A Decomposition Based Approach

Apr 10, 2018
Siddhartha Satpathi, Supratim Deb, R. Srikant, He Yan

In this communication, we describe a novel technique for event mining using a decomposition-based approach that combines non-parametric change-point detection with latent Dirichlet allocation (LDA). We prove theoretical guarantees on the sample complexity and consistency of the approach. In a companion paper, we will present a thorough evaluation of our approach with detailed experiments.
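
The abstract gives only the high-level recipe: non-parametric change-point detection followed by LDA. As an illustration of that pipeline under assumed specifics, the sketch below segments a message log wherever the arrival rate shifts sharply and then fits scikit-learn's LDA over the segments; the rate-ratio detector, window size, and function names are not taken from the paper.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def split_on_rate_change(timestamps, window=20, threshold=3.0):
    """Rough non-parametric change-point split on message arrival rate.

    Compares the median inter-arrival time in adjacent windows and starts a new
    segment when the ratio exceeds `threshold` (illustrative detector only).
    """
    gaps = np.diff(timestamps)
    boundaries = [0]
    for i in range(window, len(gaps) - window, window):
        left = np.median(gaps[i - window:i]) + 1e-9
        right = np.median(gaps[i:i + window]) + 1e-9
        if max(left / right, right / left) > threshold:
            boundaries.append(i)
    boundaries.append(len(timestamps))
    return boundaries

def latent_events(messages, timestamps, n_events=5):
    """Segment the log, then fit LDA over segments to recover latent events."""
    bounds = split_on_rate_change(np.asarray(timestamps, dtype=float))
    segments = [" ".join(messages[a:b]) for a, b in zip(bounds[:-1], bounds[1:])]
    counts = CountVectorizer().fit_transform(segments)
    lda = LatentDirichletAllocation(n_components=n_events, random_state=0)
    return lda.fit_transform(counts)   # per-segment event mixture weights

# Toy log: 60 slow "link_down" messages followed by 60 fast "cpu_high" messages.
msgs = ["link_down"] * 60 + ["cpu_high"] * 60
ts = list(np.arange(60) * 1.0) + list(60 + np.arange(60) * 0.1)
print(latent_events(msgs, ts, n_events=2).shape)   # (2, 2): segment x event weights
```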
