Yuanqi Chen

Class Prototype-based Cleaner for Label Noise Learning

Dec 21, 2022
Jingjia Huang, Yuanqi Chen, Jiashi Feng, Xinglong Wu

Semi-supervised learning-based methods are the current state-of-the-art solutions to the noisy-label learning problem. They rely on first learning an unsupervised label cleaner that divides the training samples into a labeled set of clean data and an unlabeled set of noisy data. Typically, the cleaner is obtained by fitting a mixture model to the distribution of per-sample training losses. However, this modeling procedure is class agnostic and assumes the loss distributions of clean and noisy samples are the same across different classes. Unfortunately, in practice, such an assumption does not always hold due to the varying learning difficulty of different classes, leading to sub-optimal label noise partition criteria. In this work, we reveal this long-ignored problem and propose a simple yet effective solution, named Class Prototype-based label noise Cleaner (CPC). Unlike previous works that treat all classes equally, CPC fully considers the heterogeneity of loss distributions and applies class-aware modulation to partition clean and noisy data. CPC simultaneously takes advantage of loss distribution modeling and intra-class consistency regularization in feature space, and can thus better distinguish clean from noisy labels. We theoretically justify the effectiveness of our method by explaining it within the Expectation-Maximization (EM) framework. Extensive experiments are conducted on the noisy-label benchmarks CIFAR-10, CIFAR-100, Clothing1M, and WebVision. The results show that CPC consistently brings performance improvements across all benchmarks. Code and pre-trained models will be released at https://github.com/hjjpku/CPC.git.
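
The class-aware partition step can be illustrated with a short sketch. This is not the authors' released CPC code (which additionally uses class prototypes for intra-class consistency regularization); it only shows the core idea of fitting a separate two-component Gaussian mixture to each class's per-sample losses, so that the clean/noisy decision adapts to per-class learning difficulty. Function and variable names are illustrative.

```python
# Minimal sketch: class-aware loss-distribution partitioning (not the official CPC code).
import numpy as np
from sklearn.mixture import GaussianMixture

def class_aware_clean_prob(losses, labels, num_classes):
    """Return, for every sample, the probability that its given label is clean."""
    losses = np.asarray(losses, dtype=np.float64).reshape(-1, 1)
    labels = np.asarray(labels)
    clean_prob = np.zeros(len(labels))
    for c in range(num_classes):
        idx = np.where(labels == c)[0]
        if len(idx) < 2:
            clean_prob[idx] = 1.0  # too few samples to fit a mixture for this class
            continue
        # two-component GMM fitted only on this class's losses
        gmm = GaussianMixture(n_components=2, reg_covar=1e-4).fit(losses[idx])
        post = gmm.predict_proba(losses[idx])
        clean_comp = np.argmin(gmm.means_.ravel())  # the low-loss component ~ clean
        clean_prob[idx] = post[:, clean_comp]
    return clean_prob
```

Compared with a single class-agnostic mixture over all losses, this per-class fit yields a different clean/noisy threshold for easy and hard classes.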

SKFlow: Learning Optical Flow with Super Kernels

May 29, 2022
Shangkun Sun, Yuanqi Chen, Yu Zhu, Guodong Guo, Ge Li

Optical flow estimation is a classical yet challenging task in computer vision. One of the essential factors in accurately predicting optical flow is alleviating occlusions between frames. However, occlusion remains a thorny problem for current top-performing optical flow estimation methods due to insufficient local evidence for modeling occluded areas. In this paper, we propose the Super Kernel Flow Network (SKFlow), a CNN architecture that ameliorates the impact of occlusions on optical flow estimation. SKFlow benefits from super kernels, which bring enlarged receptive fields that complement the absent matching information and recover the occluded motions. We present efficient super kernel designs by utilizing conical connections and hybrid depth-wise convolutions. Extensive experiments demonstrate the effectiveness of SKFlow on multiple benchmarks, especially in occluded areas. Without backbones pre-trained on ImageNet and with a modest increase in computation, SKFlow achieves compelling performance and ranks 1st among published methods on the Sintel benchmark. On the challenging Sintel final pass test set, SKFlow attains an average end-point error of 2.23, surpassing the best published result of 2.47 by 9.72%.
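
As a rough illustration of the super-kernel idea (not the released SKFlow implementation), the sketch below combines a large and a small depth-wise convolution with a pointwise convolution to enlarge the receptive field cheaply; the kernel sizes and the exact conical/hybrid wiring are assumptions made for this example.

```python
# Illustrative large-receptive-field block with hybrid depth-wise convolutions.
import torch
import torch.nn as nn

class SuperKernelBlock(nn.Module):
    def __init__(self, channels, large_kernel=7, small_kernel=3):
        super().__init__()
        # depth-wise convolutions with a large and a small kernel ("hybrid" branches)
        self.dw_large = nn.Conv2d(channels, channels, large_kernel,
                                  padding=large_kernel // 2, groups=channels)
        self.dw_small = nn.Conv2d(channels, channels, small_kernel,
                                  padding=small_kernel // 2, groups=channels)
        self.pw = nn.Conv2d(channels, channels, 1)  # pointwise channel mixing
        self.act = nn.GELU()

    def forward(self, x):
        out = self.dw_large(x) + self.dw_small(x) + x  # residual over both branches
        return self.act(self.pw(out))

feat = torch.randn(1, 128, 46, 62)        # e.g. a motion/context feature map
print(SuperKernelBlock(128)(feat).shape)  # torch.Size([1, 128, 46, 62])
```

Depth-wise large kernels keep the parameter and FLOP overhead modest while widening the spatial context used to fill in occluded matches.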

PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering

Sep 17, 2021
Yurui Ren, Ge Li, Yuanqi Chen, Thomas H. Li, Shan Liu

Generating portrait images by controlling the motions of existing faces is an important task of great consequence to the social media industry. For easy use and intuitive control, semantically meaningful and fully disentangled parameters should be used as modifications. However, many existing techniques do not provide such fine-grained controls or rely on indirect editing methods, i.e., mimicking the motions of other individuals. In this paper, a Portrait Image Neural Renderer (PIRenderer) is proposed to control face motions with the parameters of three-dimensional morphable face models (3DMMs). The proposed model can generate photo-realistic portrait images with accurate movements according to intuitive modifications. Experiments on both direct and indirect editing tasks demonstrate the superiority of this model. Meanwhile, we further extend the model to tackle the audio-driven facial reenactment task by extracting sequential motions from audio inputs. We show that our model can generate coherent videos with convincing movements from only a single reference image and a driving audio stream. Our source code is available at https://github.com/RenYurui/PIRender.
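
A schematic sketch of the control pathway the abstract describes, under assumptions: 3DMM motion coefficients are mapped by a small network to a latent descriptor that would then modulate the image generator. The coefficient dimensionality, layer widths, and class names below are placeholders, not the released PIRenderer architecture.

```python
# Illustrative mapping from 3DMM motion coefficients to a modulation latent.
import torch
import torch.nn as nn

class MotionMappingNet(nn.Module):
    def __init__(self, coeff_dim=73, latent_dim=256):
        # coeff_dim is a placeholder for the concatenated expression/rotation/translation coefficients
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(coeff_dim, latent_dim), nn.LeakyReLU(0.2),
            nn.Linear(latent_dim, latent_dim), nn.LeakyReLU(0.2),
            nn.Linear(latent_dim, latent_dim),
        )

    def forward(self, coeffs):      # coeffs: (B, coeff_dim) from a 3DMM fit of the target motion
        return self.net(coeffs)     # latent descriptor used to modulate the renderer

z = MotionMappingNet()(torch.randn(4, 73))
print(z.shape)  # torch.Size([4, 256])
```

Because the control signal is a compact, disentangled coefficient vector rather than another person's video, edits such as changing only the pose or only the expression become direct parameter modifications.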

Large-Scale Spatio-Temporal Person Re-identification: Algorithm and Benchmark

Jun 24, 2021
Xiujun Shu, Xiao Wang, Xianghao Zang, Shiliang Zhang, Yuanqi Chen, Ge Li, Qi Tian

Person re-identification (re-ID) in scenarios with large spatial and temporal spans has not been fully explored. This is partly because existing benchmark datasets were mainly collected with limited spatial and temporal ranges, e.g., using videos recorded over a few days by cameras in a specific region of a campus. Such limited spatial and temporal ranges make it hard to simulate the difficulties of person re-ID in real scenarios. In this work, we contribute LaST, a novel Large-scale Spatio-Temporal person re-ID dataset, including 10,862 identities with more than 228k images. Compared with existing datasets, LaST presents more challenging and higher-diversity re-ID settings and significantly larger spatial and temporal ranges. For instance, each person can appear in different cities or countries, in various time slots from daytime to night, and in different seasons from spring to winter. To the best of our knowledge, LaST is the person re-ID dataset with the largest spatio-temporal ranges. Based on LaST, we verify its difficulty by conducting a comprehensive performance evaluation of 14 re-ID algorithms. We further propose an easy-to-implement baseline that works well in such a challenging re-ID setting. We also verify that models pre-trained on LaST generalize well to existing datasets with short-term and cloth-changing scenarios. We expect LaST to inspire future work toward more realistic and challenging re-ID tasks. More information about the dataset is available at https://github.com/shuxjweb/last.git.

Low Pass Filter for Anti-aliasing in Temporal Action Localization

Apr 23, 2021
Cece Jin, Yuanqi Chen, Ge Li, Tao Zhang, Thomas Li

In temporal action localization (TAL) methods, temporal downsampling operations are widely used to extract proposal features, but they often lead to the aliasing problem because sampling rates are not taken into account. This paper aims to verify the existence of aliasing in TAL methods and investigates utilizing low-pass filters to solve this problem by suppressing the high-frequency band. However, the high-frequency band usually contains a large amount of specific information that is important for model inference. Therefore, it is necessary to trade off anti-aliasing against preserving high-frequency information. To acquire optimal performance, this paper learns different cutoff frequencies for different instances dynamically. This design can be plugged into most existing temporal modeling pipelines, requiring only one additional cutoff frequency parameter. Integrating low-pass filters into the downsampling operations significantly improves detection performance and achieves comparable results on the THUMOS'14, ActivityNet 1.3, and Charades datasets. Experiments demonstrate that anti-aliasing with low-pass filters in TAL is advantageous and efficient.
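
The sketch below shows one plausible realization of the per-instance cutoff idea, not the paper's implementation: each instance predicts the bandwidth of a Gaussian temporal low-pass kernel, which smooths the features before strided downsampling. The kernel size, softplus parameterization, and module names are assumptions.

```python
# Illustrative anti-aliased temporal downsampling with a learnable, per-instance cutoff.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableLowPassDownsample(nn.Module):
    def __init__(self, channels, kernel_size=9, stride=2):
        super().__init__()
        self.kernel_size, self.stride = kernel_size, stride
        self.cutoff_head = nn.Linear(channels, 1)  # predicts one cutoff (bandwidth) per instance

    def forward(self, x):                                    # x: (B, C, T) proposal features
        sigma = F.softplus(self.cutoff_head(x.mean(dim=-1))) + 0.3       # (B, 1)
        t = torch.arange(self.kernel_size, device=x.device) - self.kernel_size // 2
        kern = torch.exp(-(t ** 2) / (2 * sigma[:, :, None] ** 2))       # (B, 1, K) Gaussian kernels
        kern = kern / kern.sum(dim=-1, keepdim=True)
        B, C, T = x.shape
        # depth-wise, per-sample filtering via a grouped conv over a flattened batch
        x_flat = x.reshape(1, B * C, T)
        kern = kern.expand(B, C, self.kernel_size).reshape(B * C, 1, self.kernel_size)
        y = F.conv1d(x_flat, kern, padding=self.kernel_size // 2, groups=B * C)
        return y.reshape(B, C, T)[:, :, :: self.stride]      # smooth first, then downsample

feats = torch.randn(4, 256, 100)
print(LearnableLowPassDownsample(256)(feats).shape)          # torch.Size([4, 256, 50])
```

A larger predicted sigma suppresses more of the high-frequency band (stronger anti-aliasing), while a smaller sigma preserves detail, letting each instance choose its own trade-off.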

SSD-GAN: Measuring the Realness in the Spatial and Spectral Domains

Dec 15, 2020
Yuanqi Chen, Ge Li, Cece Jin, Shan Liu, Thomas Li

This paper observes that high frequencies are missing in the discriminator of a standard GAN, and we reveal that this issue stems from the downsampling layers employed in the network architecture. The issue leaves the generator without incentive from the discriminator to learn the high-frequency content of the data, resulting in a significant spectrum discrepancy between generated and real images. Since the Fourier transform is a bijective mapping, we argue that reducing this spectrum discrepancy would boost the performance of GANs. To this end, we introduce SSD-GAN, an enhancement of GANs that alleviates the spectral information loss in the discriminator. Specifically, we propose to embed a frequency-aware classifier into the discriminator to measure the realness of the input in both the spatial and spectral domains. With the enhanced discriminator, the generator of SSD-GAN is encouraged to learn the high-frequency content of real data and generate precise details. The proposed method is general and can be easily integrated into most existing GAN frameworks without excessive cost. The effectiveness of SSD-GAN is validated on various network architectures, objective functions, and datasets. Code will be available at https://github.com/cyq373/SSD-GAN.
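
A minimal sketch of the spectral-realness component, assuming the spectral measure is an azimuthally averaged log power spectrum fed to a small classifier; the released SSD-GAN code may differ, and the blending weight mentioned in the final comment is illustrative.

```python
# Illustrative frequency-aware realness branch for a GAN discriminator.
import torch
import torch.nn as nn

def reduced_spectrum(img, n_bins=64):
    """Azimuthally averaged log power spectrum of a (B, C, H, W) image batch."""
    spec = torch.fft.fftshift(torch.fft.fft2(img), dim=(-2, -1)).abs() ** 2
    spec = spec.mean(dim=1)                                   # average over channels
    B, H, W = spec.shape
    yy, xx = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    r = torch.sqrt((yy - H / 2) ** 2 + (xx - W / 2) ** 2).to(img.device)
    r_bin = (r / r.max() * (n_bins - 1)).long().flatten()     # map each frequency to a radial bin
    out = torch.zeros(B, n_bins, device=img.device)
    out.index_add_(1, r_bin, spec.flatten(1))
    counts = torch.bincount(r_bin, minlength=n_bins).clamp(min=1)
    return torch.log(out / counts + 1e-8)

class SpectralRealness(nn.Module):
    def __init__(self, n_bins=64):
        super().__init__()
        self.cls = nn.Sequential(nn.Linear(n_bins, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, img):
        return self.cls(reduced_spectrum(img))                # spectral realness logit

# Combined realness (illustrative): d_total = (1 - lam) * d_spatial + lam * d_spectral
print(SpectralRealness()(torch.randn(2, 3, 64, 64)).shape)    # torch.Size([2, 1])
```

Because the spectral branch sees the full frequency range of the input, the generator can no longer ignore high-frequency content without being penalized.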

* Accepted to AAAI 2021. Code: https://github.com/cyq373/SSD-GAN 