Siyu Chen

Massachusetts Institute of Technology USA

Video Temporal Relationship Mining for Data-Efficient Person Re-identification

Oct 01, 2021
Siyu Chen, Dengjie Li, Lishuai Gao, Fan Liang, Wei Zhang, Lin Ma

This paper is a technical report on our submission to the ICCV 2021 VIPriors Re-identification Challenge. To make full use of the visual inductive priors of the data, we treat the query and gallery images of the same identity as continuous frames in a video sequence. We propose a novel post-processing strategy for video temporal relationship mining, which computes not only the distance matrix between query and gallery images but also the distance matrix among gallery images. The initial query image is used to retrieve the most similar image from the gallery; the retrieved image is then treated as a new query to retrieve its most similar image from the gallery. By iteratively searching for the closest image, we achieve accurate image retrieval and obtain a robust retrieval sequence.
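
The iterative search described in this abstract can be sketched as a greedy nearest-neighbour chain. This is only an illustration, not the authors' implementation: the function name `chain_retrieval`, the `max_steps` parameter, and the choice to skip gallery images that were already retrieved are assumptions that the abstract does not state.

```python
import numpy as np

def chain_retrieval(dist_qg, dist_gg, max_steps=None):
    """Greedy nearest-neighbour chain starting from one query (hypothetical helper).

    dist_qg: shape (G,), distances from the query to each gallery image.
    dist_gg: shape (G, G), pairwise distances between gallery images.
    Returns gallery indices in the order they were retrieved.
    """
    dist_qg = np.asarray(dist_qg, dtype=float)
    dist_gg = np.asarray(dist_gg, dtype=float)
    n = dist_qg.shape[0]
    steps = n if max_steps is None else min(max_steps, n)
    visited = np.zeros(n, dtype=bool)
    order = []
    current = dist_qg  # distances from the current "query" to the gallery
    for _ in range(steps):
        # Assumption: images already retrieved are excluded from later steps.
        masked = np.where(visited, np.inf, current)
        nxt = int(np.argmin(masked))
        order.append(nxt)
        visited[nxt] = True
        # The retrieved image becomes the new query.
        current = dist_gg[nxt]
    return order
```

With a query at position 0 and gallery images at positions 3, 1, 2 and 10 on a line, the chain visits the gallery in order of position (indices 1, 2, 0, 3), because each step moves to the nearest image not yet retrieved.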

ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis

Mar 09, 2021
Yinan He, Bei Gan, Siyu Chen, Yichun Zhou, Guojun Yin, Luchuan Song, Lu Sheng, Jing Shao, Ziwei Liu

The rapid progress of photorealistic synthesis techniques has reached a critical point where the boundary between real and manipulated images starts to blur. Thus, benchmarking and advancing digital forgery analysis have become a pressing issue. However, existing face forgery datasets either have limited diversity or only support coarse-grained analysis. To counter this emerging threat, we construct the ForgeryNet dataset, an extremely large face forgery dataset with unified annotations in image- and video-level data across four tasks: 1) Image Forgery Classification, including two-way (real / fake), three-way (real / fake with identity-replaced forgery approaches / fake with identity-remained forgery approaches), and n-way (real and 15 respective forgery approaches) classification; 2) Spatial Forgery Localization, which segments the manipulated area of fake images compared to their corresponding source real images; 3) Video Forgery Classification, which re-defines the video-level forgery classification with manipulated frames in random positions, a task that is important because attackers in the real world are free to manipulate any target frame; and 4) Temporal Forgery Localization, which localizes the temporal segments that are manipulated. ForgeryNet is by far the largest publicly available deep face forgery dataset in terms of data-scale (2.9 million images, 221,247 videos), manipulations (7 image-level approaches, 8 video-level approaches), perturbations (36 independent and more mixed perturbations) and annotations (6.3 million classification labels, 2.9 million manipulated area annotations and 221,247 temporal forgery segment labels). We perform extensive benchmarking and studies of existing face forensics methods and obtain several valuable observations.

* 17 pages, 11 figures, Accepted to CVPR 2021 (Oral), project webpage: https://yinanhe.github.io/projects/forgerynet.html

Maximum a posteriori signal recovery for optical coherence tomography angiography image generation and denoising

Oct 29, 2020
Lennart Husvogt, Stefan B. Ploner, Siyu Chen, Daniel Stromer, Julia Schottenhamml, A. Yasin Alibhai, Eric Moult, Nadia K. Waheed, James G. Fujimoto, Andreas Maier

Optical coherence tomography angiography (OCTA) is a novel and clinically promising imaging modality to image retinal and sub-retinal vasculature. Based on repeated optical coherence tomography (OCT) scans, intensity changes are observed over time and used to compute OCTA image data. OCTA data are prone to noise and artifacts caused by variations in flow speed and patient movement. We propose a novel iterative maximum a posteriori signal recovery algorithm in order to generate OCTA volumes with reduced noise and increased image quality. This algorithm is based on previous work on probabilistic OCTA signal models and maximum likelihood estimates. Reconstruction results using total variation minimization and wavelet shrinkage for regularization were compared against an OCTA ground truth volume, merged from six co-registered single OCTA volumes. The results show a significant improvement in peak signal-to-noise ratio and structural similarity. The presented algorithm brings together OCTA image generation and Bayesian statistics and can be developed into new OCTA image generation and denoising algorithms.

* 14 pages, 4 figures, to be published in Biomedical Optics Express

RANDOM MASK: Towards Robust Convolutional Neural Networks

Jul 27, 2020
Tiange Luo, Tianle Cai, Mengxiao Zhang, Siyu Chen, Liwei Wang

The robustness of neural networks has recently been highlighted by adversarial examples, i.e., inputs with well-designed perturbations added that are imperceptible to humans but can cause the network to give incorrect outputs. In this paper, we design a new CNN architecture that by itself has good robustness. We introduce a simple but powerful technique, Random Mask, to modify existing CNN structures. We show that a CNN with Random Mask achieves state-of-the-art performance against black-box adversarial attacks without applying any adversarial training. We next investigate the adversarial examples which 'fool' a CNN with Random Mask. Surprisingly, we find that these adversarial examples often 'fool' humans as well. This raises fundamental questions on how to define adversarial examples and robustness properly.

* arXiv admin note: substantial text overlap with arXiv:1911.08432

Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization

Jul 15, 2020
Junting Pan, Siyu Chen, Zheng Shou, Jing Shao, Hongsheng Li

Localizing persons and recognizing their actions from videos is a challenging task towards high-level video understanding. Recent advances have been achieved by modeling either 'actor-actor' or 'actor-context' relations. However, such direct first-order relations are not sufficient for localizing actions in complicated scenes. Some actors might be indirectly related via objects or background context in the scene. Such indirect relations are crucial for determining the action labels but are mostly ignored by existing work. In this paper, we propose to explicitly model the Actor-Context-Actor Relation, which can capture indirect high-order supportive information for effectively reasoning about actors' actions in complex scenes. To this end, we design an Actor-Context-Actor Relation Network (ACAR-Net) which builds upon a novel High-order Relation Reasoning Operator to model indirect relations for spatio-temporal action localization. Moreover, to make use of more temporal context, we extend our framework with an Actor-Context Feature Bank for reasoning about long-range high-order relations. Extensive experiments on the AVA dataset validate the effectiveness of our ACAR-Net. Ablation studies show the advantages of modeling high-order relations over existing first-order relation reasoning methods. The proposed ACAR-Net is also the core module of our 1st place solution in the AVA-Kinetics Crossover Challenge 2020. Training code and models will be available at https://github.com/Siyu-C/ACAR-Net.

* 1st place solution in ActivityNet Challenge 2020 -- AVA-Kinetics Task

1st place solution for AVA-Kinetics Crossover in AcitivityNet Challenge 2020

Jun 16, 2020
Siyu Chen, Junting Pan, Guanglu Song, Manyuan Zhang, Hao Shao, Ziyi Lin, Jing Shao, Hongsheng Li, Yu Liu

This technical report introduces our winning solution to the spatio-temporal action localization track, AVA-Kinetics Crossover, in the ActivityNet Challenge 2020. Our entry is mainly based on the Actor-Context-Actor Relation Network. We describe technical details for the new AVA-Kinetics dataset, together with some experimental results. Without any bells and whistles, we achieved 39.62 mAP on the test set of AVA-Kinetics, which outperforms other entries by a large margin. Code will be available at: https://github.com/Siyu-C/ACAR-Net.

* arXiv admin note: substantial text overlap with arXiv:2006.07976

Defective Convolutional Layers Learn Robust CNNs

Nov 19, 2019
Tiange Luo, Tianle Cai, Mengxiao Zhang, Siyu Chen, Di He, Liwei Wang

The robustness of convolutional neural networks has recently been highlighted by adversarial examples, i.e., inputs with well-designed perturbations added that are imperceptible to humans but can cause the network to give incorrect outputs. Recent research suggests that the noise in adversarial examples breaks the textural structure, which eventually leads to wrong predictions by convolutional neural networks. To help a convolutional neural network make predictions relying less on textural information, we propose defective convolutional layers which contain defective neurons whose activations are set to be a constant function. As the defective neurons contain no information and are far different from the standard neurons in their spatial neighborhood, the textural features cannot be accurately extracted and the model has to seek other features for classification, such as the shape. We first show that predictions made by the defective CNN are less dependent on textural information, but more on shape information, and further find that adversarial examples generated by the defective CNN appear to have semantic shapes. Experimental results demonstrate that the defective CNN has higher defense ability than the standard CNN against various types of attack. In particular, it achieves state-of-the-art performance against transfer-based attacks without applying any adversarial training.
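
As a rough illustration of "defective neurons whose activations are set to be a constant function", the sketch below applies a fixed binary mask to a feature map. It is not the authors' code: the class name `DefectiveMask`, the `keep_prob` parameter, the zero constant, and sampling the mask once at construction are all assumptions made for the example.

```python
import numpy as np

class DefectiveMask:
    """Fixed binary mask that forces selected activations to a constant (hypothetical)."""

    def __init__(self, shape, keep_prob=0.5, constant=0.0, seed=0):
        rng = np.random.default_rng(seed)
        # Assumption: the mask is sampled once and then frozen.
        # True marks a normal neuron, False marks a defective one.
        self.keep = rng.random(shape) < keep_prob
        self.constant = constant

    def __call__(self, feature_map):
        # feature_map has shape (batch, *shape); the mask broadcasts over the batch.
        return np.where(self.keep, feature_map, self.constant)

# Example: mask the output of a layer with 2 channels and 4x4 spatial size.
mask = DefectiveMask((2, 4, 4), keep_prob=0.5, seed=0)
x = np.random.default_rng(1).normal(size=(3, 2, 4, 4))
y = mask(x)
```

Because the same positions are defective for every input, a defective position carries no information about its local neighborhood, which is the property the abstract relies on.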

A Gram-Gauss-Newton Method Learning Overparameterized Deep Neural Networks for Regression Problems

May 28, 2019
Tianle Cai, Ruiqi Gao, Jikai Hou, Siyu Chen, Dong Wang, Di He, Zhihua Zhang, Liwei Wang

First-order methods such as stochastic gradient descent (SGD) are currently the standard algorithms for training deep neural networks. Second-order methods, despite their better convergence rate, are rarely used in practice due to the prohibitive computational cost of calculating the second-order information. In this paper, we propose a novel Gram-Gauss-Newton (GGN) algorithm to train deep neural networks for regression problems with square loss. Unlike typical second-order methods, which have heavy computational cost in each iteration, our proposed GGN has only minor overhead compared to first-order methods such as SGD. We also provide theoretical results to show that for sufficiently wide neural networks, the convergence rate of the GGN algorithm is quadratic. Preliminary experiments on regression tasks demonstrate that for training standard networks, the GGN algorithm converges faster and achieves better performance than SGD.
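
One way to see why a Gauss-Newton step can be cheap in the overparameterized regime is to solve it in sample space: with n outputs and p parameters (n much smaller than p), the Gram matrix J J^T is only n by n. The sketch below shows such a step on a toy linear model. It is an illustration under stated assumptions, not the paper's algorithm: the function name, the `damping` term, and the linear model are all hypothetical choices.

```python
import numpy as np

def gram_gauss_newton_step(theta, jacobian, residual, damping=1e-8):
    """One damped Gauss-Newton step solved in sample space (hypothetical helper).

    jacobian: (n, p) derivative of the n model outputs w.r.t. the p parameters.
    residual: (n,) model outputs minus targets.
    Solves an n-by-n system with the Gram matrix J J^T instead of the p-by-p J^T J.
    """
    gram = jacobian @ jacobian.T
    coeff = np.linalg.solve(gram + damping * np.eye(gram.shape[0]), residual)
    return theta - jacobian.T @ coeff

# Toy overparameterized linear regression: 5 samples, 20 parameters.
# For a linear model f(theta) = X @ theta, the Jacobian is X itself.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 20))
y = rng.normal(size=5)
theta = np.zeros(20)
theta = gram_gauss_newton_step(theta, X, X @ theta - y)
```

For this linear toy problem a single step already fits the targets up to the small damping term; for a nonlinear network the Jacobian would have to be recomputed at every iteration.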

* Submitted to NeurIPS 2019

A Deep Optimization Approach for Image Deconvolution

Apr 16, 2019
Zhijian Luo, Siyu Chen, Yuntao Qian

In blind image deconvolution, priors are often leveraged to constrain the solution space, so as to alleviate the under-determinacy. Priors which are trained separately from the task of deconvolution tend to be unstable or ineffective. We propose the Golf Optimizer, a novel but simple form of network that learns deep priors from data with better propagation behavior. Like playing golf, our method first estimates an aggressive propagation towards the optimum using one network, and then recurrently applies a residual CNN to learn the gradient of the prior for delicate correction of the restoration. Experiments show that our network achieves competitive performance on the GoPro dataset, and that our model is extremely lightweight compared with state-of-the-art works.

* 12 pages, 16 figures
