Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ying Chen

FusionDepth: Complement Self-Supervised Monocular Depth Estimation with Cost Volume

May 10, 2023

Zhuofei Huang, Jianlin Liu, Shang Xu, Ying Chen, Yong Liu

Abstract:Multi-view stereo depth estimation based on cost volume usually works better than self-supervised monocular depth estimation except for moving objects and low-textured surfaces. So in this paper, we propose a multi-frame depth estimation framework which monocular depth can be refined continuously by multi-frame sequential constraints, leveraging a Bayesian fusion layer within several iterations. Both monocular and multi-view networks can be trained with no depth supervision. Our method also enhances the interpretability when combining monocular estimation with multi-view cost volume. Detailed experiments show that our method surpasses state-of-the-art unsupervised methods utilizing single or multiple frames at test time on KITTI benchmark.

Via

Access Paper or Ask Questions

MD-VQA: Multi-Dimensional Quality Assessment for UGC Live Videos

Apr 19, 2023

Zicheng Zhang, Wei Wu, Wei Sun, Dangyang Tu, Wei Lu, Xiongkuo Min, Ying Chen, Guangtao Zhai

Figure 1 for MD-VQA: Multi-Dimensional Quality Assessment for UGC Live Videos

Figure 2 for MD-VQA: Multi-Dimensional Quality Assessment for UGC Live Videos

Figure 3 for MD-VQA: Multi-Dimensional Quality Assessment for UGC Live Videos

Figure 4 for MD-VQA: Multi-Dimensional Quality Assessment for UGC Live Videos

Abstract:User-generated content (UGC) live videos are often bothered by various distortions during capture procedures and thus exhibit diverse visual qualities. Such source videos are further compressed and transcoded by media server providers before being distributed to end-users. Because of the flourishing of UGC live videos, effective video quality assessment (VQA) tools are needed to monitor and perceptually optimize live streaming videos in the distributing process. In this paper, we address \textbf{UGC Live VQA} problems by constructing a first-of-a-kind subjective UGC Live VQA database and developing an effective evaluation tool. Concretely, 418 source UGC videos are collected in real live streaming scenarios and 3,762 compressed ones at different bit rates are generated for the subsequent subjective VQA experiments. Based on the built database, we develop a \underline{M}ulti-\underline{D}imensional \underline{VQA} (\textbf{MD-VQA}) evaluator to measure the visual quality of UGC live videos from semantic, distortion, and motion aspects respectively. Extensive experimental results show that MD-VQA achieves state-of-the-art performance on both our UGC Live VQA database and existing compressed UGC VQA databases.

* Accepted to CVPR2023

Via

Access Paper or Ask Questions

EWT: Efficient Wavelet-Transformer for Single Image Denoising

Apr 13, 2023

Juncheng Li, Bodong Cheng, Ying Chen, Guangwei Gao, Tieyong Zeng

Abstract:Transformer-based image denoising methods have achieved encouraging results in the past year. However, it must uses linear operations to model long-range dependencies, which greatly increases model inference time and consumes GPU storage space. Compared with convolutional neural network-based methods, current Transformer-based image denoising methods cannot achieve a balance between performance improvement and resource consumption. In this paper, we propose an Efficient Wavelet Transformer (EWT) for image denoising. Specifically, we use Discrete Wavelet Transform (DWT) and Inverse Wavelet Transform (IWT) for downsampling and upsampling, respectively. This method can fully preserve the image features while reducing the image resolution, thereby greatly reducing the device resource consumption of the Transformer model. Furthermore, we propose a novel Dual-stream Feature Extraction Block (DFEB) to extract image features at different levels, which can further reduce model inference time and GPU memory usage. Experiments show that our method speeds up the original Transformer by more than 80%, reduces GPU memory usage by more than 60%, and achieves excellent denoising results. All code will be public.

* 12 pages, 11 figurs

Via

Access Paper or Ask Questions

TextShield: Beyond Successfully Detecting Adversarial Sentences in Text Classification

Feb 03, 2023

Lingfeng Shen, Ze Zhang, Haiyun Jiang, Ying Chen

Abstract:Adversarial attack serves as a major challenge for neural network models in NLP, which precludes the model's deployment in safety-critical applications. A recent line of work, detection-based defense, aims to distinguish adversarial sentences from benign ones. However, {the core limitation of previous detection methods is being incapable of giving correct predictions on adversarial sentences unlike defense methods from other paradigms.} To solve this issue, this paper proposes TextShield: (1) we discover a link between text attack and saliency information, and then we propose a saliency-based detector, which can effectively detect whether an input sentence is adversarial or not. (2) We design a saliency-based corrector, which converts the detected adversary sentences to benign ones. By combining the saliency-based detector and corrector, TextShield extends the detection-only paradigm to a detection-correction paradigm, thus filling the gap in the existing detection-based defense. Comprehensive experiments show that (a) TextShield consistently achieves higher or comparable performance than state-of-the-art defense methods across various attacks on different benchmarks. (b) our saliency-based detector outperforms existing detectors for detecting adversarial sentences.

Via

Access Paper or Ask Questions

Semi-Supervised Heterogeneous Graph Learning with Multi-level Data Augmentation

Nov 30, 2022

Ying Chen, Siwei Qiang, Mingming Ha, Xiaolei Liu, Shaoshuai Li, Lingfeng Yuan, Xiaobo Guo, Zhenfeng Zhu

Abstract:In recent years, semi-supervised graph learning with data augmentation (DA) is currently the most commonly used and best-performing method to enhance model robustness in sparse scenarios with few labeled samples. Differing from homogeneous graph, DA in heterogeneous graph has greater challenges: heterogeneity of information requires DA strategies to effectively handle heterogeneous relations, which considers the information contribution of different types of neighbors and edges to the target nodes. Furthermore, over-squashing of information is caused by the negative curvature that formed by the non-uniformity distribution and strong clustering in complex graph. To address these challenges, this paper presents a novel method named Semi-Supervised Heterogeneous Graph Learning with Multi-level Data Augmentation (HG-MDA). For the problem of heterogeneity of information in DA, node and topology augmentation strategies are proposed for the characteristics of heterogeneous graph. And meta-relation-based attention is applied as one of the indexes for selecting augmented nodes and edges. For the problem of over-squashing of information, triangle based edge adding and removing are designed to alleviate the negative curvature and bring the gain of topology. Finally, the loss function consists of the cross-entropy loss for labeled data and the consistency regularization for unlabeled data. In order to effectively fuse the prediction results of various DA strategies, the sharpening is used. Existing experiments on public datasets, i.e., ACM, DBLP, OGB, and industry dataset MB show that HG-MDA outperforms current SOTA models. Additionly, HG-MDA is applied to user identification in internet finance scenarios, helping the business to add 30% key users, and increase loans and balances by 3.6%, 11.1%, and 9.8%.

Via

Access Paper or Ask Questions

MFDNet: Towards Real-time Image Denoising On Mobile Devices

Nov 09, 2022

Zhuoqun Liu, Meiguang Jin, Ying Chen, Huaida Liu, Canqian Yang, Hongkai Xiong

Figure 1 for MFDNet: Towards Real-time Image Denoising On Mobile Devices

Figure 2 for MFDNet: Towards Real-time Image Denoising On Mobile Devices

Figure 3 for MFDNet: Towards Real-time Image Denoising On Mobile Devices

Figure 4 for MFDNet: Towards Real-time Image Denoising On Mobile Devices

Abstract:Deep convolutional neural networks have achieved great progress in image denoising tasks. However, their complicated architectures and heavy computational cost hinder their deployments on a mobile device. Some recent efforts in designing lightweight denoising networks focus on reducing either FLOPs (floating-point operations) or the number of parameters. However, these metrics are not directly correlated with the on-device latency. By performing extensive analysis and experiments, we identify the network architectures that can fully utilize powerful neural processing units (NPUs) and thus enjoy both low latency and excellent denoising performance. To this end, we propose a mobile-friendly denoising network, namely MFDNet. The experiments show that MFDNet achieves state-of-the-art performance on real-world denoising benchmarks SIDD and DND under real-time latency on mobile devices. The code and pre-trained models will be released.

* Under review at the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023)

Via

Access Paper or Ask Questions

Probability-Dependent Gradient Decay in Large Margin Softmax

Oct 31, 2022

Siyuan Zhang, Linbo Xie, Ying Chen

Abstract:In the past few years, Softmax has become a common component in neural network frameworks. In this paper, a gradient decay hyperparameter is introduced in Softmax to control the probability-dependent gradient decay rate during training. By following the theoretical analysis and empirical results of a variety of model architectures trained on MNIST, CIFAR-10/100 and SVHN, we find that the generalization performance depends significantly on the gradient decay rate as the confidence probability rises, i.e., the gradient decreases convexly or concavely as the sample probability increases. Moreover, optimization with the small gradient decay shows a similar curriculum learning sequence where hard samples are in the spotlight only after easy samples are convinced sufficiently, and well-separated samples gain a higher gradient to reduce intra-class distance. Based on the analysis results, we can provide evidence that the large margin Softmax will affect the local Lipschitz constraint of the loss function by regulating the probability-dependent gradient decay rate. This paper provides a new perspective and understanding of the relationship among concepts of large margin Softmax, local Lipschitz constraint and curriculum learning by analyzing the gradient decay rate. Besides, we propose a warm-up strategy to dynamically adjust Softmax loss in training, where the gradient decay rate increases from over-small to speed up the convergence rate.

Via

Access Paper or Ask Questions

Rediscovery of Numerical Lüscher's Formula from the Neural Network

Oct 05, 2022

Yu Lu, Yi-Jia Wang, Ying Chen, Jia-Jun Wu

Figure 1 for Rediscovery of Numerical Lüscher's Formula from the Neural Network

Figure 2 for Rediscovery of Numerical Lüscher's Formula from the Neural Network

Figure 3 for Rediscovery of Numerical Lüscher's Formula from the Neural Network

Figure 4 for Rediscovery of Numerical Lüscher's Formula from the Neural Network

Abstract:We present that by predicting the spectrum in discrete space from the phase shift in continuous space, the neural network can remarkably reproduce the numerical L\"uscher's formula to a high precision. The model-independent property of the L\"uscher's formula is naturally realized by the generalizability of the neural network. This exhibits the great potential of the neural network to extract model-independent relation between model-dependent quantities, and this data-driven approach could greatly facilitate the discovery of the physical principles underneath the intricate data.

* 8 figures

Via

Access Paper or Ask Questions

Rethinking Dimensionality Reduction in Grid-based 3D Object Detection

Sep 24, 2022

Dihe Huang, Ying Chen, Yikang Ding, Jinli Liao, Jianlin Liu, Kai Wu, Qiang Nie, Yong Liu, Chengjie Wang

Figure 1 for Rethinking Dimensionality Reduction in Grid-based 3D Object Detection

Figure 2 for Rethinking Dimensionality Reduction in Grid-based 3D Object Detection

Figure 3 for Rethinking Dimensionality Reduction in Grid-based 3D Object Detection

Figure 4 for Rethinking Dimensionality Reduction in Grid-based 3D Object Detection

Abstract:Bird's eye view (BEV) is widely adopted by most of the current point cloud detectors due to the applicability of well-explored 2D detection techniques. However, existing methods obtain BEV features by simply collapsing voxel or point features along the height dimension, which causes the heavy loss of 3D spatial information. To alleviate the information loss, we propose a novel point cloud detection network based on a Multi-level feature dimensionality reduction strategy, called MDRNet. In MDRNet, the Spatial-aware Dimensionality Reduction (SDR) is designed to dynamically focus on the valuable parts of the object during voxel-to-BEV feature transformation. Furthermore, the Multi-level Spatial Residuals (MSR) is proposed to fuse the multi-level spatial information in the BEV feature maps. Extensive experiments on nuScenes show that the proposed method outperforms the state-of-the-art methods. The code will be available upon publication.

* Submitted to ICRA 2023

Via

Access Paper or Ask Questions

Scatter Points in Space: 3D Detection from Multi-view Monocular Images

Aug 31, 2022

Jianlin Liu, Zhuofei Huang, Dihe Huang, Shang Xu, Ying Chen, Yong Liu

Figure 1 for Scatter Points in Space: 3D Detection from Multi-view Monocular Images

Figure 2 for Scatter Points in Space: 3D Detection from Multi-view Monocular Images

Figure 3 for Scatter Points in Space: 3D Detection from Multi-view Monocular Images

Figure 4 for Scatter Points in Space: 3D Detection from Multi-view Monocular Images

Abstract:3D object detection from monocular image(s) is a challenging and long-standing problem of computer vision. To combine information from different perspectives without troublesome 2D instance tracking, recent methods tend to aggregate multiview feature by sampling regular 3D grid densely in space, which is inefficient. In this paper, we attempt to improve multi-view feature aggregation by proposing a learnable keypoints sampling method, which scatters pseudo surface points in 3D space, in order to keep data sparsity. The scattered points augmented by multi-view geometric constraints and visual features are then employed to infer objects location and shape in the scene. To make up the limitations of single frame and model multi-view geometry explicitly, we further propose a surface filter module for noise suppression. Experimental results show that our method achieves significantly better performance than previous works in terms of 3D detection (more than 0.1 AP improvement on some categories of ScanNet). The code will be publicly available.

Via

Access Paper or Ask Questions