Ying-Cong Chen

Adv3D: Generating 3D Adversarial Examples in Driving Scenarios with NeRF

Sep 04, 2023
Leheng Li, Qing Lian, Ying-Cong Chen

Deep neural networks (DNNs) have been proven extremely susceptible to adversarial examples, which raises safety-critical concerns for DNN-based autonomous driving stacks (i.e., 3D object detection). Although there is extensive work on image-level attacks, most are restricted to the 2D pixel space, and such attacks are not always physically realistic in our 3D world. Here we present Adv3D, the first exploration of modeling adversarial examples as Neural Radiance Fields (NeRFs). Advances in NeRF provide photorealistic appearance and 3D-accurate generation, yielding more realistic and realizable adversarial examples. We train our adversarial NeRF by minimizing the confidence of surrounding objects predicted by 3D detectors on the training set. We then evaluate Adv3D on the unseen validation set and show that it causes a large performance reduction when the NeRF is rendered at any sampled pose. To generate physically realizable adversarial examples, we propose primitive-aware sampling and semantic-guided regularization that enable 3D patch attacks with camouflage adversarial texture. Experimental results demonstrate that the trained adversarial NeRF generalizes well to different poses, scenes, and 3D detectors. Finally, we provide a defense against our attacks based on adversarial training through data augmentation. Project page: https://len-li.github.io/adv3d-web
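
As a rough illustration of the optimization described above, the toy sketch below backpropagates a frozen detector's confidence into adversarial parameters through a differentiable rendering step. The tiny detector, the compositing function, and all shapes are stand-ins of our own, not the Adv3D renderer or a real 3D detector.

```python
import torch
import torch.nn as nn

# Stand-in "detector" that outputs a confidence per image; frozen, as in the attack setting.
detector = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(8, 1), nn.Sigmoid(),
)
for p in detector.parameters():
    p.requires_grad_(False)

# Adversarial parameters (standing in for the NeRF texture) and their optimizer.
adv_texture = torch.zeros(1, 3, 32, 32, requires_grad=True)
optimizer = torch.optim.Adam([adv_texture], lr=1e-2)

for step in range(100):
    scene = torch.rand(4, 3, 32, 32)                           # batch of "scene" images
    rendered = (scene + torch.tanh(adv_texture)).clamp(0, 1)   # differentiable compositing
    confidence = detector(rendered)                            # detector confidence
    loss = confidence.mean()                                   # lower confidence = stronger attack
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```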

A Scale-Invariant Task Balancing Approach for Multi-Task Learning

Aug 23, 2023
Baijiong Lin, Weisen Jiang, Feiyang Ye, Yu Zhang, Pengguang Chen, Ying-Cong Chen, Shu Liu

Multi-task learning (MTL), a learning paradigm that learns multiple related tasks simultaneously, has achieved great success in various fields. However, task balancing remains a significant challenge in MTL, with disparities in loss/gradient scales often leading to performance compromises. In this paper, we propose a Scale-Invariant Multi-Task Learning (SI-MTL) method to alleviate the task-balancing problem from both the loss and gradient perspectives. Specifically, SI-MTL contains a logarithm transformation, performed on all task losses to ensure scale invariance at the loss level, and a gradient balancing method, SI-G, which normalizes all task gradients to the same magnitude as the maximum gradient norm. Extensive experiments conducted on several benchmark datasets consistently demonstrate the effectiveness of SI-G and the state-of-the-art performance of SI-MTL.
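
A minimal sketch of the two ingredients as described in the abstract: a log transform applied to each task loss, and a gradient balancing step that rescales every task gradient to the largest gradient norm. The epsilon constants and the exact point at which gradients are taken are our assumptions.

```python
import torch

def scale_invariant_losses(task_losses, eps=1e-8):
    # Log transform: multiplying a task loss by a constant only shifts its log,
    # so the overall objective becomes invariant to per-task loss scale.
    return [torch.log(loss + eps) for loss in task_losses]

def si_g_balance(task_grads, eps=1e-12):
    # SI-G style balancing: rescale every task gradient to the magnitude of the
    # maximum gradient norm across tasks.
    norms = [g.norm() for g in task_grads]
    max_norm = max(norms)
    return [g * (max_norm / (n + eps)) for g, n in zip(task_grads, norms)]
```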

* Technical Report 

High Dynamic Range Image Reconstruction via Deep Explicit Polynomial Curve Estimation

Jul 31, 2023
Jiaqi Tang, Xiaogang Xu, Sixing Hu, Ying-Cong Chen

Due to limited camera capabilities, digital images usually have a narrower dynamic illumination range than real-world scene radiance. To resolve this problem, High Dynamic Range (HDR) reconstruction is proposed to recover the dynamic range and better represent real-world scenes. However, due to different physical imaging parameters, the tone-mapping functions between images and real radiance are highly diverse, which makes HDR reconstruction extremely challenging. Existing solutions cannot explicitly clarify the correspondence between the tone-mapping function and the generated HDR image, yet this relationship is vital for guiding the reconstruction of HDR images. To address this problem, we propose a method that explicitly estimates the tone-mapping function and its corresponding HDR image in one network. First, based on the characteristics of the tone-mapping function, we model the trend of the tone curve with a polynomial. To fit this curve, we use a learnable network to estimate the coefficients of the polynomial. The curve is automatically adjusted according to the tone space of the Low Dynamic Range (LDR) image and used to reconstruct the real HDR image. Besides, since current datasets do not provide the correspondence between the tone-mapping function and the LDR image, we construct a new dataset with both synthetic and real images. Extensive experiments show that our method generalizes well under different tone-mapping functions and achieves SOTA performance.
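
The core idea of an explicit polynomial tone curve can be sketched in a few lines: a set of coefficients (here a plain learnable parameter; in the paper they are predicted by a network) defines a per-pixel polynomial that maps LDR intensities toward HDR values. The degree and parameterization below are our assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class PolynomialToneCurve(nn.Module):
    def __init__(self, degree: int = 4):
        super().__init__()
        # Coefficients c_0..c_K of the tone curve; a learnable stand-in for
        # the network-predicted coefficients described in the abstract.
        self.coeffs = nn.Parameter(torch.randn(degree + 1) * 0.1)

    def forward(self, ldr: torch.Tensor) -> torch.Tensor:
        # Evaluate sum_k c_k * ldr**k element-wise on the LDR image.
        powers = torch.stack([ldr ** k for k in range(self.coeffs.numel())], dim=0)
        return torch.einsum('k,k...->...', self.coeffs, powers)

# Usage: curve = PolynomialToneCurve(); hdr_estimate = curve(torch.rand(1, 3, 64, 64))
```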

Lift3D: Synthesize 3D Training Data by Lifting 2D GAN to 3D Generative Radiance Field

Apr 07, 2023
Leheng Li, Qing Lian, Luozhou Wang, Ningning Ma, Ying-Cong Chen

This work explores the use of 3D generative models to synthesize training data for 3D vision tasks. The key requirements for the generative models are that the generated data should be photorealistic to match real-world scenarios, and the corresponding 3D attributes should be aligned with the given sampling labels. However, we find that recent NeRF-based 3D GANs hardly meet these requirements due to the design of their generation pipeline and the lack of explicit 3D supervision. In this work, we propose Lift3D, an inverted 2D-to-3D generation framework that achieves these data generation objectives. Lift3D has several merits compared to prior methods: (1) Unlike previous 3D GANs, whose output resolution is fixed after training, Lift3D can generalize to any camera intrinsics with higher resolution and photorealistic output. (2) By lifting a well-disentangled 2D GAN to a 3D object NeRF, Lift3D provides explicit 3D information about the generated objects, thus offering accurate 3D annotations for downstream tasks. We evaluate the effectiveness of our framework by augmenting autonomous driving datasets. Experimental results demonstrate that our data generation framework can effectively improve the performance of 3D object detectors. Project page: https://len-li.github.io/lift3d-web.

* CVPR 2023 

HyperThumbnail: Real-time 6K Image Rescaling with Rate-distortion Optimization

Apr 03, 2023
Chenyang Qi, Xin Yang, Ka Leong Cheng, Ying-Cong Chen, Qifeng Chen

Contemporary image rescaling aims at embedding a high-resolution (HR) image into a low-resolution (LR) thumbnail image that contains embedded information for HR image reconstruction. Unlike traditional image super-resolution, this enables high-fidelity HR image restoration faithful to the original, given the embedded information in the LR thumbnail. However, state-of-the-art image rescaling methods do not optimize the LR image file size for efficient sharing and fall short of real-time performance for ultra-high-resolution (e.g., 6K) image reconstruction. To address these two challenges, we propose a novel framework (HyperThumbnail) for real-time 6K rate-distortion-aware image rescaling. Our framework first embeds an HR image into a JPEG LR thumbnail via an encoder with our proposed quantization prediction module, which minimizes the file size of the embedded LR JPEG thumbnail while maximizing HR reconstruction quality. Then, an efficient frequency-aware decoder reconstructs a high-fidelity HR image from the LR one in real time. Extensive experiments demonstrate that our framework outperforms previous image rescaling baselines in rate-distortion performance and can perform 6K image reconstruction in real time.
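
As a generic illustration of the rate-distortion trade-off the framework optimizes, the sketch below combines reconstruction error with an estimated thumbnail bitrate; the weighting and the bit estimator are placeholders of our own, not HyperThumbnail's actual objective.

```python
import torch

def rate_distortion_loss(hr, hr_recon, bits_estimate, lam=0.01):
    # Distortion term: how faithfully the HR image is reconstructed from the thumbnail.
    distortion = torch.mean((hr - hr_recon) ** 2)
    # Rate term: estimated size of the embedded LR JPEG thumbnail (e.g., from an
    # entropy model); lam trades file size against reconstruction quality.
    return distortion + lam * bits_estimate
```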

* Accepted by CVPR 2023; Github Repository: https://github.com/AbnerVictor/HyperThumbnail 

Ref-NeuS: Ambiguity-Reduced Neural Implicit Surface Learning for Multi-View Reconstruction with Reflection

Mar 20, 2023
Wenhang Ge, Tao Hu, Haoyu Zhao, Shu Liu, Ying-Cong Chen

Neural implicit surface learning has shown significant progress in multi-view 3D reconstruction, where an object is represented by multilayer perceptrons that provide a continuous implicit surface representation and view-dependent radiance. However, current methods often fail to accurately reconstruct reflective surfaces, leading to severe ambiguity. To overcome this issue, we propose Ref-NeuS, which aims to reduce ambiguity by attenuating the importance of reflective surfaces. Specifically, we utilize an anomaly detector to estimate an explicit reflection score, guided by multi-view context, to localize reflective surfaces. Afterward, we design a reflection-aware photometric loss that adaptively reduces ambiguity by modeling the rendered color as a Gaussian distribution whose variance is given by the reflection score. We show that, together with a reflection-direction-dependent radiance, our model achieves high-quality surface reconstruction on reflective surfaces and outperforms the state of the art by a large margin. Besides, our model also performs comparably on general surfaces.
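
A minimal sketch of a reflection-aware photometric loss in the spirit described above: the rendered color is treated as a Gaussian whose variance comes from the reflection score, so pixels flagged as reflective contribute a down-weighted residual. The exact constants and the form of the score are assumptions.

```python
import torch

def reflection_aware_loss(pred_rgb, gt_rgb, reflection_score, eps=1e-6):
    # Gaussian negative log-likelihood with per-pixel variance = reflection score:
    # high-score (reflective) pixels get a larger variance and thus less influence.
    var = reflection_score + eps
    return ((pred_rgb - gt_rgb) ** 2 / (2.0 * var) + 0.5 * torch.log(var)).mean()
```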

* Project webpage: https://g3956.github.io/ 

Contactless Oxygen Monitoring with Gated Transformer

Dec 06, 2022
Hao He, Yuan Yuan, Ying-Cong Chen, Peng Cao, Dina Katabi

With the increasing popularity of telehealth, it becomes critical to ensure that basic physiological signals can be monitored accurately at home, with minimal patient overhead. In this paper, we propose a contactless approach for monitoring patients' blood oxygen at home, simply by analyzing the radio signals in the room, without any wearable devices. We extract the patients' respiration from the radio signals that bounce off their bodies and devise a novel neural network that infers a patient's oxygen estimates from their breathing signal. Our model, called Gated BERT-UNet, is designed to adapt to the patient's medical indices (e.g., gender, sleep stages). It has multiple predictive heads and selects the most suitable head via a gate controlled by the person's physiological indices. Extensive empirical results show that our model achieves high accuracy on both medical and radio datasets.
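
A toy version of the gated multi-head idea: several prediction heads share a feature extractor, and a gate driven by the patient's indices (e.g., gender, sleep stage) weights or selects among them. The dimensions, the soft gating, and the layer choices below are our assumptions, not the actual Gated BERT-UNet architecture.

```python
import torch
import torch.nn as nn

class GatedHeads(nn.Module):
    def __init__(self, feat_dim: int = 64, idx_dim: int = 4, num_heads: int = 3):
        super().__init__()
        self.heads = nn.ModuleList([nn.Linear(feat_dim, 1) for _ in range(num_heads)])
        self.gate = nn.Sequential(nn.Linear(idx_dim, num_heads), nn.Softmax(dim=-1))

    def forward(self, features: torch.Tensor, indices: torch.Tensor) -> torch.Tensor:
        head_outs = torch.stack([h(features) for h in self.heads], dim=-1)  # (B, 1, H)
        weights = self.gate(indices).unsqueeze(1)                           # (B, 1, H)
        return (head_outs * weights).sum(dim=-1)                            # gated oxygen estimate

# Usage: GatedHeads()(torch.randn(8, 64), torch.randn(8, 4)) -> (8, 1) estimates
```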

* 19 pages, Workshop on Learning from Time Series for Health, NeurIPS 2022 

Adaptive Domain Generalization via Online Disagreement Minimization

Aug 03, 2022
Xin Zhang, Ying-Cong Chen

Deep neural networks suffer from significant performance deterioration when there is a distribution shift between deployment and training. Domain Generalization (DG) aims to safely transfer a model to unseen target domains by relying only on a set of source domains. Although various DG approaches have been proposed, a recent study, DomainBed, reveals that most of them do not beat simple Empirical Risk Minimization (ERM). To this end, we propose a general framework that is orthogonal to existing DG algorithms and could improve their performance consistently. Unlike previous DG works that rely on a static source model hoped to be universal, our proposed AdaODM adaptively modifies the source model at test time for different target domains. Specifically, we create multiple domain-specific classifiers upon a shared domain-generic feature extractor. The feature extractor and classifiers are trained in an adversarial way, where the feature extractor embeds the input samples into a domain-invariant space, and the multiple classifiers capture distinct decision boundaries, each relating to a specific source domain. During testing, distribution differences between target and source domains can be effectively measured by leveraging prediction disagreement among the source classifiers. By fine-tuning the source model to minimize this disagreement at test time, target-domain features are well aligned to the invariant feature space. We verify AdaODM on two popular DG methods, ERM and CORAL, and four DG benchmarks, VLCS, PACS, OfficeHome, and TerraIncognita. The results show that AdaODM stably improves the generalization capacity on unseen domains and achieves state-of-the-art performance.
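
A sketch of the test-time objective as we read it: each source-domain classifier produces a prediction on the target sample, and the feature extractor is fine-tuned to reduce their disagreement. The particular disagreement measure below (average KL divergence to the ensemble mean) is our choice for illustration, not necessarily AdaODM's.

```python
import torch

def disagreement_loss(features, classifiers):
    # Per-source-domain predictions on the same target features.
    probs = [torch.softmax(clf(features), dim=-1) for clf in classifiers]
    mean_p = torch.stack(probs).mean(dim=0)
    # Average KL divergence of each classifier from the ensemble mean prediction;
    # minimizing it at test time pulls target features toward the invariant space.
    kls = [(p * (p.clamp_min(1e-8).log() - mean_p.clamp_min(1e-8).log())).sum(-1).mean()
           for p in probs]
    return sum(kls) / len(kls)
```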

* 11 pages, 4 figures 

Representation Compensation Networks for Continual Semantic Segmentation

Mar 10, 2022
Chang-Bin Zhang, Jia-Wen Xiao, Xialei Liu, Ying-Cong Chen, Ming-Ming Cheng

In this work, we study the continual semantic segmentation problem, where deep neural networks are required to incorporate new classes continually without catastrophic forgetting. We propose a structural re-parameterization mechanism, named the representation compensation (RC) module, to decouple the representation learning of old and new knowledge. The RC module consists of two dynamically evolved branches, one frozen and one trainable. Besides, we design a pooled cube knowledge distillation strategy on both the spatial and channel dimensions to further enhance the plasticity and stability of the model. We conduct experiments on two challenging continual semantic segmentation scenarios, continual class segmentation and continual domain segmentation. Without any extra computational overhead or parameters during inference, our method outperforms state-of-the-art methods. The code is available at https://github.com/zhangchbin/RCIL.
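
A minimal sketch of a two-branch, re-parameterizable block in the spirit of the RC module: one frozen convolution carries old knowledge, a trainable one adapts to new classes, and the two can be folded into a single convolution for inference. The fusion rule (plain summation) and kernel sizes are assumptions.

```python
import torch
import torch.nn as nn

class RCBlock(nn.Module):
    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        self.frozen = nn.Conv2d(c_in, c_out, 3, padding=1)     # old knowledge, frozen
        self.trainable = nn.Conv2d(c_in, c_out, 3, padding=1)  # new knowledge, learned
        for p in self.frozen.parameters():
            p.requires_grad_(False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.frozen(x) + self.trainable(x)

    def merge(self) -> nn.Conv2d:
        # Structural re-parameterization: fold both branches into one conv so
        # inference carries no extra parameters or compute.
        merged = nn.Conv2d(self.frozen.in_channels, self.frozen.out_channels, 3, padding=1)
        merged.weight.data.copy_(self.frozen.weight.data + self.trainable.weight.data)
        merged.bias.data.copy_(self.frozen.bias.data + self.trainable.bias.data)
        return merged
```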

* Accepted by CVPR 2022 

Image Synthesis via Semantic Composition

Sep 15, 2021
Yi Wang, Lu Qi, Ying-Cong Chen, Xiangyu Zhang, Jiaya Jia

In this paper, we present a novel approach to synthesizing realistic images based on their semantic layouts. It hypothesizes that objects with similar appearance share similar representations. Our method establishes dependencies between regions according to their appearance correlation, yielding representations that are both spatially variant and associated. Conditioned on these features, we propose a dynamic weighted network constructed via spatially conditional computation (in both convolution and normalization). Beyond preserving semantic distinctions, the dynamic network strengthens semantic relevance, benefiting global structure and detail synthesis. We demonstrate that our method delivers compelling generation performance, both qualitatively and quantitatively, through extensive experiments on benchmarks.

* Project page is at https://shepnerd.github.io/scg/. Accepted to ICCV 2021 