Tianchen Zhao

Ada3D : Exploiting the Spatial Redundancy with Adaptive Inference for Efficient 3D Object Detection

Jul 17, 2023
Tianchen Zhao, Xuefei Ning, Ke Hong, Zhongyuan Qiu, Pu Lu, Yali Zhao, Linfeng Zhang, Lipu Zhou, Guohao Dai, Huazhong Yang, Yu Wang

Voxel-based methods have achieved state-of-the-art performance for 3D object detection in autonomous driving. However, their significant computational and memory costs pose a challenge for their application to resource-constrained vehicles. One reason for this high resource consumption is the presence of a large number of redundant background points in Lidar point clouds, resulting in spatial redundancy in both 3D voxel and dense BEV map representations. To address this issue, we propose an adaptive inference framework called Ada3D, which focuses on exploiting the input-level spatial redundancy. Ada3D adaptively filters the redundant input, guided by a lightweight importance predictor and the unique properties of the Lidar point cloud. Additionally, we utilize the BEV features' intrinsic sparsity by introducing the Sparsity Preserving Batch Normalization. With Ada3D, we achieve a 40% reduction in 3D voxels and decrease the density of 2D BEV feature maps from 100% to 20% without sacrificing accuracy. Ada3D reduces the model's computational and memory cost by 5x, and achieves 1.52x/1.45x end-to-end GPU latency and 1.5x/4.5x GPU peak memory optimization for the 3D and 2D backbone respectively.

* Accepted at ICCV2023 
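
The motivation for a sparsity-preserving normalization can be seen in a few lines. The sketch below is an illustration, not the paper's implementation: `batch_norm` is textbook batch normalization over one channel, and `scale_only_norm` is an assumed variant that skips the mean subtraction so that zero entries stay zero. The function names and the toy BEV channel are hypothetical.

```python
import math


def batch_norm(values, eps=1e-5):
    """Textbook batch norm over one channel: subtract the mean, divide by the std."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def scale_only_norm(values, eps=1e-5):
    """Assumed sparsity-preserving variant: divide by the std, no mean shift."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [v / math.sqrt(var + eps) for v in values]


def density(values):
    """Fraction of non-zero entries."""
    return sum(1 for v in values if v != 0.0) / len(values)


# Toy flattened BEV channel in which only 2 of 10 positions are occupied.
bev_channel = [0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 4.0, 0.0]

print("input density:     ", density(bev_channel))
print("after batch norm:  ", density(batch_norm(bev_channel)))
print("after scale-only:  ", density(scale_only_norm(bev_channel)))
```

Subtracting a non-zero mean turns every empty position into a non-zero value, so a standard normalization layer makes the feature map dense again, while the scale-only variant keeps the original occupancy.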

Dynamic Ensemble of Low-fidelity Experts: Mitigating NAS "Cold-Start"

Feb 02, 2023
Junbo Zhao, Xuefei Ning, Enshu Liu, Binxin Ru, Zixuan Zhou, Tianchen Zhao, Chen Chen, Jiajin Zhang, Qingmin Liao, Yu Wang

Predictor-based Neural Architecture Search (NAS) employs an architecture performance predictor to improve the sample efficiency. However, predictor-based NAS suffers from the severe "cold-start" problem, since a large amount of architecture-performance data is required to get a working predictor. In this paper, we focus on exploiting information in cheaper-to-obtain performance estimations (i.e., low-fidelity information) to mitigate the large data requirements of predictor training. Despite the intuitiveness of this idea, we observe that using inappropriate low-fidelity information can even damage the prediction ability, and that different search spaces have different preferences for low-fidelity information types. To solve the problem and better fuse beneficial information provided by different types of low-fidelity information, we propose a novel dynamic ensemble predictor framework that comprises two steps. In the first step, we train different sub-predictors on different types of available low-fidelity information to extract beneficial knowledge as low-fidelity experts. In the second step, we learn a gating network to dynamically output a set of weighting coefficients conditioned on each input neural architecture, which will be used to combine the predictions of different low-fidelity experts in a weighted sum. The overall predictor is optimized on a small set of actual architecture-performance data to fuse the knowledge from different low-fidelity experts to make the final prediction. We conduct extensive experiments across five search spaces with different architecture encoders under various experimental settings. Our method can easily be incorporated into existing predictor-based NAS frameworks to discover better architectures.
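
The second step described above, a gating network that produces per-architecture weights for a weighted sum of expert predictions, can be sketched as follows. This is a minimal illustration under stated assumptions: the gate is a single linear layer followed by a softmax, the experts are toy functions, and every name (`gate_weights`, `ensemble_predict`, `gate_matrix`) is hypothetical rather than taken from the paper's code.

```python
import math


def softmax(logits):
    """Numerically stable softmax; the outputs are positive and sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gate_weights(arch_encoding, gate_matrix):
    """Assumed gate: one linear score per expert, normalized with a softmax."""
    logits = [sum(w * x for w, x in zip(row, arch_encoding)) for row in gate_matrix]
    return softmax(logits)


def ensemble_predict(arch_encoding, experts, gate_matrix):
    """Weighted sum of expert predictions, with weights conditioned on the input."""
    weights = gate_weights(arch_encoding, gate_matrix)
    predictions = [expert(arch_encoding) for expert in experts]
    return sum(w * p for w, p in zip(weights, predictions))


# Toy stand-ins for sub-predictors trained on different low-fidelity signals.
experts = [
    lambda enc: 0.5 + 0.1 * enc[0],
    lambda enc: 0.6 - 0.05 * enc[1],
    lambda enc: 0.55,
]
gate_matrix = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # one row per expert
arch = [0.3, 0.8]  # toy architecture encoding

print(gate_weights(arch, gate_matrix))
print(ensemble_predict(arch, experts, gate_matrix))
```

In a real predictor the gate parameters and the experts would be fine-tuned jointly on the small set of actual architecture-performance data; here they are fixed constants chosen only to make the example run.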

Scalable neural quantum states architecture for quantum chemistry

Aug 11, 2022
Tianchen Zhao, James Stokes, Shravan Veerapaneni

Variational optimization of neural-network representations of quantum states has been successfully applied to solve interacting fermionic problems. Despite rapid developments, significant scalability challenges arise when considering large molecules, which correspond to non-locally interacting quantum spin Hamiltonians consisting of sums of thousands or even millions of Pauli operators. In this work, we introduce scalable parallelization strategies to improve neural-network-based variational quantum Monte Carlo calculations for ab-initio quantum chemistry applications. We establish GPU-supported local energy parallelism to compute the optimization objective for Hamiltonians of potentially complex molecules. Using autoregressive sampling techniques, we demonstrate systematic improvement in wall-clock timings required to achieve CCSD baseline target energies. The performance is further enhanced by accommodating the structure of resultant spin Hamiltonians into the autoregressive sampling ordering. The algorithm achieves promising performance in comparison with the classical approximate methods and exhibits both running time and scalability advantages over existing neural-network-based methods.

* 11 pages, 4 figures 
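
Autoregressive sampling, mentioned in the abstract above, draws each spin conditioned on the ones already drawn, which yields exact, independent samples from a normalized distribution. The sketch below shows the generic idea only: the conditional model (a logistic function of a bias plus a coupling to the prefix) and all names are assumptions for illustration and do not reproduce the paper's network.

```python
import itertools
import math
import random


def conditional_p1(prefix, biases, coupling):
    """Toy conditional P(x_i = 1 | x_<i): logistic of a bias plus a prefix term."""
    logit = biases[len(prefix)] + coupling * sum(prefix)
    return 1.0 / (1.0 + math.exp(-logit))


def sample(biases, coupling, rng):
    """Draw one configuration bit by bit; no Markov chain is needed."""
    bits = []
    for _ in biases:
        p1 = conditional_p1(bits, biases, coupling)
        bits.append(1 if rng.random() < p1 else 0)
    return tuple(bits)


def probability(bits, biases, coupling):
    """Exact probability of a configuration as a product of conditionals."""
    prob = 1.0
    for i, bit in enumerate(bits):
        p1 = conditional_p1(bits[:i], biases, coupling)
        prob *= p1 if bit == 1 else 1.0 - p1
    return prob


biases = [0.2, -0.5, 1.0]   # hypothetical parameters for 3 spin orbitals
coupling = 0.7
rng = random.Random(0)

draw = sample(biases, coupling, rng)
total = sum(
    probability(config, biases, coupling)
    for config in itertools.product([0, 1], repeat=len(biases))
)
print(draw, probability(draw, biases, coupling), total)
```

Because each factor is a proper conditional probability, the distribution is normalized by construction, which is what makes direct sampling possible. The order in which the bits are generated is a free choice, which is the degree of freedom the abstract refers to when it mentions the sampling ordering.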

CodedVTR: Codebook-based Sparse Voxel Transformer with Geometric Guidance

Mar 27, 2022
Tianchen Zhao, Niansong Zhang, Xuefei Ning, He Wang, Li Yi, Yu Wang

Transformers have gained much attention by outperforming convolutional neural networks in many 2D vision tasks. However, they are known to have generalization problems and rely on massive-scale pre-training and sophisticated training techniques. When they are applied to 3D tasks, the irregular data structure and limited data scale add to the difficulty. We propose CodedVTR (Codebook-based Voxel TRansformer), which improves data efficiency and generalization ability for 3D sparse voxel transformers. On the one hand, we propose the codebook-based attention that projects an attention space into its subspace represented by the combination of "prototypes" in a learnable codebook. It regularizes attention learning and improves generalization. On the other hand, we propose geometry-aware self-attention that utilizes geometric information (geometric pattern, density) to guide attention learning. CodedVTR can be embedded into existing sparse convolution-based methods, and brings consistent performance improvements for indoor and outdoor 3D semantic segmentation tasks.

* Published at CVPR2022 
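
One way to picture the codebook-based attention described above is to restrict the attention weights over a voxel's neighbors to a mixture of a few stored prototype patterns. The sketch below is a simplified reading under assumptions: the mixture coefficients come from a softmax over hypothetical selection logits, each prototype is already a normalized weight vector, and none of the names come from the CodedVTR code.

```python
import math


def softmax(logits):
    """Numerically stable softmax; the outputs are positive and sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def codebook_attention(selection_logits, codebook):
    """Attention over neighbors expressed as a convex mix of codebook prototypes."""
    mix = softmax(selection_logits)
    num_neighbors = len(codebook[0])
    return [
        sum(m * prototype[j] for m, prototype in zip(mix, codebook))
        for j in range(num_neighbors)
    ]


# Hypothetical codebook: 2 prototypes, each a weight pattern over 3 neighbors.
codebook = [
    [0.8, 0.1, 0.1],   # prototype that focuses on the first neighbor
    [0.2, 0.4, 0.4],   # prototype that spreads over the other neighbors
]
selection_logits = [1.5, 0.5]  # would be predicted from the voxel feature

attention = codebook_attention(selection_logits, codebook)
print(attention)
```

The attention vector has as many free parameters as there are prototypes rather than neighbors, which is the sense in which a codebook can act as a regularizer.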

Multi-shot NAS for Discovering Adversarially Robust Convolutional Neural Architectures at Targeted Capacities

Jan 01, 2021
Xuefei Ning, Junbo Zhao, Wenshuo Li, Tianchen Zhao, Huazhong Yang, Yu Wang

Convolutional neural networks (CNNs) are vulnerable to adversarial examples, and studies show that increasing the model capacity of an architecture topology (e.g., width expansion) can bring consistent robustness improvements. This reveals a clear robustness-efficiency trade-off that should be considered in architecture design. Recent studies have employed one-shot neural architecture search (NAS) to discover adversarially robust architectures. However, since the capacities of different topologies cannot be easily aligned during the search process, current one-shot NAS methods might favor topologies with larger capacity in the supernet, and the discovered topology might be sub-optimal when aligned to the targeted capacity. This paper proposes a novel multi-shot NAS method to explicitly search for adversarially robust architectures at a certain targeted capacity. Specifically, we estimate the reward at the targeted capacity using interior extrapolation of the rewards from multiple supernets. Experimental results demonstrate the effectiveness of the proposed method. For instance, at the targeted FLOPs of 1560M, the discovered MSRobNet-1560 (clean 84.8%, PGD100 52.9%) outperforms the recent NAS-discovered architecture RobNet-free (clean 82.8%, PGD100 52.6%) with similar FLOPs. Codes are available at https://github.com/walkerning/aw_nas.

* 9 pages, 8 pages appendices 
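
The idea of estimating a topology's reward at a target capacity from rewards measured in several supernets can be illustrated with plain piecewise-linear interpolation. This is an assumption made for the sketch: the abstract does not specify the estimator's functional form, and the function name and the (FLOPs, reward) pairs below are invented toy values, not results from the paper.

```python
def estimate_reward(target_flops, supernet_points):
    """Piecewise-linear estimate of the reward at target_flops.

    supernet_points holds at least two (flops, reward) pairs with distinct
    FLOPs, one per supernet, for the same topology.
    """
    points = sorted(supernet_points)
    if not points[0][0] <= target_flops <= points[-1][0]:
        raise ValueError("target capacity lies outside the measured range")
    for (f0, r0), (f1, r1) in zip(points, points[1:]):
        if f0 <= target_flops <= f1:
            t = (target_flops - f0) / (f1 - f0)
            return r0 + t * (r1 - r0)


# Toy measurements of one topology evaluated in three supernets.
measurements = [(1000.0, 0.40), (2000.0, 0.50), (3000.0, 0.56)]

print(estimate_reward(1500.0, measurements))
print(estimate_reward(2500.0, measurements))
```

Comparing topologies by their estimated reward at the same target capacity removes the advantage that a larger topology would otherwise have inside any single supernet.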

BARS: Joint Search of Cell Topology and Layout for Accurate and Efficient Binary ARchitectures

Dec 21, 2020
Tianchen Zhao, Xuefei Ning, Songyi Yang, Shuang Liang, Peng Lei, Jianfei Chen, Huazhong Yang, Yu Wang

Binary Neural Networks (BNNs) have received significant attention due to their promising efficiency. Currently, most BNN studies directly adopt widely-used CNN architectures, which can be suboptimal for BNNs. This paper proposes a novel Binary ARchitecture Search (BARS) flow to discover superior binary architectures in a large design space. Specifically, we design a two-level (Macro & Micro) search space tailored for BNNs and apply a differentiable neural architecture search (NAS) to explore this search space efficiently. The macro-level search space includes depth and width decisions, which are required for better balancing the model performance and capacity. We also make modifications to the micro-level search space to strengthen the information flow for BNNs. A notable challenge of BNN architecture search is that binary operations exacerbate the "collapse" problem of differentiable NAS, and we incorporate various search and derive strategies to stabilize the search process. On CIFAR-10, BARS achieves 1.5% higher accuracy with 2/3 binary Ops and 1/10 floating-point Ops. On ImageNet, with similar resource consumption, the BARS-discovered architecture achieves a 3% accuracy gain over hand-crafted architectures, while removing the full-precision downsample layer.


Learning to Recognize Patch-Wise Consistency for Deepfake Detection

Dec 16, 2020
Tianchen Zhao, Xiang Xu, Mingze Xu, Hui Ding, Yuanjun Xiong, Wei Xia

We propose to detect Deepfakes generated by face manipulation based on one of their fundamental features: images are blended by patches from multiple sources, carrying distinct and persistent source features. In particular, we propose a novel representation learning approach for this task, called patch-wise consistency learning (PCL). It learns by measuring the consistency of image source features, resulting in representations with good interpretability and robustness to multiple forgery methods. We develop an inconsistency image generator (I2G) to generate training data for PCL and boost its robustness. We evaluate our approach on seven popular Deepfake detection datasets. Our model achieves superior detection accuracy and generalizes well to unseen generation methods. On average, our model outperforms the state-of-the-art in terms of AUC by 2% and 8% in the in- and cross-dataset evaluation, respectively.

* 13 pages, 7 figures 
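
Measuring the consistency of source features between image patches can be sketched as a pairwise similarity matrix. The example assumes cosine similarity as the consistency measure and uses invented 2-D patch features; the abstract does not state which similarity PCL uses, so both the measure and the names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def consistency_map(patch_features):
    """Pairwise consistency between all patches of one image."""
    return [[cosine(u, v) for v in patch_features] for u in patch_features]


# Toy features: patches 0 and 1 share a source, patch 2 comes from another one.
patches = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0]]
cmap = consistency_map(patches)

for row in cmap:
    print([round(value, 3) for value in row])
```

In a pristine image all entries would be expected to be high, whereas a blended face region would show up as a block of low consistency against the background patches.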

aw_nas: A Modularized and Extensible NAS framework

Nov 25, 2020
Xuefei Ning, Changcheng Tang, Wenshuo Li, Songyi Yang, Tianchen Zhao, Niansong Zhang, Tianyi Lu, Shuang Liang, Huazhong Yang, Yu Wang

Neural Architecture Search (NAS) has received extensive attention due to its capability to discover neural network architectures in an automated manner. aw_nas is an open-source Python framework implementing various NAS algorithms in a modularized manner. Currently, aw_nas can be used to reproduce the results of mainstream NAS algorithms of various types. Also, due to the modularized design, one can easily experiment with different NAS algorithms for various applications with aw_nas (e.g., classification, detection, text modeling, fault tolerance, adversarial robustness, hardware efficiency, etc.). Codes and documentation are available at https://github.com/walkerning/aw_nas.
