Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ping Tan

Streaming Radiance Fields for 3D Video Synthesis

Oct 26, 2022
Lingzhi Li, Zhen Shen, Zhongshu Wang, Li Shen, Ping Tan

Figure 1 for Streaming Radiance Fields for 3D Video Synthesis

Figure 2 for Streaming Radiance Fields for 3D Video Synthesis

Figure 3 for Streaming Radiance Fields for 3D Video Synthesis

Figure 4 for Streaming Radiance Fields for 3D Video Synthesis

We present an explicit-grid based method for efficiently reconstructing streaming radiance fields for novel view synthesis of real world dynamic scenes. Instead of training a single model that combines all the frames, we formulate the dynamic modeling problem with an incremental learning paradigm in which per-frame model difference is trained to complement the adaption of a base model on the current frame. By exploiting the simple yet effective tuning strategy with narrow bands, the proposed method realizes a feasible framework for handling video sequences on-the-fly with high training efficiency. The storage overhead induced by using explicit grid representations can be significantly reduced through the use of model difference based compression. We also introduce an efficient strategy to further accelerate model optimization for each frame. Experiments on challenging video sequences demonstrate that our approach is capable of achieving a training speed of 15 seconds per-frame with competitive rendering quality, which attains $1000 \times$ speedup over the state-of-the-art implicit methods. Code is available at https://github.com/AlgoHunt/StreamRF.

* Accepted at NeurIPS 2022

Via

Access Paper or Ask Questions

DART: Articulated Hand Model with Diverse Accessories and Rich Textures

Oct 14, 2022
Daiheng Gao, Yuliang Xiu, Kailin Li, Lixin Yang, Feng Wang, Peng Zhang, Bang Zhang, Cewu Lu, Ping Tan

Figure 1 for DART: Articulated Hand Model with Diverse Accessories and Rich Textures

Figure 2 for DART: Articulated Hand Model with Diverse Accessories and Rich Textures

Figure 3 for DART: Articulated Hand Model with Diverse Accessories and Rich Textures

Figure 4 for DART: Articulated Hand Model with Diverse Accessories and Rich Textures

Hand, the bearer of human productivity and intelligence, is receiving much attention due to the recent fever of digital twins. Among different hand morphable models, MANO has been widely used in vision and graphics community. However, MANO disregards textures and accessories, which largely limits its power to synthesize photorealistic hand data. In this paper, we extend MANO with Diverse Accessories and Rich Textures, namely DART. DART is composed of 50 daily 3D accessories which varies in appearance and shape, and 325 hand-crafted 2D texture maps covers different kinds of blemishes or make-ups. Unity GUI is also provided to generate synthetic hand data with user-defined settings, e.g., pose, camera, background, lighting, textures, and accessories. Finally, we release DARTset, which contains large-scale (800K), high-fidelity synthetic hand images, paired with perfect-aligned 3D labels. Experiments demonstrate its superiority in diversity. As a complement to existing hand datasets, DARTset boosts the generalization in both hand pose estimation and mesh recovery tasks. Raw ingredients (textures, accessories), Unity GUI, source code and DARTset are publicly available at dart2022.github.io

* Homepage: dart2022.github.io. Accepted by NeurIPS 2022 Datasets and Benchmarks Track

Via

Access Paper or Ask Questions

Domain Randomization-Enhanced Depth Simulation and Restoration for Perceiving and Grasping Specular and Transparent Objects

Aug 07, 2022
Qiyu Dai, Jiyao Zhang, Qiwei Li, Tianhao Wu, Hao Dong, Ziyuan Liu, Ping Tan, He Wang

Figure 1 for Domain Randomization-Enhanced Depth Simulation and Restoration for Perceiving and Grasping Specular and Transparent Objects

Figure 2 for Domain Randomization-Enhanced Depth Simulation and Restoration for Perceiving and Grasping Specular and Transparent Objects

Figure 3 for Domain Randomization-Enhanced Depth Simulation and Restoration for Perceiving and Grasping Specular and Transparent Objects

Figure 4 for Domain Randomization-Enhanced Depth Simulation and Restoration for Perceiving and Grasping Specular and Transparent Objects

Commercial depth sensors usually generate noisy and missing depths, especially on specular and transparent objects, which poses critical issues to downstream depth or point cloud-based tasks. To mitigate this problem, we propose a powerful RGBD fusion network, SwinDRNet, for depth restoration. We further propose Domain Randomization-Enhanced Depth Simulation (DREDS) approach to simulate an active stereo depth system using physically based rendering and generate a large-scale synthetic dataset that contains 130K photorealistic RGB images along with their simulated depths carrying realistic sensor noises. To evaluate depth restoration methods, we also curate a real-world dataset, namely STD, that captures 30 cluttered scenes composed of 50 objects with different materials from specular, transparent, to diffuse. Experiments demonstrate that the proposed DREDS dataset bridges the sim-to-real domain gap such that, trained on DREDS, our SwinDRNet can seamlessly generalize to other real depth datasets, e.g. ClearGrasp, and outperform the competing methods on depth restoration with a real-time speed. We further show that our depth restoration effectively boosts the performance of downstream tasks, including category-level pose estimation and grasping tasks. Our data and code are available at https://github.com/PKU-EPIC/DREDS

* ECCV 2022

Via

Access Paper or Ask Questions

RenderNet: Visual Relocalization Using Virtual Viewpoints in Large-Scale Indoor Environments

Jul 26, 2022
Jiahui Zhang, Shitao Tang, Kejie Qiu, Rui Huang, Chuan Fang, Le Cui, Zilong Dong, Siyu Zhu, Ping Tan

Figure 1 for RenderNet: Visual Relocalization Using Virtual Viewpoints in Large-Scale Indoor Environments

Figure 2 for RenderNet: Visual Relocalization Using Virtual Viewpoints in Large-Scale Indoor Environments

Figure 3 for RenderNet: Visual Relocalization Using Virtual Viewpoints in Large-Scale Indoor Environments

Figure 4 for RenderNet: Visual Relocalization Using Virtual Viewpoints in Large-Scale Indoor Environments

Visual relocalization has been a widely discussed problem in 3D vision: given a pre-constructed 3D visual map, the 6 DoF (Degrees-of-Freedom) pose of a query image is estimated. Relocalization in large-scale indoor environments enables attractive applications such as augmented reality and robot navigation. However, appearance changes fast in such environments when the camera moves, which is challenging for the relocalization system. To address this problem, we propose a virtual view synthesis-based approach, RenderNet, to enrich the database and refine poses regarding this particular scenario. Instead of rendering real images which requires high-quality 3D models, we opt to directly render the needed global and local features of virtual viewpoints and apply them in the subsequent image retrieval and feature matching operations respectively. The proposed method can largely improve the performance in large-scale indoor environments, e.g., achieving an improvement of 7.1\% and 12.2\% on the Inloc dataset.

Via

Access Paper or Ask Questions

Unseen Object 6D Pose Estimation: A Benchmark and Baselines

Jun 23, 2022
Minghao Gou, Haolin Pan, Hao-Shu Fang, Ziyuan Liu, Cewu Lu, Ping Tan

Figure 1 for Unseen Object 6D Pose Estimation: A Benchmark and Baselines

Figure 2 for Unseen Object 6D Pose Estimation: A Benchmark and Baselines

Figure 3 for Unseen Object 6D Pose Estimation: A Benchmark and Baselines

Figure 4 for Unseen Object 6D Pose Estimation: A Benchmark and Baselines

Estimating the 6D pose for unseen objects is in great demand for many real-world applications. However, current state-of-the-art pose estimation methods can only handle objects that are previously trained. In this paper, we propose a new task that enables and facilitates algorithms to estimate the 6D pose estimation of novel objects during testing. We collect a dataset with both real and synthetic images and up to 48 unseen objects in the test set. In the mean while, we propose a new metric named Infimum ADD (IADD) which is an invariant measurement for objects with different types of pose ambiguity. A two-stage baseline solution for this task is also provided. By training an end-to-end 3D correspondences network, our method finds corresponding points between an unseen object and a partial view RGBD image accurately and efficiently. It then calculates the 6D pose from the correspondences using an algorithm robust to object symmetry. Extensive experiments show that our method outperforms several intuitive baselines and thus verify its effectiveness. All the data, code and models will be made publicly available. Project page: www.graspnet.net/unseen6d

Via

Access Paper or Ask Questions

RCP: Recurrent Closest Point for Scene Flow Estimation on 3D Point Clouds

May 24, 2022
Xiaodong Gu, Chengzhou Tang, Weihao Yuan, Zuozhuo Dai, Siyu Zhu, Ping Tan

Figure 1 for RCP: Recurrent Closest Point for Scene Flow Estimation on 3D Point Clouds

Figure 2 for RCP: Recurrent Closest Point for Scene Flow Estimation on 3D Point Clouds

Figure 3 for RCP: Recurrent Closest Point for Scene Flow Estimation on 3D Point Clouds

Figure 4 for RCP: Recurrent Closest Point for Scene Flow Estimation on 3D Point Clouds

3D motion estimation including scene flow and point cloud registration has drawn increasing interest. Inspired by 2D flow estimation, recent methods employ deep neural networks to construct the cost volume for estimating accurate 3D flow. However, these methods are limited by the fact that it is difficult to define a search window on point clouds because of the irregular data structure. In this paper, we avoid this irregularity by a simple yet effective method.We decompose the problem into two interlaced stages, where the 3D flows are optimized point-wisely at the first stage and then globally regularized in a recurrent network at the second stage. Therefore, the recurrent network only receives the regular point-wise information as the input. In the experiments, we evaluate the proposed method on both the 3D scene flow estimation and the point cloud registration task. For 3D scene flow estimation, we make comparisons on the widely used FlyingThings3D and KITTIdatasets. For point cloud registration, we follow previous works and evaluate the data pairs with large pose and partially overlapping from ModelNet40. The results show that our method outperforms the previous method and achieves a new state-of-the-art performance on both 3D scene flow estimation and point cloud registration, which demonstrates the superiority of the proposed zero-order method on irregular point cloud data.

* Accepted to CVPR 2022

Via

Access Paper or Ask Questions

Efficient Virtual View Selection for 3D Hand Pose Estimation

Mar 29, 2022
Jian Cheng, Yanguang Wan, Dexin Zuo, Cuixia Ma, Jian Gu, Ping Tan, Hongan Wang, Xiaoming Deng, Yinda Zhang

Figure 1 for Efficient Virtual View Selection for 3D Hand Pose Estimation

Figure 2 for Efficient Virtual View Selection for 3D Hand Pose Estimation

Figure 3 for Efficient Virtual View Selection for 3D Hand Pose Estimation

Figure 4 for Efficient Virtual View Selection for 3D Hand Pose Estimation

3D hand pose estimation from single depth is a fundamental problem in computer vision, and has wide applications.However, the existing methods still can not achieve satisfactory hand pose estimation results due to view variation and occlusion of human hand. In this paper, we propose a new virtual view selection and fusion module for 3D hand pose estimation from single depth.We propose to automatically select multiple virtual viewpoints for pose estimation and fuse the results of all and find this empirically delivers accurate and robust pose estimation. In order to select most effective virtual views for pose fusion, we evaluate the virtual views based on the confidence of virtual views using a light-weight network via network distillation. Experiments on three main benchmark datasets including NYU, ICVL and Hands2019 demonstrate that our method outperforms the state-of-the-arts on NYU and ICVL, and achieves very competitive performance on Hands2019-Task1, and our proposed virtual view selection and fusion module is both effective for 3D hand pose estimation.

* Accepted by AAAI2022

Via

Access Paper or Ask Questions

AutoAvatar: Autoregressive Neural Fields for Dynamic Avatar Modeling

Mar 25, 2022
Ziqian Bai, Timur Bagautdinov, Javier Romero, Michael Zollhöfer, Ping Tan, Shunsuke Saito

Neural fields such as implicit surfaces have recently enabled avatar modeling from raw scans without explicit temporal correspondences. In this work, we exploit autoregressive modeling to further extend this notion to capture dynamic effects, such as soft-tissue deformations. Although autoregressive models are naturally capable of handling dynamics, it is non-trivial to apply them to implicit representations, as explicit state decoding is infeasible due to prohibitive memory requirements. In this work, for the first time, we enable autoregressive modeling of implicit avatars. To reduce the memory bottleneck and efficiently model dynamic implicit surfaces, we introduce the notion of articulated observer points, which relate implicit states to the explicit surface of a parametric human body model. We demonstrate that encoding implicit surfaces as a set of height fields defined on articulated observer points leads to significantly better generalization compared to a latent representation. The experiments show that our approach outperforms the state of the art, achieving plausible dynamic deformations even for unseen motions. https://zqbai-jeremy.github.io/autoavatar

* Project page: https://zqbai-jeremy.github.io/autoavatar

Via

Access Paper or Ask Questions

A Real World Dataset for Multi-view 3D Reconstruction

Mar 22, 2022
Rakesh Shrestha, Siqi Hu, Minghao Gou, Ziyuan Liu, Ping Tan

Figure 1 for A Real World Dataset for Multi-view 3D Reconstruction

Figure 2 for A Real World Dataset for Multi-view 3D Reconstruction

Figure 3 for A Real World Dataset for Multi-view 3D Reconstruction

Figure 4 for A Real World Dataset for Multi-view 3D Reconstruction

We present a dataset of 371 3D models of everyday tabletop objects along with their 320,000 real world RGB and depth images. Accurate annotations of camera poses and object poses for each image are performed in a semi-automated fashion to facilitate the use of the dataset for myriad 3D applications like shape reconstruction, object pose estimation, shape retrieval etc. We primarily focus on learned multi-view 3D reconstruction due to the lack of appropriate real world benchmark for the task and demonstrate that our dataset can fill that gap. The entire annotated dataset along with the source code for the annotation tools and evaluation baselines will be made publicly available.

Via

Access Paper or Ask Questions

NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation

Mar 03, 2022
Weihao Yuan, Xiaodong Gu, Zuozhuo Dai, Siyu Zhu, Ping Tan

Figure 1 for NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation

Figure 2 for NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation

Figure 3 for NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation

Figure 4 for NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation

Estimating the accurate depth from a single image is challenging since it is inherently ambiguous and ill-posed. While recent works design increasingly complicated and powerful networks to directly regress the depth map, we take the path of CRFs optimization. Due to the expensive computation, CRFs are usually performed between neighborhoods rather than the whole graph. To leverage the potential of fully-connected CRFs, we split the input into windows and perform the FC-CRFs optimization within each window, which reduces the computation complexity and makes FC-CRFs feasible. To better capture the relationships between nodes in the graph, we exploit the multi-head attention mechanism to compute a multi-head potential function, which is fed to the networks to output an optimized depth map. Then we build a bottom-up-top-down structure, where this neural window FC-CRFs module serves as the decoder, and a vision transformer serves as the encoder. The experiments demonstrate that our method significantly improves the performance across all metrics on both the KITTI and NYUv2 datasets, compared to previous methods. Furthermore, the proposed method can be directly applied to panorama images and outperforms all previous panorama methods on the MatterPort3D dataset. The source code of our method will be made public.

* Accepted by CVPR 2022

Via

Access Paper or Ask Questions