Alexey Artemov

S4R: Self-Supervised Semantic Scene Reconstruction from RGB-D Scans

Feb 21, 2023
Junwen Huang, Alexey Artemov, Yujin Chen, Shuaifeng Zhi, Kai Xu, Matthias Nießner

Most deep learning approaches to comprehensive semantic modeling of 3D indoor spaces require costly dense annotations in the 3D domain. In this work, we explore a central 3D scene modeling task, namely, semantic scene reconstruction, using a fully self-supervised approach. To this end, we design a trainable model that employs both incomplete 3D reconstructions and their corresponding source RGB-D images, fusing cross-domain features into volumetric embeddings to predict complete 3D geometry, color, and semantics. Our key technical innovation is to leverage differentiable rendering of color and semantics, using the observed RGB images and a generic semantic segmentation model as color and semantics supervision, respectively. We additionally develop a method to synthesize an augmented set of virtual training views complementing the original real captures, enabling more efficient self-supervision for semantics. The result is an end-to-end trainable solution that jointly addresses geometry completion, colorization, and semantic mapping from only a few RGB-D images, without any 2D or 3D ground truth. To our knowledge, ours is the first fully self-supervised method for completion and semantic segmentation of real-world 3D scans. It performs on par with 3D-supervised baselines, surpasses baselines with 2D supervision on real datasets, and generalizes well to unseen scenes.
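
As a rough illustration of the semantics branch of this self-supervision, the sketch below volume-renders a predicted semantic grid along camera rays and compares the result to per-pixel pseudo-labels from an off-the-shelf 2D segmentation network. This is a minimal PyTorch sketch under our own assumptions (function and variable names such as render_semantics, sigma_grid, sem_grid are illustrative), not the authors' released code.

```python
import torch
import torch.nn.functional as F

def render_semantics(sigma_grid, sem_grid, ray_pts, ray_deltas):
    """sigma_grid: (1, 1, D, H, W) densities; sem_grid: (1, C, D, H, W) class logits;
    ray_pts: (R, S, 3) sample points in normalized [-1, 1] grid coords; ray_deltas: (R, S) step sizes."""
    R, S, _ = ray_pts.shape
    grid = ray_pts.reshape(1, R, S, 1, 3)                            # grid_sample expects (N, Do, Ho, Wo, 3)
    sigma = F.grid_sample(sigma_grid, grid, align_corners=True).reshape(R, S)
    sem = F.grid_sample(sem_grid, grid, align_corners=True).reshape(-1, R, S).permute(1, 2, 0)  # (R, S, C)
    alpha = 1.0 - torch.exp(-F.softplus(sigma) * ray_deltas)         # per-sample opacity
    trans = torch.cumprod(torch.cat([torch.ones_like(alpha[:, :1]),
                                     1.0 - alpha + 1e-10], dim=1), dim=1)[:, :-1]
    weights = alpha * trans                                           # standard volume-rendering weights
    return (weights.unsqueeze(-1) * sem).sum(dim=1)                   # (R, C) rendered class logits

# Self-supervision: cross-entropy against per-pixel pseudo-labels predicted by a generic
# 2D segmentation model on the observed RGB image (no 2D or 3D ground truth involved), e.g.
# loss = F.cross_entropy(render_semantics(sigma, sem, pts, deltas), pseudo_labels)
```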

Scan2Part: Fine-grained and Hierarchical Part-level Understanding of Real-World 3D Scans

Jun 06, 2022
Alexandr Notchenko, Vladislav Ishimtsev, Alexey Artemov, Vadim Selyutin, Emil Bogomolov, Evgeny Burnaev

We propose Scan2Part, a method to segment individual parts of objects in real-world, noisy indoor RGB-D scans. To this end, we vary the part hierarchies of objects in indoor scenes and explore their effect on scene understanding models. Specifically, we use a sparse U-Net-based architecture that captures the fine-scale detail of the underlying 3D scan geometry by leveraging a multi-scale feature hierarchy. In order to train our method, we introduce the Scan2Part dataset, which is the first large-scale collection providing detailed semantic labels at the part level in the real-world setting. In total, we provide 242,081 correspondences between 53,618 PartNet parts of 2,477 ShapeNet objects and 1,506 ScanNet scenes, at two spatial resolutions of 2 cm³ and 5 cm³. As output, we are able to predict fine-grained per-object part labels, even when the geometry is coarse or partially missing.
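
For illustration, a sparse U-Net of this kind typically consumes a scan quantized into sparse voxels. The snippet below shows one way to build such input at the 2 cm resolution mentioned above; the function and variable names are our own, not the released code.

```python
import numpy as np

def voxelize(points, colors, voxel_size=0.02):
    """points: (N, 3) metric coordinates; colors: (N, 3) per-point colors.
    Returns unique voxel coordinates plus per-voxel averaged color features."""
    coords = np.floor(points / voxel_size).astype(np.int32)
    uniq, inv = np.unique(coords, axis=0, return_inverse=True)
    counts = np.bincount(inv, minlength=len(uniq)).astype(np.float32)
    feats = np.zeros((len(uniq), 3), dtype=np.float32)
    for c in range(3):                                  # average the colors falling into each voxel
        feats[:, c] = np.bincount(inv, weights=colors[:, c], minlength=len(uniq)) / counts
    return uniq, feats                                  # (coords, features) pair for a sparse tensor
```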

* In Proceedings of the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications 

Multi-sensor large-scale dataset for multi-view 3D reconstruction

Mar 11, 2022
Oleg Voynov, Gleb Bobrovskikh, Pavel Karpyshev, Andrei-Timotei Ardelean, Arseniy Bozhenko, Saveliy Galochkin, Ekaterina Karmanova, Pavel Kopanev, Yaroslav Labutin-Rymsho, Ruslan Rakhimov, Aleksandr Safin, Valerii Serpiva, Alexey Artemov, Evgeny Burnaev, Dzmitry Tsetserukou, Denis Zorin

We present a new multi-sensor dataset for 3D surface reconstruction. It includes registered RGB and depth data from sensors of different resolutions and modalities: smartphones, an Intel RealSense, a Microsoft Kinect, industrial cameras, and a structured-light scanner. The data for each scene is obtained under a large number of lighting conditions, and the scenes are selected to emphasize a diverse set of material properties challenging for existing algorithms. In the acquisition process, we aimed to maximize high-resolution depth data quality for challenging cases, to provide reliable ground truth for learning algorithms. Overall, we provide over 1.4 million images of 110 different scenes acquired at 14 lighting conditions from 100 viewing directions. We expect our dataset will be useful for evaluation and training of 3D reconstruction algorithms of different types and for other related tasks. Our dataset and accompanying software will be available online.
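
As a back-of-the-envelope consistency check of the reported figures (assuming every scene is captured from every viewing direction under every lighting condition, which the abstract does not state explicitly):

```python
scenes, lightings, views = 110, 14, 100
poses = scenes * lightings * views              # 154,000 scene/lighting/view configurations
images_reported = 1_400_000
print(images_reported / poses)                  # ~9.1 images per configuration across the sensor streams
```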

Can We Use Neural Regularization to Solve Depth Super-Resolution?

Dec 21, 2021
Milena Gazdieva, Oleg Voynov, Alexey Artemov, Youyi Zheng, Luiz Velho, Evgeny Burnaev

Depth maps captured with commodity sensors often require super-resolution to be used in applications. In this work we study a super-resolution approach based on a variational problem statement with Tikhonov regularization, where the regularizer is parametrized with a deep neural network. This approach was previously applied successfully in photoacoustic tomography. We experimentally show that its application to depth map super-resolution is difficult, and discuss the likely reasons why.
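
In a generic form (our notation; the paper's exact functional may differ), such a variational statement with a learned Tikhonov-style regularizer reads

$$ \hat{x} \;=\; \arg\min_{x}\; \lVert A x - y \rVert_2^2 \;+\; \lambda\, R_\theta(x), $$

where $y$ is the observed low-resolution depth map, $A$ is a known downsampling/degradation operator, $x$ is the sought high-resolution depth map, and $R_\theta$ is a regularizer parametrized by a deep neural network with weights $\theta$.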

* 9 pages 

3D Parametric Wireframe Extraction Based on Distance Fields

Jul 13, 2021
Albert Matveev, Alexey Artemov, Denis Zorin, Evgeny Burnaev

We present a pipeline for parametric wireframe extraction from densely sampled point clouds. Our approach processes a scalar distance field that represents proximity to the nearest sharp feature curve. In intermediate stages, it detects corners, constructs a curve segmentation, and builds a topological graph fitted to the wireframe. As output, we produce parametric spline curves that can be edited and sampled arbitrarily. We evaluate our method on 50 complex 3D shapes and compare it to a recent deep learning-based technique, demonstrating superior quality.
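
To make the final stage concrete, the sketch below fits a parametric B-spline to one ordered curve segment and resamples it; corner detection, segmentation, and graph construction are omitted. The SciPy calls are real, but the surrounding structure and parameter values are illustrative rather than the paper's implementation.

```python
import numpy as np
from scipy.interpolate import splprep, splev

def fit_parametric_curve(segment_pts, smoothing=1e-4, n_samples=200):
    """segment_pts: (N, 3) points of one curve segment, ordered along the curve."""
    tck, _ = splprep(segment_pts.T, s=smoothing)      # fit a parametric cubic B-spline
    u = np.linspace(0.0, 1.0, n_samples)
    return np.stack(splev(u, tck), axis=1)            # (n_samples, 3) points sampled on the spline
```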

Towards Unpaired Depth Enhancement and Super-Resolution in the Wild

May 25, 2021
Aleksandr Safin, Maxim Kan, Nikita Drobyshev, Oleg Voynov, Alexey Artemov, Alexander Filippov, Denis Zorin, Evgeny Burnaev

Depth maps captured with commodity sensors are often of low quality and resolution; these maps need to be enhanced to be used in many applications. State-of-the-art data-driven methods of depth map super-resolution rely on registered pairs of low- and high-resolution depth maps of the same scenes. Acquisition of real-world paired data requires specialized setups. An alternative, generating low-resolution maps from high-resolution ones by subsampling, adding noise, and applying other artificial degradations, does not fully capture the characteristics of real-world low-resolution images. As a consequence, supervised learning methods trained on such artificial paired data may not perform well on real-world low-resolution inputs. We consider an approach to depth map enhancement based on learning from unpaired data. While many techniques for unpaired image-to-image translation have been proposed, most are not directly applicable to depth maps. We propose an unpaired learning method for simultaneous depth enhancement and super-resolution, which is based on a learnable degradation model and uses surface normal estimates as features to produce more accurate depth maps. We demonstrate that our method outperforms existing unpaired methods and performs on par with paired methods on a new benchmark for unpaired learning that we developed.
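
One standard way to obtain the surface normal features mentioned above is to back-project the depth map with the camera intrinsics and take cross products of finite-difference tangents. The sketch below shows this generic computation; it is not necessarily the estimator used in the paper.

```python
import numpy as np

def normals_from_depth(depth, fx, fy, cx, cy):
    """depth: (H, W) metric depth map; fx, fy, cx, cy: pinhole camera intrinsics."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    pts = np.stack([x, y, depth], axis=-1)            # (H, W, 3) back-projected point map
    du = np.gradient(pts, axis=1)                     # tangent along image columns
    dv = np.gradient(pts, axis=0)                     # tangent along image rows
    n = np.cross(du, dv)
    return n / (np.linalg.norm(n, axis=-1, keepdims=True) + 1e-8)   # unit normals per pixel
```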

Towards Part-Based Understanding of RGB-D Scans

Dec 03, 2020
Alexey Bokhovkin, Vladislav Ishimtsev, Emil Bogomolov, Denis Zorin, Alexey Artemov, Evgeny Burnaev, Angela Dai

Recent advances in 3D semantic scene understanding have shown impressive progress in 3D instance segmentation, enabling object-level reasoning about 3D scenes; however, a finer-grained understanding is required to enable interactions with objects and their functional understanding. Thus, we propose the task of part-based scene understanding of real-world 3D environments: from an RGB-D scan of a scene, we detect objects, and for each object predict its decomposition into geometric part masks, which, composed together, form the complete geometry of the observed object. We leverage an intermediary part graph representation to enable robust completion as well as the construction of part priors, which we use to produce the final part mask predictions. Our experiments demonstrate that guiding part understanding from a part graph to part-prior-based predictions significantly outperforms alternative approaches to the task of semantic part completion.
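
A minimal sketch of the intermediate part-graph idea, as we read it from the abstract (nodes are predicted part masks, edges encode part adjacency, and the union of part masks composes the complete object geometry); the data structures and names are illustrative, not the authors' code.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class PartNode:
    label: str                                  # semantic part label, e.g. "chair back"
    mask: np.ndarray                            # boolean occupancy mask in the object's voxel grid

@dataclass
class PartGraph:
    nodes: list = field(default_factory=list)   # PartNode instances
    edges: set = field(default_factory=set)     # pairs (i, j) of adjacent part indices

    def compose_object(self, grid_shape):
        """Union of all predicted part masks = completed object geometry."""
        occ = np.zeros(grid_shape, dtype=bool)
        for node in self.nodes:
            occ |= node.mask
        return occ
```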

* https://youtu.be/iuixmPNs4v4 

DEF: Deep Estimation of Sharp Geometric Features in 3D Shapes

Nov 30, 2020
Albert Matveev, Alexey Artemov, Ruslan Rakhimov, Gleb Bobrovskikh, Daniele Panozzo, Denis Zorin, Evgeny Burnaev

Sharp feature lines carry essential information about human-made objects, enabling compact 3D shape representations and high-quality surface reconstruction, and are a signal source for mesh processing. While extracting high-quality lines from noisy and undersampled data is challenging for traditional methods, deep learning-powered algorithms can leverage global and semantic information from the training data to aid in the process. We propose Deep Estimators of Features (DEFs), a learning-based framework for predicting sharp geometric features in sampled 3D shapes. Differently from existing data-driven methods, which reduce this problem to feature classification, we propose to regress a scalar field representing the distance from point samples to the closest feature line on local patches. By fusing the results of individual patches, we can process large 3D models, which existing data-driven methods cannot handle due to their size and complexity. Extensive experimental evaluation of DEFs on synthetic and real-world 3D shape datasets suggests advantages of our image- and point-based estimators over competing methods, as well as improved noise robustness and scalability of our approach.
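
A simplified sketch of the fusion step implied above: per-patch distance-to-feature predictions for overlapping patches are averaged per point, and points with small predicted distance are kept as sharp-feature samples. The names, the averaging rule, and the threshold are our assumptions, not the paper's exact procedure.

```python
import numpy as np

def fuse_patch_predictions(n_points, patch_indices, patch_distances, threshold=0.02):
    """patch_indices: list of (M_k,) int arrays of point ids covered by each patch;
    patch_distances: list of (M_k,) predicted distances to the nearest sharp feature curve."""
    acc = np.zeros(n_points)
    cnt = np.zeros(n_points)
    for idx, dist in zip(patch_indices, patch_distances):
        np.add.at(acc, idx, dist)                  # accumulate overlapping predictions
        np.add.at(cnt, idx, 1.0)
    fused = acc / np.maximum(cnt, 1.0)             # average predicted distance per point
    feature_ids = np.where((fused < threshold) & (cnt > 0))[0]
    return fused, feature_ids                      # fused distance field + near-feature point ids
```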

CAD-Deform: Deformable Fitting of CAD Models to 3D Scans

Jul 23, 2020
Vladislav Ishimtsev, Alexey Bokhovkin, Alexey Artemov, Savva Ignatyev, Matthias Niessner, Denis Zorin, Evgeny Burnaev

Shape retrieval and alignment are a promising avenue towards turning 3D scans into lightweight CAD representations that can be used for content creation such as mobile or AR/VR gaming scenarios. Unfortunately, CAD model retrieval is limited by the availability of models in standard 3D shape collections (e.g., ShapeNet). In this work, we address this shortcoming by introducing CAD-Deform, a method which obtains more accurate CAD-to-scan fits by non-rigidly deforming retrieved CAD models. Our key contribution is a new non-rigid deformation model combining smooth transformations with preservation of sharp features, which simultaneously achieves very tight fits from CAD models to the 3D scan and maintains the clean, high-quality surface properties of hand-modeled CAD objects. A series of thorough experiments demonstrates that our method achieves significantly tighter scan-to-CAD fits, allowing a more accurate digital replica of the scanned real-world environment while preserving important geometric features present in synthetic CAD environments.
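
Schematically (our notation only; the paper defines its own specific energy terms), such a deformable fit minimizes an energy of the form

$$ E(D) \;=\; E_{\mathrm{fit}}\big(D(M),\, S\big) \;+\; \lambda_{\mathrm{smooth}}\, E_{\mathrm{smooth}}(D) \;+\; \lambda_{\mathrm{sharp}}\, E_{\mathrm{sharp}}(D,\, M), $$

where $M$ is the retrieved CAD model, $S$ the 3D scan, $D$ the non-rigid deformation, and the three terms respectively measure scan-to-mesh fitting error, deformation smoothness, and distortion of the model's sharp features.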

* 25 pages, 13 figures, ECCV 2020 

Geometric Attention for Prediction of Differential Properties in 3D Point Clouds

Jul 16, 2020
Albert Matveev, Alexey Artemov, Denis Zorin, Evgeny Burnaev

Estimation of differential geometric quantities in discrete 3D data representations is one of the crucial steps in the geometry processing pipeline. Specifically, estimating normals and sharp feature lines from a raw point cloud helps improve meshing quality and allows us to use more precise surface reconstruction techniques. When designing a learnable approach to such problems, the main difficulty is selecting neighborhoods in a point cloud and incorporating geometric relations between the points. In this study, we present a geometric attention mechanism that can provide such properties in a learnable fashion. We establish the usefulness of the proposed technique with several experiments on the prediction of normal vectors and the extraction of feature lines.
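
To illustrate the general idea of attention driven by geometric relations between points (not the exact layer proposed in the paper), a minimal PyTorch sketch: attention logits over a point's neighborhood are produced by an MLP on relative positions, and neighbor features are aggregated with the resulting weights.

```python
import torch
import torch.nn as nn

class GeometricAttention(nn.Module):
    def __init__(self, feat_dim, hidden=64):
        super().__init__()
        self.pos_mlp = nn.Sequential(nn.Linear(3, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.value = nn.Linear(feat_dim, feat_dim)

    def forward(self, center_xyz, neigh_xyz, neigh_feats):
        """center_xyz: (B, 3); neigh_xyz: (B, K, 3); neigh_feats: (B, K, feat_dim)."""
        rel = neigh_xyz - center_xyz.unsqueeze(1)              # relative positions (B, K, 3)
        logits = self.pos_mlp(rel).squeeze(-1)                 # geometric attention logits (B, K)
        weights = torch.softmax(logits, dim=-1)
        return (weights.unsqueeze(-1) * self.value(neigh_feats)).sum(dim=1)   # aggregated (B, feat_dim)
```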
