Matheus Gadelha

3DMiner: Discovering Shapes from Large-Scale Unannotated Image Datasets

Oct 29, 2023
Ta-Ying Cheng, Matheus Gadelha, Soren Pirk, Thibault Groueix, Radomir Mech, Andrew Markham, Niki Trigoni

We present 3DMiner -- a pipeline for mining 3D shapes from challenging large-scale unannotated image datasets. Unlike other unsupervised 3D reconstruction methods, we assume that, within a large-enough dataset, there must exist images of objects with similar shapes but varying backgrounds, textures, and viewpoints. Our approach leverages the recent advances in learning self-supervised image representations to cluster images with geometrically similar shapes and find common image correspondences between them. We then exploit these correspondences to obtain rough camera estimates as initialization for bundle-adjustment. Finally, for every image cluster, we apply a progressive bundle-adjusting reconstruction method to learn a neural occupancy field representing the underlying shape. We show that this procedure is robust to several types of errors introduced in previous steps (e.g., wrong camera poses, images containing dissimilar shapes, etc.), allowing us to obtain shape and pose annotations for images in-the-wild. When using images from Pix3D chairs, our method is capable of producing significantly better results than state-of-the-art unsupervised 3D reconstruction techniques, both quantitatively and qualitatively. Furthermore, we show how 3DMiner can be applied to in-the-wild data by reconstructing shapes present in images from the LAION-5B dataset. Project Page: https://ttchengab.github.io/3dminerOfficial
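
To make the first stage concrete, here is a minimal sketch of grouping images by shape similarity: images are embedded with a pretrained feature extractor and clustered so that each cluster can be reconstructed independently. The paper clusters self-supervised image features; the torchvision ResNet-50 backbone, image paths, and cluster count below are stand-ins rather than the method's actual components.

```python
import torch
import torchvision.transforms as T
from torchvision import models
from sklearn.cluster import KMeans
from PIL import Image

def embed_images(paths, device="cpu"):
    # Stand-in backbone; the paper uses self-supervised image representations.
    backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    backbone.fc = torch.nn.Identity()  # keep the 2048-d pooled feature
    backbone.eval().to(device)
    preprocess = T.Compose([
        T.Resize(256), T.CenterCrop(224), T.ToTensor(),
        T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ])
    feats = []
    with torch.no_grad():
        for p in paths:
            img = preprocess(Image.open(p).convert("RGB")).unsqueeze(0).to(device)
            feats.append(backbone(img).squeeze(0).cpu())
    return torch.stack(feats).numpy()

def cluster_by_shape(paths, n_clusters=50):
    feats = embed_images(paths)
    labels = KMeans(n_clusters=n_clusters, n_init="auto").fit_predict(feats)
    return labels  # images sharing a label become one reconstruction job
```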

* In ICCV 2023 

Accidental Turntables: Learning 3D Pose by Watching Objects Turn

Dec 13, 2022
Zezhou Cheng, Matheus Gadelha, Subhransu Maji

We propose a technique for learning single-view 3D object pose estimation models by utilizing a new source of data -- in-the-wild videos where objects turn. Such videos are prevalent in practice (e.g., cars in roundabouts, airplanes near runways) and easy to collect. We show that classical structure-from-motion algorithms, coupled with recent advances in instance detection and feature matching, provide surprisingly accurate relative 3D pose estimation on such videos. We propose a multi-stage training scheme that first learns a canonical pose across a collection of videos and then supervises a model for single-view pose estimation. The proposed technique achieves competitive performance with respect to the existing state-of-the-art on standard benchmarks for 3D pose estimation, without requiring any pose labels during training. We also contribute an Accidental Turntables Dataset, containing a challenging set of 41,212 images of cars with cluttered backgrounds, motion blur, and illumination changes, which serves as a benchmark for 3D pose estimation.
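
As a rough illustration of the classical ingredient the paper builds on, the sketch below estimates the relative pose between two frames of a turning object from feature matches using OpenCV. The intrinsics matrix K, SIFT matching, and two-view geometry here are generic stand-ins; the actual pipeline also relies on instance detection, learned feature matching, and full structure-from-motion over the whole video.

```python
import cv2
import numpy as np

def relative_pose(img_path1, img_path2, K):
    """Estimate the pose of camera 2 relative to camera 1 from two frames."""
    img1 = cv2.imread(img_path1, cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread(img_path2, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    matches = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True).match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    # Robustly estimate the essential matrix, then decompose it into R, t.
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                      prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    return R, t  # rotation and unit-scale translation between the two views
```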

* Project website: https://people.cs.umass.edu/~zezhoucheng/acci-turn/ 

Leveraging Monocular Disparity Estimation for Single-View Reconstruction

Jul 01, 2022
Marissa Ramirez de Chanlatte, Matheus Gadelha, Thibault Groueix, Radomir Mech

We present a fine-tuning method to improve the appearance of 3D geometries reconstructed from single images. We leverage advances in monocular depth estimation to obtain disparity maps and present a novel approach to transforming 2D normalized disparity maps into 3D point clouds by solving an optimization over the relevant camera parameters. After creating a 3D point cloud from disparity, we introduce a method to combine the new point cloud with existing information to form a more faithful and detailed final geometry. We demonstrate the efficacy of our approach with multiple experiments on both synthetic and real images.
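
A minimal sketch of the core idea, lifting a normalized disparity map to a point cloud while optimizing camera parameters, might look as follows. Depth is modeled as 1 / (scale * disparity + shift) with a jointly optimized focal length, and the lifted points are aligned to a reference cloud (e.g., an initial coarse reconstruction) with a Chamfer loss; the parameterization, loss, and hyperparameters are illustrative assumptions rather than the paper's exact formulation.

```python
import torch

def unproject(disparity, scale, shift, focal):
    """disparity: (H, W) float tensor with values in [0, 1]."""
    H, W = disparity.shape
    depth = 1.0 / (scale * disparity + shift).clamp(min=1e-4)
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    x = (xs - W / 2) / focal * depth   # pinhole back-projection
    y = (ys - H / 2) / focal * depth
    return torch.stack([x, y, depth], dim=-1).reshape(-1, 3)

def chamfer(a, b):
    d = torch.cdist(a, b)  # pairwise distances between the two point sets
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

def fit_camera(disparity, reference_points, iters=500, n_samples=2048):
    # Scale, shift, and focal length are optimized jointly by gradient descent.
    params = torch.tensor([1.0, 0.1, 300.0], requires_grad=True)
    opt = torch.optim.Adam([params], lr=1e-2)
    for _ in range(iters):
        opt.zero_grad()
        pts = unproject(disparity, params[0], params[1], params[2])
        idx = torch.randint(pts.shape[0], (n_samples,))  # subsample for speed
        loss = chamfer(pts[idx], reference_points)
        loss.backward()
        opt.step()
    return unproject(disparity, *params.detach())
```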

PlanarRecon: Real-time 3D Plane Detection and Reconstruction from Posed Monocular Videos

Jun 15, 2022
Yiming Xie, Matheus Gadelha, Fengting Yang, Xiaowei Zhou, Huaizu Jiang

We present PlanarRecon -- a novel framework for globally coherent detection and reconstruction of 3D planes from a posed monocular video. Unlike previous works that detect planes in 2D from a single image, PlanarRecon incrementally detects planes in 3D for each video fragment, which consists of a set of key frames, from a volumetric representation of the scene using neural networks. A learning-based tracking and fusion module is designed to merge planes from previous fragments to form a coherent global plane reconstruction. This design allows PlanarRecon to integrate observations from multiple views within each fragment and temporal information across different ones, resulting in an accurate and coherent reconstruction of the scene abstraction with low-polygonal geometry. Experiments show that the proposed approach achieves state-of-the-art performance on the ScanNet dataset while being real-time.
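
As a small geometric aside on the representation being predicted, the snippet below fits a plane (unit normal n and offset d, with n.x + d = 0) to a cluster of 3D points by least squares. PlanarRecon itself detects, tracks, and fuses planes with learned volumetric networks; this only illustrates the plane parameterization and a point-to-plane error.

```python
import numpy as np

def fit_plane(points):
    """points: (N, 3) array. Returns unit normal n and offset d."""
    centroid = points.mean(axis=0)
    # The plane normal is the direction of smallest variance of the points.
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    n = vt[-1]
    d = -float(n @ centroid)
    return n, d

def plane_distance(points, n, d):
    return np.abs(points @ n + d)  # per-point distance to the fitted plane
```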

* CVPR 2022. Project page: https://neu-vi.github.io/planarrecon/ 

ANISE: Assembly-based Neural Implicit Surface rEconstruction

May 27, 2022
Dmitry Petrov, Matheus Gadelha, Radomir Mech, Evangelos Kalogerakis

We present ANISE, a method that reconstructs a 3D shape from partial observations (images or sparse point clouds) using a part-aware neural implicit shape representation. The shape is formulated as an assembly of neural implicit functions, each representing a different shape part. In contrast to previous approaches, the prediction of this representation proceeds in a coarse-to-fine manner. Our network first predicts part transformations, and the part neural implicit functions are conditioned on those transformations. The part implicit functions can then be combined into a single, coherent shape, enabling part-aware shape reconstructions from images and point clouds. Those reconstructions can be obtained in two ways: (i) by directly decoding and combining the refined part implicit functions; or (ii) by using the part latents to query similar parts in a part database and assembling them into a single shape. We demonstrate that, when performing reconstruction by decoding part representations into implicit functions, our method achieves state-of-the-art part-aware reconstruction results from both images and sparse point clouds. When reconstructing shapes by assembling parts queried from a dataset, our approach significantly outperforms traditional shape retrieval methods even when the size of the shape database is severely restricted. We present our results on well-known sparse point cloud reconstruction and single-view reconstruction benchmarks.
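
The following sketch illustrates the assembly idea: per-part neural implicit functions, each with a predicted rigid transformation, are composed into a single occupancy by a soft union over parts. The part networks, transforms, and the max-based union below are illustrative placeholders; the paper predicts these quantities from images or point clouds in a coarse-to-fine manner.

```python
import torch

def assemble_occupancy(query_points, part_nets, part_transforms):
    """query_points: (N, 3); part_nets: list of callables mapping (N, 3) to
    (N, 1) occupancy logits; part_transforms: list of (R, t) placing each part."""
    part_occ = []
    for net, (R, t) in zip(part_nets, part_transforms):
        local = (query_points - t) @ R  # move queries into the part's frame
        part_occ.append(torch.sigmoid(net(local)).squeeze(-1))
    # A point is inside the shape if it is inside any part (soft union).
    return torch.stack(part_occ, dim=0).max(dim=0).values
```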

* 8 pages, 5 figures, 4 tables 

SurFit: Learning to Fit Surfaces Improves Few Shot Learning on Point Clouds

Dec 27, 2021
Gopal Sharma, Bidya Dash, Matheus Gadelha, Aruni RoyChowdhury, Marios Loizou, Evangelos Kalogerakis, Liangliang Cao, Erik Learned-Miller, Rui Wang, Subhransu Maji

We present SurFit, a simple approach for label-efficient learning of 3D shape segmentation networks. SurFit is based on a self-supervised task of decomposing the surface of a 3D shape into geometric primitives. It can be readily applied to existing network architectures for 3D shape segmentation and improves their performance in the few-shot setting, as we demonstrate on the widely used ShapeNet and PartNet benchmarks. SurFit outperforms the prior state-of-the-art in this setting, suggesting that decomposability into primitives is a useful prior for learning representations predictive of semantic parts. We present a number of experiments varying the choice of geometric primitives and downstream tasks to demonstrate the effectiveness of the method.
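
To make the self-supervised task concrete, here is a toy example of the kind of primitive fitting involved: an algebraic least-squares fit of a sphere to a patch of points. SurFit handles several primitive types inside the network during training; this standalone fit is only meant to illustrate the idea.

```python
import numpy as np

def fit_sphere(points):
    """points: (N, 3). Returns center (3,) and radius (scalar)."""
    # |p|^2 = 2 c.p + (r^2 - |c|^2) is linear in c and (r^2 - |c|^2).
    A = np.hstack([2.0 * points, np.ones((points.shape[0], 1))])
    b = (points ** 2).sum(axis=1)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    center = x[:3]
    radius = np.sqrt(x[3] + center @ center)
    return center, radius
```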

Deep Manifold Prior

Apr 08, 2020
Matheus Gadelha, Rui Wang, Subhransu Maji

We present a prior for manifold-structured data, such as surfaces of 3D shapes, where deep neural networks are adopted to reconstruct a target shape using gradient descent starting from a random initialization. We show that surfaces generated this way are smooth, with limiting behavior characterized by Gaussian processes, and we mathematically derive such properties for fully-connected as well as convolutional networks. We demonstrate our method in a variety of manifold reconstruction applications, such as point cloud denoising and interpolation, achieving considerably better results than competitive baselines while requiring no training data. We also show that when training data is available, our method allows developing alternative parametrizations of surfaces under the framework of AtlasNet, leading to a compact network architecture and better reconstruction results on standard image-to-shape reconstruction benchmarks.
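
A minimal deep-prior sketch in this spirit: a randomly initialized MLP maps fixed 2D parameter samples to 3D points and is fit to a noisy point cloud by gradient descent, so that the network's inductive bias, rather than any training data, regularizes the result. The architecture, loss, and step count below are illustrative choices, not the paper's.

```python
import torch
import torch.nn as nn

def denoise(noisy_points, steps=2000, n_samples=2048):
    """noisy_points: (M, 3) float tensor. Returns smoothed surface samples."""
    uv = torch.rand(n_samples, 2)  # fixed surface parameters, never updated
    net = nn.Sequential(nn.Linear(2, 256), nn.ReLU(),
                        nn.Linear(256, 256), nn.ReLU(),
                        nn.Linear(256, 3))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(steps):
        opt.zero_grad()
        pred = net(uv)
        d = torch.cdist(pred, noisy_points)
        # Symmetric Chamfer distance between prediction and noisy target.
        loss = d.min(dim=1).values.mean() + d.min(dim=0).values.mean()
        loss.backward()
        opt.step()
    return net(uv).detach()
```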

* 22 pages, 12 figures 

Learning Generative Models of Shape Handles

Apr 06, 2020
Matheus Gadelha, Giorgio Gori, Duygu Ceylan, Radomir Mech, Nathan Carr, Tamy Boubekeur, Rui Wang, Subhransu Maji

We present a generative model to synthesize 3D shapes as sets of handles -- lightweight proxies that approximate the original 3D shape -- for applications in interactive editing, shape parsing, and building compact 3D representations. Our model can generate handle sets with varying cardinality and different types of handles. Key to our approach is a deep architecture that predicts both the parameters and existence of shape handles, and a novel similarity measure that can easily accommodate different types of handles, such as cuboids or sphere-meshes. We leverage recent advances in semantic 3D annotation as well as automatic shape-summarization techniques to supervise our approach. We show that the resulting shape representations are intuitive and achieve higher quality than the previous state-of-the-art. Finally, we demonstrate how our method can be used in applications such as interactive shape editing, completion, and interpolation, leveraging the latent space learned by our model to guide these tasks. Project page: http://mgadelha.me/shapehandles.
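
A schematic decoder in the spirit of the model described above: from a latent code it predicts a fixed maximum number of handles, each with cuboid parameters plus an existence probability, so the effective set size can vary. The dimensions and the cuboid parameterization are assumptions made for illustration only.

```python
import torch
import torch.nn as nn

class HandleDecoder(nn.Module):
    def __init__(self, latent_dim=128, max_handles=16):
        super().__init__()
        self.max_handles = max_handles
        # 3 center + 3 size + 1 existence logit per handle.
        self.mlp = nn.Sequential(nn.Linear(latent_dim, 512), nn.ReLU(),
                                 nn.Linear(512, max_handles * 7))

    def forward(self, z):
        out = self.mlp(z).view(-1, self.max_handles, 7)
        centers, sizes = out[..., :3], torch.exp(out[..., 3:6])  # positive sizes
        exists = torch.sigmoid(out[..., 6])  # probability each handle is present
        return centers, sizes, exists

decoder = HandleDecoder()
centers, sizes, exists = decoder(torch.randn(4, 128))  # a batch of 4 latents
```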

* 11 pages, 11 figures, accepted to CVPR 2020 

Label-Efficient Learning on Point Clouds using Approximate Convex Decompositions

Mar 30, 2020
Matheus Gadelha, Aruni RoyChowdhury, Gopal Sharma, Evangelos Kalogerakis, Liangliang Cao, Erik Learned-Miller, Rui Wang, Subhransu Maji

The problems of shape classification and part segmentation from 3D point clouds have garnered increasing attention in the last few years. But both of these problems suffer from relatively small training sets, creating the need for statistically efficient methods to learn 3D shape representations. In this work, we investigate the use of Approximate Convex Decompositions (ACD) as a self-supervisory signal for label-efficient learning of point cloud representations. Decomposing a 3D shape into simpler constituent parts or primitives is a fundamental problem in geometrical shape processing. There has been extensive work on such decompositions, where the criterion for simplicity of a constituent shape is often defined in terms of convexity for solid primitives. In this paper, we show that using the results of ACD to approximate a ground truth segmentation provides excellent self-supervision for learning 3D point cloud representations that are highly effective on downstream tasks. We report improvements over the state-of-the-art in unsupervised representation learning on the ModelNet40 shape classification dataset and significant gains in few-shot part segmentation on the ShapeNetPart dataset. Code available at https://github.com/matheusgadelha/PointCloudLearningACD
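
One way such a decomposition can act as self-supervision is a pairwise contrastive loss on point embeddings: points falling in the same approximate-convex component are pulled together and points in different components are pushed apart. The sketch below assumes per-point component ids produced by an off-the-shelf ACD tool and a generic point encoder; it illustrates the idea rather than reproducing the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def acd_contrastive_loss(point_embeddings, component_ids, temperature=0.1):
    """point_embeddings: (N, D); component_ids: (N,) integer ACD component labels."""
    z = F.normalize(point_embeddings, dim=-1)
    logits = z @ z.t() / temperature      # pairwise cosine similarities
    logits.fill_diagonal_(-1e9)           # exclude trivial self-similarity
    same = component_ids.unsqueeze(0) == component_ids.unsqueeze(1)
    same.fill_diagonal_(False)            # a point is not its own positive
    log_prob = F.log_softmax(logits, dim=1)
    pos_counts = same.sum(dim=1).clamp(min=1)   # guard single-point components
    # Average log-probability of same-component pairs (NT-Xent style).
    loss = -(log_prob * same).sum(dim=1) / pos_counts
    return loss.mean()
```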

* 18 pages, 5 figures 

Inferring 3D Shapes from Image Collections using Adversarial Networks

Jun 11, 2019
Matheus Gadelha, Aartika Rai, Subhransu Maji, Rui Wang

We investigate the problem of learning a probabilistic distribution over three-dimensional shapes given two-dimensional views of multiple objects taken from unknown viewpoints. Our approach, called projective generative adversarial network (PrGAN), trains a deep generative model of 3D shapes whose projections (or renderings) match the distribution of the provided 2D views. The addition of a differentiable projection module allows us to infer the underlying 3D shape distribution without access to any explicit 3D or viewpoint annotation during the learning phase. We show that our approach produces 3D shapes of comparable quality to GANs trained directly on 3D data. Experiments also show that the disentangled representation of 2D shapes into geometry and viewpoint leads to a good generative model of 2D shapes. The key advantage of our model is that it estimates 3D shape and viewpoint, and generates novel views from an input image, in a completely unsupervised manner. We further investigate how the generative models can be improved if additional information such as depth, viewpoint, or part segmentations is available at training time. To this end, we present new differentiable projection operators that can be used by PrGAN to learn better 3D generative models. Our experiments show that our method can successfully leverage extra visual cues to create more diverse and accurate shapes.
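
The sketch below shows one simple form a differentiable projection module can take: a voxel occupancy grid is resampled under a rotation and composited along the depth axis into a soft silhouette, so gradients from 2D observations flow back to the 3D grid. The compositing rule and resampling used here are simplifications and not necessarily the paper's exact operator.

```python
import torch
import torch.nn.functional as F

def project_voxels(voxels, rotation):
    """voxels: (D, H, W) occupancies in [0, 1]; rotation: (3, 3) matrix."""
    # Build a sampling grid that rotates the volume to the given viewpoint.
    affine = torch.zeros(1, 3, 4, dtype=voxels.dtype)
    affine[0, :, :3] = rotation.to(voxels.dtype)
    grid = F.affine_grid(affine, size=(1, 1, *voxels.shape), align_corners=False)
    rotated = F.grid_sample(voxels[None, None], grid, align_corners=False)[0, 0]
    # Composite occupancies along the depth axis into a soft silhouette.
    transmittance = torch.prod(1.0 - rotated, dim=0)
    return 1.0 - transmittance  # (H, W) differentiable silhouette
```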

* Source code: https://github.com/matheusgadelha/PrGAN . arXiv admin note: substantial text overlap with arXiv:1612.05872 