Marie-Julie Rakotosaona

SparseFusion: Fusing Multi-Modal Sparse Representations for Multi-Sensor 3D Object Detection

Apr 27, 2023
Yichen Xie, Chenfeng Xu, Marie-Julie Rakotosaona, Patrick Rim, Federico Tombari, Kurt Keutzer, Masayoshi Tomizuka, Wei Zhan

By identifying four important components of existing LiDAR-camera 3D object detection methods (LiDAR and camera candidates, transformation, and fusion outputs), we observe that all existing methods either find dense candidates or yield dense representations of scenes. However, given that objects occupy only a small part of a scene, finding dense candidates and generating dense representations is noisy and inefficient. We propose SparseFusion, a novel multi-sensor 3D detection method that exclusively uses sparse candidates and sparse representations. Specifically, SparseFusion utilizes the outputs of parallel detectors in the LiDAR and camera modalities as sparse candidates for fusion. We transform the camera candidates into the LiDAR coordinate space by disentangling the object representations. Then, we can fuse the multi-modality candidates in a unified 3D space by a lightweight self-attention module. To mitigate negative transfer between modalities, we propose novel semantic and geometric cross-modality transfer modules that are applied prior to the modality-specific detectors. SparseFusion achieves state-of-the-art performance on the nuScenes benchmark while also running at the fastest speed, even outperforming methods with stronger backbones. We perform extensive experiments to demonstrate the effectiveness and efficiency of our modules and overall method pipeline. Our code will be made publicly available at https://github.com/yichen928/SparseFusion.
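
To make the fusion step concrete, here is a minimal sketch of combining two sets of sparse candidates with a lightweight self-attention module, assuming both modalities' candidates have already been brought into a shared LiDAR-frame feature space. The module, feature dimension, and candidate counts are illustrative placeholders, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): fusing sparse LiDAR and camera
# candidates with a lightweight self-attention block, assuming both candidate
# sets already live in a shared LiDAR-frame feature space.
import torch
import torch.nn as nn

class SparseCandidateFusion(nn.Module):
    def __init__(self, dim=256, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, lidar_feats, camera_feats):
        # lidar_feats: (B, N_l, dim), camera_feats: (B, N_c, dim)
        # Concatenate the sparse candidates from both modalities ...
        x = torch.cat([lidar_feats, camera_feats], dim=1)
        # ... and let every candidate attend to every other candidate.
        fused, _ = self.attn(x, x, x)
        return self.norm(x + fused)

# Toy usage with hypothetical candidate counts (200 LiDAR, 200 camera).
fusion = SparseCandidateFusion()
out = fusion(torch.randn(2, 200, 256), torch.randn(2, 200, 256))
print(out.shape)  # torch.Size([2, 400, 256])
```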

NeRFMeshing: Distilling Neural Radiance Fields into Geometrically-Accurate 3D Meshes

Mar 16, 2023
Marie-Julie Rakotosaona, Fabian Manhardt, Diego Martin Arroyo, Michael Niemeyer, Abhijit Kundu, Federico Tombari

With the introduction of Neural Radiance Fields (NeRFs), novel view synthesis has recently made a big leap forward. At its core, NeRF proposes that each 3D point can emit radiance, allowing view synthesis to be carried out via differentiable volumetric rendering. While neural radiance fields can accurately represent 3D scenes for image rendering, 3D meshes are still the main scene representation supported by most computer graphics and simulation pipelines, enabling tasks such as real-time rendering and physics-based simulation. Obtaining 3D meshes from neural radiance fields remains an open challenge, since NeRFs are optimized for view synthesis and do not enforce an accurate underlying geometry on the radiance field. We thus propose a novel compact and flexible architecture that enables easy 3D surface reconstruction from any NeRF-driven approach. Once the radiance field is trained, we distill the volumetric 3D representation into a Signed Surface Approximation Network, allowing easy extraction of the 3D mesh and appearance. Our final 3D mesh is physically accurate and can be rendered in real time on an array of devices.
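
The overall distill-then-mesh idea can be sketched as follows, under heavy assumptions: `query_density` stands in for any trained radiance field, the density-to-signed-value mapping is a crude placeholder rather than the paper's SSAN training objective, and the mesh is extracted with off-the-shelf marching cubes.

```python
# Minimal sketch (assumptions, not the paper's SSAN): distill a trained
# radiance field into a small signed-distance MLP, then extract a mesh
# with marching cubes. `query_density` stands in for any trained NeRF.
import torch, torch.nn as nn
import numpy as np
from skimage.measure import marching_cubes

def query_density(pts):                         # hypothetical trained NeRF density
    return torch.relu(1.0 - pts.norm(dim=-1))   # a unit sphere as a stand-in

sdf_net = nn.Sequential(nn.Linear(3, 128), nn.ReLU(),
                        nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 1))
opt = torch.optim.Adam(sdf_net.parameters(), lr=1e-3)

for step in range(500):                         # distillation: fit a signed field
    pts = torch.rand(4096, 3) * 2 - 1           # samples in [-1, 1]^3
    target = 0.5 - query_density(pts)           # crude signed value from density
    loss = ((sdf_net(pts).squeeze(-1) - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# Evaluate the distilled field on a grid and run marching cubes.
g = torch.linspace(-1, 1, 64)
grid = torch.stack(torch.meshgrid(g, g, g, indexing="ij"), dim=-1).reshape(-1, 3)
with torch.no_grad():
    vals = sdf_net(grid).reshape(64, 64, 64).numpy()
# Guard so the sketch still runs if the field does not yet cross zero.
level = float(np.clip(0.0, vals.min() + 1e-6, vals.max() - 1e-6))
verts, faces, _, _ = marching_cubes(vals, level=level)
print(verts.shape, faces.shape)
```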

SPARF: Neural Radiance Fields from Sparse and Noisy Poses

Nov 21, 2022
Prune Truong, Marie-Julie Rakotosaona, Fabian Manhardt, Federico Tombari

Neural Radiance Field (NeRF) has recently emerged as a powerful representation to synthesize photorealistic novel views. While showing impressive performance, it relies on the availability of dense input views with highly accurate camera poses, thus limiting its application in real-world scenarios. In this work, we introduce Sparse Pose Adjusting Radiance Field (SPARF) to address the challenge of novel-view synthesis given only a few wide-baseline input images (as few as 3) with noisy camera poses. Our approach exploits multi-view geometry constraints to jointly learn the NeRF and refine the camera poses. By relying on pixel matches extracted between the input views, our multi-view correspondence objective drives the optimized scene and camera poses to converge to a global and geometrically accurate solution. Our depth consistency loss further encourages the reconstructed scene to be consistent from any viewpoint. Our approach sets a new state of the art in the sparse-view regime on multiple challenging datasets.

* Code will be released upon publication 
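
A multi-view correspondence objective of the kind described above can be sketched as a reprojection loss over matched pixels. This is my own simplified illustration, not SPARF's released code; the pinhole model, Huber loss, and toy intrinsics/poses are assumptions.

```python
# Minimal sketch (my assumptions, not SPARF's implementation): back-project
# matched pixels from view i using rendered depth and the current pose,
# reproject them into view j, and penalize the distance to the matches there.
import torch

def reprojection_loss(uv_i, uv_j, depth_i, K, T_i, T_j):
    # uv_i, uv_j: (N, 2) matched pixels; depth_i: (N,) rendered depth
    # K: (3, 3) intrinsics; T_i, T_j: (4, 4) camera-to-world poses being refined
    ones = torch.ones(uv_i.shape[0], 1)
    pix = torch.cat([uv_i, ones], dim=-1)                        # homogeneous pixels
    cam_i = (torch.linalg.inv(K) @ pix.T).T * depth_i[:, None]   # back-project
    world = (T_i[:3, :3] @ cam_i.T).T + T_i[:3, 3]               # to world frame
    cam_j = (T_j[:3, :3].T @ (world - T_j[:3, 3]).T).T           # into view j's frame
    proj = (K @ cam_j.T).T
    uv_proj = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)
    return torch.nn.functional.huber_loss(uv_proj, uv_j)

# Toy usage with hypothetical matches, intrinsics, and identity poses.
N = 100
K = torch.tensor([[100., 0., 50.], [0., 100., 50.], [0., 0., 1.]])
loss = reprojection_loss(torch.rand(N, 2) * 100, torch.rand(N, 2) * 100,
                         torch.rand(N) * 3 + 1, K, torch.eye(4), torch.eye(4))
print(loss)
```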

Shape, Pose, and Appearance from a Single Image via Bootstrapped Radiance Field Inversion

Nov 21, 2022
Dario Pavllo, David Joseph Tan, Marie-Julie Rakotosaona, Federico Tombari

Neural Radiance Fields (NeRF) coupled with GANs represent a promising direction in the area of 3D reconstruction from a single view, owing to their ability to efficiently model arbitrary topologies. Recent work in this area, however, has mostly focused on synthetic datasets where exact ground-truth poses are known, and has overlooked pose estimation, which is important for certain downstream applications such as augmented reality (AR) and robotics. We introduce a principled end-to-end reconstruction framework for natural images, where accurate ground-truth poses are not available. Our approach recovers an SDF-parameterized 3D shape, pose, and appearance from a single image of an object, without exploiting multiple views during training. More specifically, we leverage an unconditional 3D-aware generator, to which we apply a hybrid inversion scheme where a model produces a first guess of the solution which is then refined via optimization. Our framework can de-render an image in as few as 10 steps, enabling its use in practical scenarios. We demonstrate state-of-the-art results on a variety of real and synthetic benchmarks.
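
The "first guess, then optimize" hybrid inversion scheme can be sketched roughly as below. The encoder, generator, pose parameterization, and step count are hypothetical stand-ins for the paper's models, kept only to show the control flow.

```python
# Minimal sketch (hypothetical components, not the paper's models): an encoder
# predicts an initial latent code and pose from the single input image, then a
# few gradient steps on a reconstruction loss refine both.
import torch, torch.nn as nn

latent_dim, img_res = 64, 32
generator = nn.Sequential(nn.Linear(latent_dim + 3, 256), nn.ReLU(),
                          nn.Linear(256, 3 * img_res * img_res))   # stand-in renderer
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * img_res * img_res, 256),
                        nn.ReLU(), nn.Linear(256, latent_dim + 3))

image = torch.rand(1, 3, img_res, img_res)             # observed single image
guess = encoder(image)                                  # step 0: feed-forward guess
z = guess[:, :latent_dim].detach().requires_grad_()
pose = guess[:, latent_dim:].detach().requires_grad_()

opt = torch.optim.Adam([z, pose], lr=1e-2)
for step in range(10):                                  # refinement in a few steps
    render = generator(torch.cat([z, pose], dim=-1)).view(1, 3, img_res, img_res)
    loss = (render - image).abs().mean()
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```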

Differentiable Surface Triangulation

Sep 22, 2021
Marie-Julie Rakotosaona, Noam Aigerman, Niloy Mitra, Maks Ovsjanikov, Paul Guerrero

Triangle meshes remain the most popular data representation for surface geometry. This ubiquitous representation is essentially a hybrid one that decouples continuous vertex locations from the discrete topological triangulation. Unfortunately, the combinatorial nature of the triangulation prevents taking derivatives over the space of possible meshings of any given surface. As a result, to date, mesh processing and optimization techniques have been unable to truly take advantage of modular gradient descent components of modern optimization frameworks. In this work, we present a differentiable surface triangulation that enables optimization for any per-vertex or per-face differentiable objective function over the space of underlying surface triangulations. Our method builds on the result that any 2D triangulation can be achieved by a suitably perturbed weighted Delaunay triangulation. We translate this result into a computational algorithm by proposing a soft relaxation of the classical weighted Delaunay triangulation and optimizing over vertex weights and vertex locations. We extend the algorithm to 3D by decomposing shapes into developable sets and differentiably meshing each set with suitable boundary constraints. We demonstrate the efficacy of our method on various planar and surface meshes on a range of difficult-to-optimize objective functions. Our code can be found online: https://github.com/mrakotosaon/diff-surface-triangulation.
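
To illustrate the differentiability argument, here is a small sketch of a soft power-diagram assignment. The weighted Delaunay triangulation is dual to the power diagram, and replacing the hard argmin over power distances with a temperature-controlled softmax makes the assignment differentiable with respect to vertex positions and weights. This is my own illustration of the relaxation idea, not the paper's algorithm.

```python
# Minimal sketch: soft power-diagram assignment, differentiable in both
# vertex positions and weights via a softmax over power distances.
import torch

def soft_power_assignment(queries, sites, weights, temperature=0.01):
    # queries: (Q, 2), sites: (N, 2), weights: (N,)
    # power distance: ||q - p_i||^2 - w_i
    d2 = ((queries[:, None, :] - sites[None, :, :]) ** 2).sum(-1)   # (Q, N)
    power = d2 - weights[None, :]
    return torch.softmax(-power / temperature, dim=-1)              # soft "nearest site"

sites = torch.rand(50, 2, requires_grad=True)
weights = torch.zeros(50, requires_grad=True)
queries = torch.rand(200, 2)
assign = soft_power_assignment(queries, sites, weights)
# Gradients flow to both positions and weights, e.g. for a coverage-style loss.
assign.max(dim=-1).values.mean().backward()
print(sites.grad.shape, weights.grad.shape)
```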

Learning Delaunay Surface Elements for Mesh Reconstruction

Dec 02, 2020
Marie-Julie Rakotosaona, Paul Guerrero, Noam Aigerman, Niloy Mitra, Maks Ovsjanikov

We present a method for reconstructing triangle meshes from point clouds. Existing learning-based methods for mesh reconstruction mostly generate triangles individually, making it hard to create manifold meshes. We leverage the properties of 2D Delaunay triangulations to construct a mesh from manifold surface elements. Our method first estimates local geodesic neighborhoods around each point. We then perform a 2D projection of these neighborhoods using a learned logarithmic map. A Delaunay triangulation in this 2D domain is guaranteed to produce a manifold patch, which we call a Delaunay surface element. We synchronize the local 2D projections of neighboring elements to maximize the manifoldness of the reconstructed mesh. Our results show that we achieve better overall manifoldness in our reconstructed meshes than current methods for reconstructing meshes with arbitrary topology.
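
The core building block, triangulating one projected neighborhood, can be sketched with standard tools. A local PCA tangent-plane projection stands in for the learned logarithmic map, and the neighborhood size is an arbitrary choice.

```python
# Minimal sketch (PCA projection as a stand-in for the learned log map):
# a 2D Delaunay triangulation of one projected neighborhood yields a
# manifold patch, i.e. a "Delaunay surface element".
import numpy as np
from scipy.spatial import Delaunay, cKDTree

points = np.random.rand(1000, 3)                 # input point cloud (toy data)
tree = cKDTree(points)
_, idx = tree.query(points[0], k=20)             # geodesic neighborhood stand-in
patch = points[idx]

# Project to 2D; the paper learns this map, here we use the local tangent plane.
centered = patch - patch.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
uv = centered @ vt[:2].T                         # 2D coordinates in the patch

tri = Delaunay(uv)                               # manifold triangulation of the patch
print(tri.simplices.shape)                       # triangles indexing the 20 neighbors
```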

Correspondence Learning via Linearly-invariant Embedding

Oct 25, 2020
Riccardo Marin, Marie-Julie Rakotosaona, Simone Melzi, Maks Ovsjanikov

In this paper, we propose a fully differentiable pipeline for estimating accurate dense correspondences between 3D point clouds. The proposed pipeline is an extension and a generalization of the functional maps framework. However, instead of using the Laplace-Beltrami eigenfunctions as done in virtually all previous works in this domain, we demonstrate that learning the basis from data can both improve robustness and lead to better accuracy in challenging settings. We interpret the basis as a learned embedding into a higher-dimensional space. Following the functional map paradigm, the optimal transformation in this embedding space must be linear, and we propose a separate architecture aimed at estimating the transformation by learning optimal descriptor functions. This leads to the first end-to-end trainable functional map-based correspondence approach in which both the basis and the descriptors are learned from data. Interestingly, we also observe that learning a canonical embedding leads to worse results, suggesting that leaving an extra linear degree of freedom to the embedding network gives it more robustness, thereby also shedding light on the success of previous methods. Finally, we demonstrate that our approach achieves state-of-the-art results in challenging non-rigid 3D point cloud correspondence applications.
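
The linear-map estimation step of the functional map pipeline can be sketched with plain linear algebra. The tensors below are random stand-ins for the learned bases and descriptors, and the basis/convention choices are my assumptions, not the paper's exact formulation.

```python
# Minimal sketch (toy tensors, not the trained networks): with learned basis
# embeddings Phi_X, Phi_Y and descriptors G_X, G_Y, the optimal map in the
# embedding space is linear; estimate it by least squares, then recover a
# pointwise correspondence by nearest-neighbor search.
import numpy as np
from scipy.spatial import cKDTree

n, m, k, d = 1000, 1200, 20, 40
Phi_X, Phi_Y = np.random.randn(n, k), np.random.randn(m, k)    # learned bases
G_X, G_Y = np.random.randn(n, d), np.random.randn(m, d)        # learned descriptors

# Project descriptors into each basis (pseudo-inverse as the projection).
A = np.linalg.pinv(Phi_X) @ G_X                   # (k, d) coefficients on shape X
B = np.linalg.pinv(Phi_Y) @ G_Y                   # (k, d) coefficients on shape Y
C = np.linalg.lstsq(A.T, B.T, rcond=None)[0].T    # linear map with C @ A ≈ B

# Pointwise map: nearest neighbor between transformed embeddings.
tree = cKDTree(Phi_Y)
_, corr = tree.query(Phi_X @ C.T)                 # corr[i] = match on Y for point i of X
print(corr.shape)
```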

Intrinsic Point Cloud Interpolation via Dual Latent Space Navigation

Apr 03, 2020
Marie-Julie Rakotosaona, Maks Ovsjanikov

We present a learning-based method for interpolating and manipulating 3D shapes represented as point clouds that is explicitly designed to preserve intrinsic shape properties. Our approach is based on constructing a dual encoding space that enables shape synthesis and, at the same time, provides links to the intrinsic shape information, which is typically not available on point cloud data. Our method works in a single pass and avoids the expensive optimization employed by existing techniques. Furthermore, the strong regularization provided by our dual latent space approach also helps to improve shape recovery in challenging settings from noisy point clouds across different datasets. Extensive experiments show that our method results in more realistic and smoother interpolations compared to baselines.
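
The single-pass interpolation pattern can be sketched as below; the autoencoder architecture, latent size, and point count are hypothetical placeholders, not the paper's dual-latent model, and serve only to show encode, blend, and decode in one pass.

```python
# Minimal sketch (hypothetical encoder/decoder, not the paper's architecture):
# encode two point clouds, linearly blend their latent codes, and decode the
# blend in a single forward pass per interpolation step.
import torch, torch.nn as nn

class PointAutoencoder(nn.Module):
    def __init__(self, n_points=1024, latent=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, latent))
        self.dec = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(),
                                 nn.Linear(256, n_points * 3))
        self.n_points = n_points

    def encode(self, pts):                       # pts: (B, N, 3) -> (B, latent)
        return self.enc(pts).max(dim=1).values   # max-pool over points

    def decode(self, z):                         # z: (B, latent) -> (B, n_points, 3)
        return self.dec(z).view(-1, self.n_points, 3)

model = PointAutoencoder()
shape_a, shape_b = torch.rand(1, 1024, 3), torch.rand(1, 1024, 3)
z_a, z_b = model.encode(shape_a), model.encode(shape_b)
for t in (0.0, 0.25, 0.5, 0.75, 1.0):            # single decoding pass per step
    interp = model.decode((1 - t) * z_a + t * z_b)
    print(t, interp.shape)
```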

Effective Rotation-invariant Point CNN with Spherical Harmonics kernels

Jun 27, 2019
Adrien Poulenard, Marie-Julie Rakotosaona, Yann Ponty, Maks Ovsjanikov

We present a novel rotation-invariant architecture operating directly on point cloud data. We demonstrate how rotation invariance can be injected into a recently proposed point-based PCNN architecture at all layers of the network, achieving invariance both to global shape transformations and to local rotations at the level of patches or parts, which is useful when dealing with non-rigid objects. We achieve this by employing a spherical-harmonics-based kernel at different layers of the network, which is guaranteed to be invariant to rigid motions. We also introduce a more efficient pooling operation for PCNN using space-partitioning data structures. This results in a flexible, simple, and efficient architecture that achieves accurate results on challenging shape analysis tasks, including classification and segmentation, without requiring the data augmentation typically employed by non-invariant approaches.
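
The rotation-invariance property that spherical harmonics provide can be demonstrated with a small, generic construction: project a local neighborhood onto spherical harmonics and keep only the per-degree energies, which are unchanged by rotation. This is an illustration of the underlying property, not the paper's kernel design.

```python
# Minimal sketch: per-degree spherical-harmonic energies of a neighborhood
# are invariant to rotations (the coefficients of each degree transform by a
# unitary Wigner matrix, so their norm is preserved).
import numpy as np
from scipy.special import sph_harm

def rotation_invariant_descriptor(points, max_degree=4):
    # points: (N, 3) neighborhood centered at the origin
    r = np.linalg.norm(points, axis=1) + 1e-9
    theta = np.arctan2(points[:, 1], points[:, 0]) % (2 * np.pi)   # azimuth
    phi = np.arccos(np.clip(points[:, 2] / r, -1, 1))              # polar angle
    feats = []
    for l in range(max_degree + 1):
        coeffs = [np.sum(sph_harm(m, l, theta, phi)) for m in range(-l, l + 1)]
        feats.append(np.linalg.norm(coeffs))       # per-degree energy
    return np.array(feats)

pts = np.random.randn(100, 3)
R, _ = np.linalg.qr(np.random.randn(3, 3))          # random orthogonal matrix
print(rotation_invariant_descriptor(pts))
print(rotation_invariant_descriptor(pts @ R.T))     # ≈ same descriptor
```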

OperatorNet: Recovering 3D Shapes From Difference Operators

Apr 24, 2019
Ruqi Huang, Marie-Julie Rakotosaona, Panos Achlioptas, Leonidas Guibas, Maks Ovsjanikov

This paper proposes a learning-based framework for reconstructing 3D shapes from functional operators, compactly encoded as small-sized matrices. To this end we introduce a novel neural architecture, called OperatorNet, which takes as input a set of linear operators representing a shape and produces its 3D embedding. We demonstrate that this approach significantly outperforms previous purely geometric methods for the same problem. Furthermore, we introduce a novel functional operator, which encodes the extrinsic or pose-dependent shape information, and thus complements purely intrinsic pose-oblivious operators, such as the classical Laplacian. Coupled with this novel operator, our reconstruction network achieves very high reconstruction accuracy, even in the presence of incomplete information about a shape, given a soft or functional map expressed in a reduced basis. Finally, we demonstrate that the multiplicative functional algebra enjoyed by these operators can be used to synthesize entirely new unseen shapes, in the context of shape interpolation and shape analogy applications.

* 13 pages, 14 figures and 2 tables 
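
The input format, small matrices encoding a shape relative to a base shape, can be sketched as follows. The area-based shape difference formula used here comes from the broader functional maps literature; the reduced matrices and the decoder are toy stand-ins, not OperatorNet's trained components.

```python
# Minimal sketch (toy shapes and a generic decoder, not OperatorNet itself):
# build a small difference-operator matrix from a functional map in a reduced
# basis, flatten it, and decode it to vertex coordinates.
import torch, torch.nn as nn

k, n_verts = 30, 2000
# Hypothetical reduced quantities for a base shape and a deformed shape:
C = torch.randn(k, k)                  # functional map between the two shapes
A_base = torch.diag(torch.rand(k))     # reduced inner-product (mass) matrix, base
A_def = torch.diag(torch.rand(k))      # same for the deformed shape

# Area-based shape difference operator in the reduced basis (k x k matrix).
D = torch.linalg.solve(A_base, C.T @ A_def @ C)

decoder = nn.Sequential(nn.Linear(k * k, 512), nn.ReLU(),
                        nn.Linear(512, n_verts * 3))
verts = decoder(D.reshape(1, -1)).view(n_verts, 3)   # 3D embedding of the shape
print(verts.shape)
```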