Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Niloy J. Mitra

GOEnFusion: Gradient Origin Encodings for 3D Forward Diffusion Models

Dec 14, 2023

Animesh Karnewar, Andrea Vedaldi, Niloy J. Mitra, David Novotny

Abstract:The recently introduced Forward-Diffusion method allows to train a 3D diffusion model using only 2D images for supervision. However, it does not easily generalise to different 3D representations and requires a computationally expensive auto-regressive sampling process to generate the underlying 3D scenes. In this paper, we propose GOEn: Gradient Origin Encoding (pronounced "gone"). GOEn can encode input images into any type of 3D representation without the need to use a pre-trained image feature extractor. It can also handle single, multiple or no source view(s) alike, by design, and tries to maximise the information transfer from the views to the encodings. Our proposed GOEnFusion model pairs GOEn encodings with a realisation of the Forward-Diffusion model which addresses the limitations of the vanilla Forward-Diffusion realisation. We evaluate how much information the GOEn mechanism transfers to the encoded representations, and how well it captures the prior distribution over the underlying 3D scenes, through the lens of a partial AutoEncoder. Lastly, the efficacy of the GOEnFusion model is evaluated on the recently proposed OmniObject3D dataset while comparing to the state-of-the-art Forward and non-Forward-Diffusion models and other 3D generative models.

* project page at: https://holodiffusion.github.io/goenfusion

Via

Access Paper or Ask Questions

LooseControl: Lifting ControlNet for Generalized Depth Conditioning

Dec 05, 2023

Shariq Farooq Bhat, Niloy J. Mitra, Peter Wonka

Figure 1 for LooseControl: Lifting ControlNet for Generalized Depth Conditioning

Figure 2 for LooseControl: Lifting ControlNet for Generalized Depth Conditioning

Figure 3 for LooseControl: Lifting ControlNet for Generalized Depth Conditioning

Figure 4 for LooseControl: Lifting ControlNet for Generalized Depth Conditioning

Abstract:We present LooseControl to allow generalized depth conditioning for diffusion-based image generation. ControlNet, the SOTA for depth-conditioned image generation, produces remarkable results but relies on having access to detailed depth maps for guidance. Creating such exact depth maps, in many scenarios, is challenging. This paper introduces a generalized version of depth conditioning that enables many new content-creation workflows. Specifically, we allow (C1) scene boundary control for loosely specifying scenes with only boundary conditions, and (C2) 3D box control for specifying layout locations of the target objects rather than the exact shape and appearance of the objects. Using LooseControl, along with text guidance, users can create complex environments (e.g., rooms, street views, etc.) by specifying only scene boundaries and locations of primary objects. Further, we provide two editing mechanisms to refine the results: (E1) 3D box editing enables the user to refine images by changing, adding, or removing boxes while freezing the style of the image. This yields minimal changes apart from changes induced by the edited boxes. (E2) Attribute editing proposes possible editing directions to change one particular aspect of the scene, such as the overall object density or a particular object. Extensive tests and comparisons with baselines demonstrate the generality of our method. We believe that LooseControl can become an important design tool for easily creating complex environments and be extended to other forms of guidance channels. Code and more information are available at https://shariqfarooq123.github.io/loose-control/ .

Via

Access Paper or Ask Questions

Evaluating VLMs for Score-Based, Multi-Probe Annotation of 3D Objects

Nov 29, 2023

Rishabh Kabra, Loic Matthey, Alexander Lerchner, Niloy J. Mitra

Abstract:Unlabeled 3D objects present an opportunity to leverage pretrained vision language models (VLMs) on a range of annotation tasks -- from describing object semantics to physical properties. An accurate response must take into account the full appearance of the object in 3D, various ways of phrasing the question/prompt, and changes in other factors that affect the response. We present a method to marginalize over any factors varied across VLM queries, utilizing the VLM's scores for sampled responses. We first show that this probabilistic aggregation can outperform a language model (e.g., GPT4) for summarization, for instance avoiding hallucinations when there are contrasting details between responses. Secondly, we show that aggregated annotations are useful for prompt-chaining; they help improve downstream VLM predictions (e.g., of object material when the object's type is specified as an auxiliary input in the prompt). Such auxiliary inputs allow ablating and measuring the contribution of visual reasoning over language-only reasoning. Using these evaluations, we show how VLMs can approach, without additional training or in-context learning, the quality of human-verified type and material annotations on the large-scale Objaverse dataset.

Via

Access Paper or Ask Questions

Diffusion 3D Features (Diff3F): Decorating Untextured Shapes with Distilled Semantic Features

Nov 28, 2023

Niladri Shekhar Dutt, Sanjeev Muralikrishnan, Niloy J. Mitra

Figure 1 for Diffusion 3D Features (Diff3F): Decorating Untextured Shapes with Distilled Semantic Features

Figure 2 for Diffusion 3D Features (Diff3F): Decorating Untextured Shapes with Distilled Semantic Features

Figure 3 for Diffusion 3D Features (Diff3F): Decorating Untextured Shapes with Distilled Semantic Features

Figure 4 for Diffusion 3D Features (Diff3F): Decorating Untextured Shapes with Distilled Semantic Features

Abstract:We present Diff3F as a simple, robust, and class-agnostic feature descriptor that can be computed for untextured input shapes (meshes or point clouds). Our method distills diffusion features from image foundational models onto input shapes. Specifically, we use the input shapes to produce depth and normal maps as guidance for conditional image synthesis, and in the process produce (diffusion) features in 2D that we subsequently lift and aggregate on the original surface. Our key observation is that even if the conditional image generations obtained from multi-view rendering of the input shapes are inconsistent, the associated image features are robust and can be directly aggregated across views. This produces semantic features on the input shapes, without requiring additional data or training. We perform extensive experiments on multiple benchmarks (SHREC'19, SHREC'20, and TOSCA) and demonstrate that our features, being semantic instead of geometric, produce reliable correspondence across both isometeric and non-isometrically related shape families.

Via

Access Paper or Ask Questions

Efficient Gradient Estimation via Adaptive Sampling and Importance Sampling

Nov 27, 2023

Corentin Salaün, Xingchang Huang, Iliyan Georgiev, Niloy J. Mitra, Gurprit Singh

Figure 1 for Efficient Gradient Estimation via Adaptive Sampling and Importance Sampling

Figure 2 for Efficient Gradient Estimation via Adaptive Sampling and Importance Sampling

Figure 3 for Efficient Gradient Estimation via Adaptive Sampling and Importance Sampling

Figure 4 for Efficient Gradient Estimation via Adaptive Sampling and Importance Sampling

Abstract:Machine learning problems rely heavily on stochastic gradient descent (SGD) for optimization. The effectiveness of SGD is contingent upon accurately estimating gradients from a mini-batch of data samples. Instead of the commonly used uniform sampling, adaptive or importance sampling reduces noise in gradient estimation by forming mini-batches that prioritize crucial data points. Previous research has suggested that data points should be selected with probabilities proportional to their gradient norm. Nevertheless, existing algorithms have struggled to efficiently integrate importance sampling into machine learning frameworks. In this work, we make two contributions. First, we present an algorithm that can incorporate existing importance functions into our framework. Second, we propose a simplified importance function that relies solely on the loss gradient of the output layer. By leveraging our proposed gradient estimation techniques, we observe improved convergence in classification and regression tasks with minimal computational overhead. We validate the effectiveness of our adaptive and importance-sampling approach on image and point-cloud datasets.

* 15 pages, 10 figures

Via

Access Paper or Ask Questions

ProteusNeRF: Fast Lightweight NeRF Editing using 3D-Aware Image Context

Oct 15, 2023

Binglun Wang, Niladri Shekhar Dutt, Niloy J. Mitra

Figure 1 for ProteusNeRF: Fast Lightweight NeRF Editing using 3D-Aware Image Context

Figure 2 for ProteusNeRF: Fast Lightweight NeRF Editing using 3D-Aware Image Context

Figure 3 for ProteusNeRF: Fast Lightweight NeRF Editing using 3D-Aware Image Context

Figure 4 for ProteusNeRF: Fast Lightweight NeRF Editing using 3D-Aware Image Context

Abstract:Neural Radiance Fields (NeRFs) have recently emerged as a popular option for photo-realistic object capture due to their ability to faithfully capture high-fidelity volumetric content even from handheld video input. Although much research has been devoted to efficient optimization leading to real-time training and rendering, options for interactive editing NeRFs remain limited. We present a very simple but effective neural network architecture that is fast and efficient while maintaining a low memory footprint. This architecture can be incrementally guided through user-friendly image-based edits. Our representation allows straightforward object selection via semantic feature distillation at the training stage. More importantly, we propose a local 3D-aware image context to facilitate view-consistent image editing that can then be distilled into fine-tuned NeRFs, via geometric and appearance adjustments. We evaluate our setup on a variety of examples to demonstrate appearance and geometric edits and report 10-30x speedup over concurrent work focusing on text-guided NeRF editing. Video results can be seen on our project webpage at https://proteusnerf.github.io.

Via

Access Paper or Ask Questions

Neural Semantic Surface Maps

Sep 09, 2023

Luca Morreale, Noam Aigerman, Vladimir G. Kim, Niloy J. Mitra

Abstract:We present an automated technique for computing a map between two genus-zero shapes, which matches semantically corresponding regions to one another. Lack of annotated data prohibits direct inference of 3D semantic priors; instead, current State-of-the-art methods predominantly optimize geometric properties or require varying amounts of manual annotation. To overcome the lack of annotated training data, we distill semantic matches from pre-trained vision models: our method renders the pair of 3D shapes from multiple viewpoints; the resulting renders are then fed into an off-the-shelf image-matching method which leverages a pretrained visual model to produce feature points. This yields semantic correspondences, which can be projected back to the 3D shapes, producing a raw matching that is inaccurate and inconsistent between different viewpoints. These correspondences are refined and distilled into an inter-surface map by a dedicated optimization scheme, which promotes bijectivity and continuity of the output map. We illustrate that our approach can generate semantic surface-to-surface maps, eliminating manual annotations or any 3D training data requirement. Furthermore, it proves effective in scenarios with high semantic complexity, where objects are non-isometrically related, as well as in situations where they are nearly isometric.

Via

Access Paper or Ask Questions

BLiSS: Bootstrapped Linear Shape Space

Sep 04, 2023

Sanjeev Muralikrishnan, Chun-Hao Paul Huang, Duygu Ceylan, Niloy J. Mitra

Figure 1 for BLiSS: Bootstrapped Linear Shape Space

Figure 2 for BLiSS: Bootstrapped Linear Shape Space

Figure 3 for BLiSS: Bootstrapped Linear Shape Space

Figure 4 for BLiSS: Bootstrapped Linear Shape Space

Abstract:Morphable models are fundamental to numerous human-centered processes as they offer a simple yet expressive shape space. Creating such morphable models, however, is both tedious and expensive. The main challenge is establishing dense correspondences across raw scans that capture sufficient shape variation. This is often addressed using a mix of significant manual intervention and non-rigid registration. We observe that creating a shape space and solving for dense correspondence are tightly coupled -- while dense correspondence is needed to build shape spaces, an expressive shape space provides a reduced dimensional space to regularize the search. We introduce BLiSS, a method to solve both progressively. Starting from a small set of manually registered scans to bootstrap the process, we enrich the shape space and then use that to get new unregistered scans into correspondence automatically. The critical component of BLiSS is a non-linear deformation model that captures details missed by the low-dimensional shape space, thus allowing progressive enrichment of the space.

* 12 pages, 10 figures

Via

Access Paper or Ask Questions

HoloFusion: Towards Photo-realistic 3D Generative Modeling

Aug 28, 2023

Animesh Karnewar, Niloy J. Mitra, Andrea Vedaldi, David Novotny

Abstract:Diffusion-based image generators can now produce high-quality and diverse samples, but their success has yet to fully translate to 3D generation: existing diffusion methods can either generate low-resolution but 3D consistent outputs, or detailed 2D views of 3D objects but with potential structural defects and lacking view consistency or realism. We present HoloFusion, a method that combines the best of these approaches to produce high-fidelity, plausible, and diverse 3D samples while learning from a collection of multi-view 2D images only. The method first generates coarse 3D samples using a variant of the recently proposed HoloDiffusion generator. Then, it independently renders and upsamples a large number of views of the coarse 3D model, super-resolves them to add detail, and distills those into a single, high-fidelity implicit 3D representation, which also ensures view consistency of the final renders. The super-resolution network is trained as an integral part of HoloFusion, end-to-end, and the final distillation uses a new sampling scheme to capture the space of super-resolved signals. We compare our method against existing baselines, including DreamFusion, Get3D, EG3D, and HoloDiffusion, and achieve, to the best of our knowledge, the most realistic results on the challenging CO3Dv2 dataset.

* ICCV 2023 conference; project page at: https://holodiffusion.github.io/holofusion

Via

Access Paper or Ask Questions

ShapeCoder: Discovering Abstractions for Visual Programs from Unstructured Primitives

May 09, 2023

R. Kenny Jones, Paul Guerrero, Niloy J. Mitra, Daniel Ritchie

Abstract:Programs are an increasingly popular representation for visual data, exposing compact, interpretable structure that supports manipulation. Visual programs are usually written in domain-specific languages (DSLs). Finding "good" programs, that only expose meaningful degrees of freedom, requires access to a DSL with a "good" library of functions, both of which are typically authored by domain experts. We present ShapeCoder, the first system capable of taking a dataset of shapes, represented with unstructured primitives, and jointly discovering (i) useful abstraction functions and (ii) programs that use these abstractions to explain the input shapes. The discovered abstractions capture common patterns (both structural and parametric) across the dataset, so that programs rewritten with these abstractions are more compact, and expose fewer degrees of freedom. ShapeCoder improves upon previous abstraction discovery methods, finding better abstractions, for more complex inputs, under less stringent input assumptions. This is principally made possible by two methodological advancements: (a) a shape to program recognition network that learns to solve sub-problems and (b) the use of e-graphs, augmented with a conditional rewrite scheme, to determine when abstractions with complex parametric expressions can be applied, in a tractable manner. We evaluate ShapeCoder on multiple datasets of 3D shapes, where primitive decompositions are either parsed from manual annotations or produced by an unsupervised cuboid abstraction method. In all domains, ShapeCoder discovers a library of abstractions that capture high-level relationships, remove extraneous degrees of freedom, and achieve better dataset compression compared with alternative approaches. Finally, we investigate how programs rewritten to use discovered abstractions prove useful for downstream tasks.

* SIGGRAPH 2023

Via

Access Paper or Ask Questions