People grasp flexible visual concepts from a few examples. We explore a neurosymbolic system that learns how to infer programs that capture visual concepts in a domain-general fashion. We introduce Template Programs: programmatic expressions from a domain-specific language that specify structural and parametric patterns common to an input concept. Our framework supports multiple concept-related tasks, including few-shot generation and co-segmentation through parsing. We develop a learning paradigm that allows us to train networks that infer Template Programs directly from visual datasets that contain concept groupings. We run experiments across multiple visual domains: 2D layouts, Omniglot characters, and 3D shapes. We find that our method outperforms task-specific alternatives, and performs competitively against domain-specific approaches for the limited domains where they exist.
Programs offer compactness and structure that makes them an attractive representation for visual data. We explore how code rewriting can be used to improve systems for inferring programs from visual data. We first propose Sparse Intermittent Rewrite Injection (SIRI), a framework for unsupervised bootstrapped learning. SIRI sparsely applies code rewrite operations over a dataset of training programs, injecting the improved programs back into the training set. We design a family of rewriters for visual programming domains: parameter optimization, code pruning, and code grafting. For three shape programming languages in 2D and 3D, we show that using SIRI with our family of rewriters improves performance: better reconstructions and faster convergence rates, compared with bootstrapped learning methods that do not use rewriters or use them naively. Finally, we demonstrate that our family of rewriters can be effectively used at test time to improve the output of SIRI predictions. For 2D and 3D CSG, we outperform or match the reconstruction performance of recent domain-specific neural architectures, while producing more parsimonious programs that use significantly fewer primitives.
Programs are an increasingly popular representation for visual data, exposing compact, interpretable structure that supports manipulation. Visual programs are usually written in domain-specific languages (DSLs). Finding "good" programs, that only expose meaningful degrees of freedom, requires access to a DSL with a "good" library of functions, both of which are typically authored by domain experts. We present ShapeCoder, the first system capable of taking a dataset of shapes, represented with unstructured primitives, and jointly discovering (i) useful abstraction functions and (ii) programs that use these abstractions to explain the input shapes. The discovered abstractions capture common patterns (both structural and parametric) across the dataset, so that programs rewritten with these abstractions are more compact, and expose fewer degrees of freedom. ShapeCoder improves upon previous abstraction discovery methods, finding better abstractions, for more complex inputs, under less stringent input assumptions. This is principally made possible by two methodological advancements: (a) a shape to program recognition network that learns to solve sub-problems and (b) the use of e-graphs, augmented with a conditional rewrite scheme, to determine when abstractions with complex parametric expressions can be applied, in a tractable manner. We evaluate ShapeCoder on multiple datasets of 3D shapes, where primitive decompositions are either parsed from manual annotations or produced by an unsupervised cuboid abstraction method. In all domains, ShapeCoder discovers a library of abstractions that capture high-level relationships, remove extraneous degrees of freedom, and achieve better dataset compression compared with alternative approaches. Finally, we investigate how programs rewritten to use discovered abstractions prove useful for downstream tasks.
We present SHRED, a method for 3D SHape REgion Decomposition. SHRED takes a 3D point cloud as input and uses learned local operations to produce a segmentation that approximates fine-grained part instances. We endow SHRED with three decomposition operations: splitting regions, fixing the boundaries between regions, and merging regions together. Modules are trained independently and locally, allowing SHRED to generate high-quality segmentations for categories not seen during training. We train and evaluate SHRED with fine-grained segmentations from PartNet; using its merge-threshold hyperparameter, we show that SHRED produces segmentations that better respect ground-truth annotations compared with baseline methods, at any desired decomposition granularity. Finally, we demonstrate that SHRED is useful for downstream applications, out-performing all baselines on zero-shot fine-grained part instance segmentation and few-shot fine-grained semantic segmentation when combined with methods that learn to label shape regions.
The shape of many objects in the built environment is dictated by their relationships to the human body: how will a person interact with this object? Existing data-driven generative models of 3D shapes produce plausible objects but do not reason about the relationship of those objects to the human body. In this paper, we learn body-aware generative models of 3D shapes. Specifically, we train generative models of chairs, an ubiquitous shape category, which can be conditioned on a given body shape or sitting pose. The body-shape-conditioned models produce chairs which will be comfortable for a person with the given body shape; the pose-conditioned models produce chairs which accommodate the given sitting pose. To train these models, we define a "sitting pose matching" metric and a novel "sitting comfort" metric. Calculating these metrics requires an expensive optimization to sit the body into the chair, which is too slow to be used as a loss function for training a generative model. Thus, we train neural networks to efficiently approximate these metrics. We use our approach to train three body-aware generative shape models: a structured part-based generator, a point cloud generator, and an implicit surface generator. In all cases, our approach produces models which adapt their output chair shapes to input human body specifications.
Many learning-based 3D shape semantic segmentation methods assign labels to shape atoms (e.g. points in a point cloud or faces in a mesh) with a single-pass approach trained in an end-to-end fashion. Such methods achieve impressive performance but require large amounts of labeled training data. This paradigm entangles two separable subproblems: (1) decomposing a shape into regions and (2) assigning semantic labels to these regions. We claim that disentangling these subproblems reduces the labeled data burden: (1) region decomposition requires no semantic labels and could be performed in an unsupervised fashion, and (2) labeling shape regions instead of atoms results in a smaller search space and should be learnable with less labeled training data. In this paper, we investigate this second claim by presenting the Neurally-Guided Shape Parser (NGSP), a method that learns how to assign semantic labels to regions of an over-segmented 3D shape. We solve this problem via MAP inference, modeling the posterior probability of a labeling assignment conditioned on an input shape. We employ a Monte Carlo importance sampling approach guided by a neural proposal network, a search-based approach made feasible by assuming the input shape is decomposed into discrete regions. We evaluate NGSP on the task of hierarchical semantic segmentation on manufactured 3D shapes from PartNet. We find that NGSP delivers significant performance improvements over baselines that learn to label shape atoms and then aggregate predictions for each shape region, especially in low-data regimes. Finally, we demonstrate that NGSP is robust to region granularity, as it maintains strong segmentation performance even as the regions undergo significant corruption.
A popular way to create detailed yet easily controllable 3D shapes is via procedural modeling, i.e. generating geometry using programs. Such programs consist of a series of instructions along with their associated parameter values. To fully realize the benefits of this representation, a shape program should be compact and only expose degrees of freedom that allow for meaningful manipulation of output geometry. One way to achieve this goal is to design higher-level macro operators that, when executed, expand into a series of commands from the base shape modeling language. However, manually authoring such macros, much like shape programs themselves, is difficult and largely restricted to domain experts. In this paper, we present ShapeMOD, an algorithm for automatically discovering macros that are useful across large datasets of 3D shape programs. ShapeMOD operates on shape programs expressed in an imperative, statement-based language. It is designed to discover macros that make programs more compact by minimizing the number of function calls and free parameters required to represent an input shape collection. We run ShapeMOD on multiple collections of programs expressed in a domain-specific language for 3D shape structures. We show that it automatically discovers a concise set of macros that abstract out common structural and parametric patterns that generalize over large shape collections. We also demonstrate that the macros found by ShapeMOD improve performance on downstream tasks including shape generative modeling and inferring programs from point clouds. Finally, we conduct a user study that indicates that ShapeMOD's discovered macros make interactive shape editing more efficient.
Inferring programs which generate 2D and 3D shapes is important for reverse engineering, enabling shape editing, and more. Supervised learning is hard to apply to this problem, as paired (program, shape) data rarely exists. Recent approaches use supervised pre-training with randomly-generated programs and then refine using self-supervised learning. But self-supervised learning either requires that the program execution process be differentiable or relies on reinforcement learning, which is unstable and slow to converge. In this paper, we present a new approach for learning to infer shape programs, which we call latent execution self training (LEST). As with recent prior work, LEST starts by training on randomly-generated (program, shape) pairs. As its name implies, it is based on the idea of self-training: running a model on unlabeled input shapes, treating the predicted programs as ground truth latent labels, and training again. Self-training is known to be susceptible to local minima. LEST circumvents this problem by leveraging the fact that predicted latent programs are executable: for a given shape $\mathbf{x}^* \in S^*$ and its predicted program $\mathbf{z} \in P$, we execute $\mathbf{z}$ to obtain a shape $\mathbf{x} \in S$ and train on $(\mathbf{z} \in P, \mathbf{x} \in S)$ pairs, rather than $(\mathbf{z} \in P, \mathbf{x}^* \in S^*)$ pairs. Experiments show that the distribution of executed shapes $S$ converges toward the distribution of real shapes $S^*$. We establish connections between LEST and algorithms for learning generative models, including variational Bayes, wake sleep, and expectation maximization. For constructive solid geometry and assembly-based modeling, LEST's inferred programs converge to high reconstruction accuracy significantly faster than those of reinforcement learning.
Manually authoring 3D shapes is difficult and time consuming; generative models of 3D shapes offer compelling alternatives. Procedural representations are one such possibility: they offer high-quality and editable results but are difficult to author and often produce outputs with limited diversity. On the other extreme are deep generative models: given enough data, they can learn to generate any class of shape but their outputs have artifacts and the representation is not editable. In this paper, we take a step towards achieving the best of both worlds for novel 3D shape synthesis. We propose ShapeAssembly, a domain-specific "assembly-language" for 3D shape structures. ShapeAssembly programs construct shapes by declaring cuboid part proxies and attaching them to one another, in a hierarchical and symmetrical fashion. Its functions are parameterized with free variables, so that one program structure is able to capture a family of related shapes. We show how to extract ShapeAssembly programs from existing shape structures in the PartNet dataset. Then we train a deep generative model, a hierarchical sequence VAE, that learns to write novel ShapeAssembly programs. The program captures the subset of variability that is interpretable and editable. The deep model captures correlations across shape collections that are hard to express procedurally. We evaluate our approach by comparing shapes output by our generated programs to those from other recent shape structure synthesis models. We find that our generated shapes are more plausible and physically-valid than those of other methods. Additionally, we assess the latent spaces of these models, and find that ours is better structured and produces smoother interpolations. As an application, we use our generative model and differentiable program interpreter to infer and fit shape programs to unstructured geometry, such as point clouds.