Chin-Yi Cheng

A-Scan2BIM: Assistive Scan to Building Information Modeling

Nov 30, 2023
Weilian Song, Jieliang Luo, Dale Zhao, Yan Fu, Chin-Yi Cheng, Yasutaka Furukawa

This paper proposes an assistive system for architects that converts a large-scale point cloud into a standardized digital representation of a building for Building Information Modeling (BIM) applications. This process, known as Scan-to-BIM, requires many hours of manual work by a professional architect, even for a single building floor. Given its challenging nature, the paper focuses on helping architects with the Scan-to-BIM process rather than replacing them. Concretely, we propose an assistive Scan-to-BIM system that takes the raw sensor data and the edit history (including the current BIM model) and auto-regressively predicts a sequence of model editing operations as API calls of professional BIM software (i.e., Autodesk Revit). The paper also presents the first building-scale Scan2BIM dataset containing sequences of model editing operations expressed as Autodesk Revit API calls. The dataset covers 89 hours of Scan2BIM modeling by professional architects over 16 scenes, spanning more than 35,000 m^2. We report our system's reconstruction quality with standard metrics, and we introduce a novel metric that measures how natural the order of reconstructed operations is. A simple modification to the reconstruction module improves performance, and our method is far superior to two baselines on the order metric. We will release data, code, and models at a-scan2bim.github.io.
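
For readers unfamiliar with the auto-regressive framing, the sketch below shows, in generic PyTorch rather than the authors' code, how a model could predict the next editing operation given scan features and the edit history; all module names, feature sizes, and operation ids are hypothetical placeholders.

```python
# Minimal sketch (not the authors' code) of autoregressively predicting BIM
# editing operations from scan features and the edit history so far.
import torch
import torch.nn as nn

class NextOpPredictor(nn.Module):
    def __init__(self, num_op_types: int, d_model: int = 256):
        super().__init__()
        self.op_embed = nn.Embedding(num_op_types, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.history_encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.scan_proj = nn.Linear(1024, d_model)   # assumes a precomputed scan feature
        self.head = nn.Linear(d_model, num_op_types)

    def forward(self, scan_feat, history_ops):
        # scan_feat: (B, 1024) global point-cloud feature
        # history_ops: (B, T) integer ids of past editing operations
        h = self.op_embed(history_ops)                          # (B, T, d)
        h = torch.cat([self.scan_proj(scan_feat)[:, None], h], dim=1)
        h = self.history_encoder(h)
        return self.head(h[:, -1])                              # logits for the next operation

# Greedy roll-out: repeatedly append the most likely next operation, mimicking
# how an assistive system could propose editing API calls one by one.
model = NextOpPredictor(num_op_types=32)
scan_feat = torch.randn(1, 1024)
history = torch.zeros(1, 1, dtype=torch.long)   # start token
for _ in range(5):
    next_op = model(scan_feat, history).argmax(dim=-1, keepdim=True)
    history = torch.cat([history, next_op], dim=1)
print(history)
```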

* BMVC 2023; order evaluation updated after fixing an evaluation bug 

Representation Learning for Sequential Volumetric Design Tasks

Sep 05, 2023
Md Ferdous Alam, Yi Wang, Linh Tran, Chin-Yi Cheng, Jieliang Luo

Volumetric design, also called massing design, is the first and critical step in professional building design, and it is sequential in nature. Because the volumetric design process is complex, the underlying sequential design process encodes valuable information for designers. Many efforts have been made to automatically generate reasonable volumetric designs, but the quality of the generated solutions varies, and evaluating a design requires either a prohibitively comprehensive set of metrics or expensive human expertise. Whereas previous approaches learned only from final designs rather than sequential design tasks, we propose to encode design knowledge from a collection of expert or high-performing design sequences and extract useful representations using transformer-based models. We then use the learned representations for crucial downstream applications such as design preference evaluation and procedural design generation. We develop the preference model by estimating the density of the learned representations, and we train an autoregressive transformer model for sequential design generation. We demonstrate our ideas on a novel dataset of thousands of sequential volumetric designs. Our preference model can compare two arbitrary design sequences and is almost 90% accurate when evaluated against random design sequences. Our autoregressive model can also autocomplete a volumetric design sequence from a partial design sequence.
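
To make the preference-model idea concrete, here is a minimal sketch (not the paper's implementation) that scores design sequences by the density of their embeddings under a Gaussian fitted to expert sequences; the embeddings are simulated stand-ins for the output of a trained transformer encoder.

```python
# Toy preference model: fit a density over embeddings of expert design
# sequences, then prefer whichever new sequence is more probable under it.
import numpy as np

rng = np.random.default_rng(0)

# Pretend these are embeddings of expert design sequences from a trained encoder.
expert_embeddings = rng.normal(size=(500, 64))

# Fit a simple Gaussian density over the expert embeddings.
mu = expert_embeddings.mean(axis=0)
cov = np.cov(expert_embeddings, rowvar=False) + 1e-6 * np.eye(64)
cov_inv = np.linalg.inv(cov)

def log_density(z: np.ndarray) -> float:
    """Unnormalized Gaussian log-density; higher means closer to expert designs."""
    d = z - mu
    return float(-0.5 * d @ cov_inv @ d)

def prefer(seq_a_embedding: np.ndarray, seq_b_embedding: np.ndarray) -> str:
    """Compare two design-sequence embeddings and return the preferred one."""
    return "A" if log_density(seq_a_embedding) > log_density(seq_b_embedding) else "B"

print(prefer(rng.normal(size=64), mu))  # the mean of the expert embeddings should win
```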

IKEA-Manual: Seeing Shape Assembly Step by Step

Feb 03, 2023
Ruocheng Wang, Yunzhi Zhang, Jiayuan Mao, Ran Zhang, Chin-Yi Cheng, Jiajun Wu

Human-designed visual manuals are crucial components of shape assembly activities. They provide step-by-step guidance on how to move and connect different parts in a convenient and physically realizable way. While there has been an ongoing effort to build agents that perform assembly tasks, the information in human-designed manuals has been largely overlooked. We identify two reasons for this: 1) a lack of realistic 3D assembly objects that have paired manuals, and 2) the difficulty of extracting structured information from purely image-based manuals. Motivated by this observation, we present IKEA-Manual, a dataset consisting of 102 IKEA objects paired with assembly manuals. We provide fine-grained annotations on the IKEA objects and assembly manuals, including decomposed assembly parts, assembly plans, manual segmentation, and 2D-3D correspondences between 3D parts and visual manuals. We illustrate the broad applicability of our dataset on four tasks related to shape assembly: assembly plan generation, part segmentation, pose estimation, and 3D part assembly.
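
As an illustration of the structure such annotations carry, the sketch below defines a hypothetical record for one object, covering decomposed parts, step-wise assembly plans, manual segmentation, and per-part poses; the actual field names and file formats of IKEA-Manual may differ (see the project page).

```python
# A hypothetical, simplified annotation schema for one assembled object.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class AssemblyStep:
    manual_page: int                         # index of the manual image for this step
    parts_added: List[str]                   # ids of 3D parts introduced at this step
    part_masks: List[List[Tuple[int, int]]]  # 2D segmentation polygons in the manual image
    poses: List[List[float]]                 # one 6-DoF pose per newly added part

@dataclass
class IkeaObjectAnnotation:
    object_name: str
    part_mesh_files: List[str]               # decomposed assembly parts, one mesh per part
    steps: List[AssemblyStep] = field(default_factory=list)

def assembly_plan(ann: IkeaObjectAnnotation) -> List[List[str]]:
    """Return the ordered assembly plan as a list of part-id groups, one per step."""
    return [step.parts_added for step in ann.steps]

chair = IkeaObjectAnnotation(
    object_name="chair",
    part_mesh_files=["leg.obj", "seat.obj", "back.obj"],
    steps=[AssemblyStep(0, ["leg", "seat"], [[(0, 0)], [(1, 1)]], [[0.0] * 6] * 2)],
)
print(assembly_plan(chair))  # [['leg', 'seat']]
```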

* NeurIPS 2022 Datasets and Benchmarks Track. Project page: https://cs.stanford.edu/~rcwang/projects/ikea_manual 

PLay: Parametrically Conditioned Layout Generation using Latent Diffusion

Jan 27, 2023
Chin-Yi Cheng, Forrest Huang, Gang Li, Yang Li

Layout design is an important task in various design fields, including user interface, document, and graphic design. Because this task requires tedious manual effort from designers, prior works have attempted to automate the process with generative models, but they commonly fall short of providing intuitive user controls and achieving design objectives. In this paper, we build a conditional latent diffusion model, PLay, that generates parametrically conditioned layouts in vector-graphic space from user-specified guidelines, which designers commonly use to represent their design intent in current practice. Our method outperforms prior works across three datasets on metrics including FID and FD-VG, as well as in user tests. Moreover, it brings a novel and interactive experience to professional layout design processes.
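
For context, the toy sketch below runs a generic DDPM-style reverse loop in a latent space with a guideline condition concatenated to the denoiser input; it only illustrates the flavor of conditional latent diffusion and is not PLay's architecture. All modules and dimensions are placeholders.

```python
# Generic conditional denoising loop in a latent space (toy, untrained).
import torch
import torch.nn as nn

class Denoiser(nn.Module):
    def __init__(self, latent_dim=32, cond_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + cond_dim + 1, 128), nn.SiLU(),
            nn.Linear(128, latent_dim),
        )

    def forward(self, z, t, cond):
        # Predict noise from the noisy latent, the timestep, and the guideline condition.
        t_feat = t.float().view(-1, 1) / 1000.0
        return self.net(torch.cat([z, cond, t_feat], dim=-1))

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

denoiser = Denoiser()
cond = torch.randn(1, 16)   # stand-in for an encoded set of user guidelines
z = torch.randn(1, 32)      # start from pure noise in the latent space

for t in reversed(range(T)):
    t_batch = torch.full((1,), t, dtype=torch.long)
    eps = denoiser(z, t_batch, cond)
    # Standard DDPM posterior mean; noise is re-added for all but the final step.
    coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
    z = (z - coef * eps) / torch.sqrt(alphas[t])
    if t > 0:
        z = z + torch.sqrt(betas[t]) * torch.randn_like(z)

# A trained decoder would then map z back to a vector-graphic layout.
print(z.shape)
```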

Translating a Visual LEGO Manual to a Machine-Executable Plan

Jul 25, 2022
Ruocheng Wang, Yunzhi Zhang, Jiayuan Mao, Chin-Yi Cheng, Jiajun Wu

We study the problem of translating an image-based, step-by-step assembly manual created by human designers into machine-interpretable instructions. We formulate this problem as a sequential prediction task: at each step, our model reads the manual, locates the components to be added to the current shape, and infers their 3D poses. This task poses two challenges: establishing a 2D-3D correspondence between the manual image and the real 3D object, and estimating 3D poses for unseen 3D objects, since a new component added in a step can itself be an object built in previous steps. To address these challenges, we present a novel learning-based framework, the Manual-to-Executable-Plan Network (MEPNet), which reconstructs the assembly steps from a sequence of manual images. The key idea is to integrate neural 2D keypoint detection modules and 2D-3D projection algorithms for high-precision prediction and strong generalization to unseen components. MEPNet outperforms existing methods on three newly collected LEGO manual datasets and a Minecraft house dataset.
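
The sequential formulation can be summarized with the sketch below, where the detection and pose-estimation helpers are placeholders rather than MEPNet's modules: each manual page adds located components and their inferred poses to the shape built so far.

```python
# Skeleton of the manual-to-executable-plan loop with placeholder components.
from typing import List, Tuple

Pose = Tuple[float, float, float, float, float, float]  # toy translation + rotation

def locate_components(manual_image, current_shape) -> List[str]:
    """Placeholder for 2D keypoint / component detection on a manual page."""
    return ["brick_2x4"]

def estimate_pose(component_id: str, manual_image, current_shape) -> Pose:
    """Placeholder for 2D-3D projection / pose estimation of one component."""
    return (0.0, 0.0, float(len(current_shape)), 0.0, 0.0, 0.0)

def translate_manual(manual_images) -> List[Tuple[str, Pose]]:
    """Convert a sequence of manual pages into an executable list of (part, pose) actions."""
    current_shape: List[Tuple[str, Pose]] = []
    for image in manual_images:
        for component in locate_components(image, current_shape):
            pose = estimate_pose(component, image, current_shape)
            current_shape.append((component, pose))
    return current_shape

print(translate_manual([None, None]))  # two pages -> two stacked placeholder bricks
```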

* ECCV 2022. Project page: https://cs.stanford.edu/~rcwang/projects/lego_manual 

SkexGen: Autoregressive Generation of CAD Construction Sequences with Disentangled Codebooks

Jul 11, 2022
Xiang Xu, Karl D. D. Willis, Joseph G. Lambourne, Chin-Yi Cheng, Pradeep Kumar Jayaraman, Yasutaka Furukawa

We present SkexGen, a novel autoregressive generative model for computer-aided design (CAD) construction sequences containing sketch-and-extrude modeling operations. Our model utilizes distinct Transformer architectures to encode topological, geometric, and extrusion variations of construction sequences into disentangled codebooks. Autoregressive Transformer decoders generate CAD construction sequences sharing certain properties specified by the codebook vectors. Extensive experiments demonstrate that our disentangled codebook representation generates diverse and high-quality CAD models, enhances user control, and enables efficient exploration of the design space. The code is available at https://samxuxiang.github.io/skexgen.
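
A rough sketch of the generation pattern, with three code vectors (e.g., topology, geometry, extrusion) prefixed to an autoregressive token decoder, is shown below; it is an illustrative toy with placeholder modules (and no causal masking or caching), not SkexGen itself.

```python
# Toy illustration: sample one code per codebook and decode tokens conditioned on them.
import torch
import torch.nn as nn

class CodeConditionedDecoder(nn.Module):
    def __init__(self, vocab_size=64, d_model=128, num_codes=512):
        super().__init__()
        # Three disentangled codebooks, e.g. for topology, geometry, and extrusion.
        self.codebooks = nn.ModuleList([nn.Embedding(num_codes, d_model) for _ in range(3)])
        self.tok_embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, vocab_size)

    @torch.no_grad()
    def sample(self, code_ids, max_len=16):
        # Prefix the token sequence with the three selected code vectors.
        prefix = torch.stack([cb(i) for cb, i in zip(self.codebooks, code_ids)], dim=1)
        tokens = torch.zeros(1, 1, dtype=torch.long)   # start token
        for _ in range(max_len):
            x = torch.cat([prefix, self.tok_embed(tokens)], dim=1)
            logits = self.head(self.backbone(x)[:, -1])
            tokens = torch.cat([tokens, logits.argmax(-1, keepdim=True)], dim=1)
        return tokens

model = CodeConditionedDecoder()
codes = [torch.tensor([7]), torch.tensor([3]), torch.tensor([11])]  # one id per codebook
print(model.sample(codes).shape)   # (1, 17): start token plus 16 generated tokens
```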

* Accepted to ICML 2022 

Multi-Target Filter and Detector for Speaker Diarization

Mar 30, 2022
Chin-Yi Cheng, Hung-Shin Lee, Yu Tsao, Hsin-Min Wang

A good representation of a target speaker usually helps to extract important information about the speaker and to detect the corresponding temporal regions in a multi-speaker conversation. In this paper, we propose a neural architecture that simultaneously extracts speaker embeddings consistent with the speaker diarization objective and detects the presence of each speaker frame by frame, regardless of the number of speakers in the conversation. To this end, a residual network (ResNet) and a dual-path recurrent neural network (DPRNN) are integrated into a unified structure. When tested on the 2-speaker CALLHOME corpus, our proposed model outperforms most methods published so far. Evaluated in the more challenging case of two to seven concurrent speakers, our system also achieves relative diarization error rate reductions of 26.35% and 6.4% over two typical baselines: a traditional x-vector clustering system and an attention-based system.
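
The frame-by-frame detection idea can be sketched as follows: a sequence model emits an independent per-frame probability for each possible speaker, so overlapping speech is handled naturally. The backbone below is a placeholder rather than the paper's ResNet + DPRNN.

```python
# Toy frame-level multi-speaker activity detector with a placeholder backbone.
import torch
import torch.nn as nn

class FrameSpeakerDetector(nn.Module):
    def __init__(self, n_mels=80, max_speakers=7, hidden=128):
        super().__init__()
        self.frontend = nn.Conv1d(n_mels, hidden, kernel_size=5, padding=2)
        self.rnn = nn.LSTM(hidden, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, max_speakers)

    def forward(self, feats):
        # feats: (B, T, n_mels) log-mel features
        x = self.frontend(feats.transpose(1, 2)).transpose(1, 2)   # (B, T, hidden)
        x, _ = self.rnn(x)
        # Independent sigmoids allow several speakers to be active in the same frame.
        return torch.sigmoid(self.head(x))                         # (B, T, max_speakers)

model = FrameSpeakerDetector()
probs = model(torch.randn(2, 300, 80))   # 2 utterances, 300 frames each
active = probs > 0.5                     # frame-by-frame speaker presence decisions
print(probs.shape, active.shape)
```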

* Submitted to Interspeech 2022 

CLIP-Forge: Towards Zero-Shot Text-to-Shape Generation

Oct 06, 2021
Aditya Sanghi, Hang Chu, Joseph G. Lambourne, Ye Wang, Chin-Yi Cheng, Marco Fumero

While recent progress has been made in text-to-image generation, text-to-shape generation remains a challenging problem due to the unavailability of large-scale paired text and shape data. We present a simple yet effective method for zero-shot text-to-shape generation based on a two-stage training process, which depends only on an unlabelled shape dataset and a pre-trained image-text network such as CLIP. Our method not only demonstrates promising zero-shot generalization, but also avoids expensive inference-time optimization and can generate multiple shapes for a given text.
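
A minimal sketch of the zero-shot inference flow this implies: embed the prompt with a pre-trained CLIP text encoder, map the embedding into a shape-latent space, and decode it to geometry. The mapping network and voxel decoder below are untrained placeholders, not CLIP-Forge's modules.

```python
# Zero-shot text-to-shape skeleton: CLIP text feature -> shape latent -> geometry.
import clip          # pip install git+https://github.com/openai/CLIP.git
import torch
import torch.nn as nn

device = "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)

mapper = nn.Linear(512, 128)                      # CLIP text feature -> shape latent
voxel_decoder = nn.Sequential(                    # shape latent -> 16^3 occupancy grid
    nn.Linear(128, 1024), nn.ReLU(), nn.Linear(1024, 16 ** 3),
)

with torch.no_grad():
    tokens = clip.tokenize(["a round table with four legs"]).to(device)
    text_feat = clip_model.encode_text(tokens).float()    # (1, 512)
    latent = mapper(text_feat)
    occupancy = torch.sigmoid(voxel_decoder(latent)).view(1, 16, 16, 16)

print(occupancy.shape)  # (1, 16, 16, 16); with trained modules this would be a shape
```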

Structural Design Recommendations in the Early Design Phase using Machine Learning

Jul 19, 2021
Spyridon Ampanavos, Mehdi Nourbakhsh, Chin-Yi Cheng

Structural engineering knowledge can be of significant importance to the architectural design team during the early design phase. However, architects and engineers do not typically work together during the conceptual phase; in fact, structural engineers are often brought in late in the process. As a result, design updates are more difficult and time-consuming to complete, and an opportunity for better design exploration guided by structural feedback is lost. In general, the earlier in the design process that iteration happens, the greater the benefits in cost efficiency and informed design exploration, which can lead to higher-quality creative results. To facilitate informed exploration in the early design stage, we suggest automating fundamental structural engineering tasks and introduce ApproxiFramer, a machine-learning-based system for the automatic generation of structural layouts from building plan sketches in real time. The system aims to assist architects by presenting them with feasible structural solutions during the conceptual phase so that they can proceed with their design with adequate knowledge of its structural implications. In this paper, we describe the system and evaluate the performance of a proof-of-concept implementation in the domain of orthogonal, metal, rigid structures. We trained a convolutional neural network to iteratively generate structural design solutions for sketch-level building plans using a synthetic dataset and achieved an average error of 2.2% in the predicted positions of the columns.
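
The task setup can be illustrated with the following sketch: an untrained, placeholder convolutional network maps a rasterized plan sketch to a per-pixel column-likelihood map, from which candidate column positions are read off by thresholding. This is not ApproxiFramer's architecture.

```python
# Toy column-position prediction from a rasterized plan sketch.
import torch
import torch.nn as nn

column_net = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, kernel_size=1),             # per-pixel column likelihood (logits)
)

sketch = torch.rand(1, 1, 128, 128)              # rasterized building plan sketch
with torch.no_grad():
    heatmap = torch.sigmoid(column_net(sketch))[0, 0]

# Candidate columns = pixels whose predicted likelihood exceeds a threshold.
ys, xs = torch.nonzero(heatmap > 0.9, as_tuple=True)
columns = list(zip(xs.tolist(), ys.tolist()))    # (x, y) positions in pixel coordinates
print(len(columns), "candidate column positions")
```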

* CAAD Futures 2021 

Building-GAN: Graph-Conditioned Architectural Volumetric Design Generation

Apr 27, 2021
Kai-Hung Chang, Chin-Yi Cheng, Jieliang Luo, Shingo Murata, Mehdi Nourbakhsh, Yoshito Tsuji

Volumetric design is the first and critical step of professional building design, where architects not only depict the rough 3D geometry of the building but also specify the programs that form a 2D layout on each floor. Though 2D layout generation for a single story has been widely studied, there is no well-developed method for multi-story buildings. This paper focuses on volumetric design generation conditioned on an input program graph. Instead of outputting dense 3D voxels, we propose a new 3D representation named voxel graph that is both compact and expressive for building geometries. Our generator is a cross-modal graph neural network that uses a pointer mechanism to connect the input program graph and the output voxel graph, and the whole pipeline is trained with an adversarial framework. The generated designs are evaluated qualitatively by a user study and quantitatively with three metrics: quality, diversity, and connectivity accuracy. We show that our model generates realistic 3D volumetric designs and outperforms previous methods and baselines.
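
To convey why a voxel graph is more compact than dense voxels, the sketch below builds a graph whose nodes are only the occupied voxels (each carrying a program label and coordinates) and whose edges link adjacent voxels; the field names are illustrative, not the paper's exact format.

```python
# Toy voxel-graph construction from a sparse {coordinate: program} map.
from dataclasses import dataclass
from typing import Dict, List, Tuple

Coord = Tuple[int, int, int]   # (x, y, floor)

@dataclass
class VoxelNode:
    coord: Coord
    program: str               # e.g. "office", "corridor", "mechanical"

def build_voxel_graph(voxels: Dict[Coord, str]):
    """Turn a sparse occupancy map into nodes plus adjacency edges."""
    nodes = [VoxelNode(c, p) for c, p in sorted(voxels.items())]
    index = {n.coord: i for i, n in enumerate(nodes)}
    edges: List[Tuple[int, int]] = []
    for (x, y, z), i in index.items():
        for dx, dy, dz in [(1, 0, 0), (0, 1, 0), (0, 0, 1)]:   # 6-connectivity, one direction
            j = index.get((x + dx, y + dy, z + dz))
            if j is not None:
                edges.append((i, j))
    return nodes, edges

# Two office voxels on floor 0 and a corridor voxel directly above one of them.
nodes, edges = build_voxel_graph({(0, 0, 0): "office", (1, 0, 0): "office", (0, 0, 1): "corridor"})
print(len(nodes), edges)   # 3 nodes; edges link adjacent voxels only
```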
