Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Topic:Room Layout Estimation

What is Room Layout Estimation? Room-layout estimation is the process of estimating the layout of a room from images or videos.

3D scene generation from scene graphs and self-attention

Apr 04, 2024

Pietro Bonazzi

Abstract:Synthesizing realistic and diverse indoor 3D scene layouts in a controllable fashion opens up applications in simulated navigation and virtual reality. As concise and robust representations of a scene, scene graphs have proven to be well-suited as the semantic control on the generated layout. We present a variant of the conditional variational autoencoder (cVAE) model to synthesize 3D scenes from scene graphs and floor plans. We exploit the properties of self-attention layers to capture high-level relationships between objects in a scene, and use these as the building blocks of our model. Our model, leverages graph transformers to estimate the size, dimension and orientation of the objects in a room while satisfying relationships in the given scene graph. Our experiments shows self-attention layers leads to sparser (7.9x compared to Graphto3D) and more diverse scenes (16%).

Via

Access Paper or Ask Questions

U2RLE: Uncertainty-Guided 2-Stage Room Layout Estimation

Apr 17, 2023

Pooya Fayyazsanavi, Zhiqiang Wan, Will Hutchcroft, Ivaylo Boyadzhiev, Yuguang Li, Jana Kosecka, Sing Bing Kang

Figure 1 for U2RLE: Uncertainty-Guided 2-Stage Room Layout Estimation

Figure 2 for U2RLE: Uncertainty-Guided 2-Stage Room Layout Estimation

Figure 3 for U2RLE: Uncertainty-Guided 2-Stage Room Layout Estimation

Figure 4 for U2RLE: Uncertainty-Guided 2-Stage Room Layout Estimation

Abstract:While the existing deep learning-based room layout estimation techniques demonstrate good overall accuracy, they are less effective for distant floor-wall boundary. To tackle this problem, we propose a novel uncertainty-guided approach for layout boundary estimation introducing new two-stage CNN architecture termed U2RLE. The initial stage predicts both floor-wall boundary and its uncertainty and is followed by the refinement of boundaries with high positional uncertainty using a different, distance-aware loss. Finally, outputs from the two stages are merged to produce the room layout. Experiments using ZInD and Structure3D datasets show that U2RLE improves over current state-of-the-art, being able to handle both near and far walls better. In particular, U2RLE outperforms current state-of-the-art techniques for the most distant walls.

* To be Appear on CVPR 2023

Via

Access Paper or Ask Questions

SPVLoc: Semantic Panoramic Viewport Matching for 6D Camera Localization in Unseen Environments

Apr 16, 2024

Niklas Gard, Anna Hilsmann, Peter Eisert

Abstract:In this paper, we present SPVLoc, a global indoor localization method that accurately determines the six-dimensional (6D) camera pose of a query image and requires minimal scene-specific prior knowledge and no scene-specific training. Our approach employs a novel matching procedure to localize the perspective camera's viewport, given as an RGB image, within a set of panoramic semantic layout representations of the indoor environment. The panoramas are rendered from an untextured 3D reference model, which only comprises approximate structural information about room shapes, along with door and window annotations. We demonstrate that a straightforward convolutional network structure can successfully achieve image-to-panorama and ultimately image-to-model matching. Through a viewport classification score, we rank reference panoramas and select the best match for the query image. Then, a 6D relative pose is estimated between the chosen panorama and query image. Our experiments demonstrate that this approach not only efficiently bridges the domain gap but also generalizes well to previously unseen scenes that are not part of the training data. Moreover, it achieves superior localization accuracy compared to the state of the art methods and also estimates more degrees of freedom of the camera pose. We will make our source code publicly available at https://github.com/fraunhoferhhi/spvloc .

* This submission includes the paper and supplementary material. 24 pages, 11 figures

Via

Access Paper or Ask Questions

Shape-Net: Room Layout Estimation from Panoramic Images Robust to Occlusion using Knowledge Distillation with 3D Shapes as Additional Inputs

Apr 25, 2023

Mizuki Tabata, Kana Kurata, Junichiro Tamamatsu

Figure 1 for Shape-Net: Room Layout Estimation from Panoramic Images Robust to Occlusion using Knowledge Distillation with 3D Shapes as Additional Inputs

Figure 2 for Shape-Net: Room Layout Estimation from Panoramic Images Robust to Occlusion using Knowledge Distillation with 3D Shapes as Additional Inputs

Figure 3 for Shape-Net: Room Layout Estimation from Panoramic Images Robust to Occlusion using Knowledge Distillation with 3D Shapes as Additional Inputs

Figure 4 for Shape-Net: Room Layout Estimation from Panoramic Images Robust to Occlusion using Knowledge Distillation with 3D Shapes as Additional Inputs

Abstract:Estimating the layout of a room from a single-shot panoramic image is important in virtual/augmented reality and furniture layout simulation. This involves identifying three-dimensional (3D) geometry, such as the location of corners and boundaries, and performing 3D reconstruction. However, occlusion is a common issue that can negatively impact room layout estimation, and this has not been thoroughly studied to date. It is possible to obtain 3D shape information of rooms as drawings of buildings and coordinates of corners from image datasets, thus we propose providing both 2D panoramic and 3D information to a model to effectively deal with occlusion. However, simply feeding 3D information to a model is not sufficient to utilize the shape information for an occluded area. Therefore, we improve the model by introducing 3D Intersection over Union (IoU) loss to effectively use 3D information. In some cases, drawings are not available or the construction deviates from a drawing. Considering such practical cases, we propose a method for distilling knowledge from a model trained with both images and 3D information to a model that takes only images as input. The proposed model, which is called Shape-Net, achieves state-of-the-art (SOTA) performance on benchmark datasets. We also confirmed its effectiveness in dealing with occlusion through significantly improved accuracy on images with occlusion compared with existing models.

* Accepted by CVPR2023 workshop (CIVILS)

Via

Access Paper or Ask Questions

Disentangling Orthogonal Planes for Indoor Panoramic Room Layout Estimation with Cross-Scale Distortion Awareness

Mar 04, 2023

Zhijie Shen, Zishuo Zheng, Chunyu Lin, Lang Nie, Kang Liao, Shuai Zheng, Yao Zhao

Abstract:Based on the Manhattan World assumption, most existing indoor layout estimation schemes focus on recovering layouts from vertically compressed 1D sequences. However, the compression procedure confuses the semantics of different planes, yielding inferior performance with ambiguous interpretability. To address this issue, we propose to disentangle this 1D representation by pre-segmenting orthogonal (vertical and horizontal) planes from a complex scene, explicitly capturing the geometric cues for indoor layout estimation. Considering the symmetry between the floor boundary and ceiling boundary, we also design a soft-flipping fusion strategy to assist the pre-segmentation. Besides, we present a feature assembling mechanism to effectively integrate shallow and deep features with distortion distribution awareness. To compensate for the potential errors in pre-segmentation, we further leverage triple attention to reconstruct the disentangled sequences for better performance. Experiments on four popular benchmarks demonstrate our superiority over existing SoTA solutions, especially on the 3DIoU metric. The code is available at \url{https://github.com/zhijieshen-bjtu/DOPNet}.

* Accepted to CVPR2023

Via

Access Paper or Ask Questions

From Semi-supervised to Omni-supervised Room Layout Estimation Using Point Clouds

Jan 31, 2023

Huan-ang Gao, Beiwen Tian, Pengfei Li, Xiaoxue Chen, Hao Zhao, Guyue Zhou, Yurong Chen, Hongbin Zha

Abstract:Room layout estimation is a long-existing robotic vision task that benefits both environment sensing and motion planning. However, layout estimation using point clouds (PCs) still suffers from data scarcity due to annotation difficulty. As such, we address the semi-supervised setting of this task based upon the idea of model exponential moving averaging. But adapting this scheme to the state-of-the-art (SOTA) solution for PC-based layout estimation is not straightforward. To this end, we define a quad set matching strategy and several consistency losses based upon metrics tailored for layout quads. Besides, we propose a new online pseudo-label harvesting algorithm that decomposes the distribution of a hybrid distance measure between quads and PC into two components. This technique does not need manual threshold selection and intuitively encourages quads to align with reliable layout points. Surprisingly, this framework also works for the fully-supervised setting, achieving a new SOTA on the ScanNet benchmark. Last but not least, we also push the semi-supervised setting to the realistic omni-supervised setting, demonstrating significantly promoted performance on a newly annotated ARKitScenes testing set. Our codes, data and models are released in this repository.

* Accepted to ICRA2023. Code: https://github.com/AIR-DISCOVER/Omni-PQ

Via

Access Paper or Ask Questions

PanoViT: Vision Transformer for Room Layout Estimation from a Single Panoramic Image

Dec 23, 2022

Weichao Shen, Yuan Dong, Zonghao Chen, Zhengyi Zhao, Yang Gao, Zhu Liu

Figure 1 for PanoViT: Vision Transformer for Room Layout Estimation from a Single Panoramic Image

Figure 2 for PanoViT: Vision Transformer for Room Layout Estimation from a Single Panoramic Image

Figure 3 for PanoViT: Vision Transformer for Room Layout Estimation from a Single Panoramic Image

Figure 4 for PanoViT: Vision Transformer for Room Layout Estimation from a Single Panoramic Image

Abstract:In this paper, we propose PanoViT, a panorama vision transformer to estimate the room layout from a single panoramic image. Compared to CNN models, our PanoViT is more proficient in learning global information from the panoramic image for the estimation of complex room layouts. Considering the difference between a perspective image and an equirectangular image, we design a novel recurrent position embedding and a patch sampling method for the processing of panoramic images. In addition to extracting global information, PanoViT also includes a frequency-domain edge enhancement module and a 3D loss to extract local geometric features in a panoramic image. Experimental results on several datasets demonstrate that our method outperforms state-of-the-art solutions in room layout prediction accuracy.

Via

Access Paper or Ask Questions

EchoScan: Scanning Complex Indoor Geometries via Acoustic Echoes

Oct 18, 2023

Inmo Yeon, Iljoo Jeong, Seungchul Lee, Jung-Woo Choi

Figure 1 for EchoScan: Scanning Complex Indoor Geometries via Acoustic Echoes

Figure 2 for EchoScan: Scanning Complex Indoor Geometries via Acoustic Echoes

Figure 3 for EchoScan: Scanning Complex Indoor Geometries via Acoustic Echoes

Figure 4 for EchoScan: Scanning Complex Indoor Geometries via Acoustic Echoes

Abstract:Accurate estimation of indoor space geometries is vital for constructing precise digital twins, whose broad industrial applications include navigation in unfamiliar environments and efficient evacuation planning, particularly in low-light conditions. This study introduces EchoScan, a deep neural network model that utilizes acoustic echoes to perform room geometry inference. Conventional sound-based techniques rely on estimating geometry-related room parameters such as wall position and room size, thereby limiting the diversity of inferable room geometries. Contrarily, EchoScan overcomes this limitation by directly inferring room floorplans and heights, thereby enabling it to handle rooms with arbitrary shapes, including curved walls. The key innovation of EchoScan is its ability to analyze the complex relationship between low- and high-order reflections in room impulse responses (RIRs) using a multi-aggregation module. The analysis of high-order reflections also enables it to infer complex room shapes when echoes are unobservable from the position of an audio device. Herein, EchoScan was trained and evaluated using RIRs synthesized from complex environments, including the Manhattan and Atlanta layouts, employing a practical audio device configuration compatible with commercial, off-the-shelf devices. Compared with vision-based methods, EchoScan demonstrated outstanding geometry estimation performance in rooms with various shapes.

* 9 pages, 8 figures

Via

Access Paper or Ask Questions

Language-EXtended Indoor SLAM (LEXIS): A Versatile System for Real-time Visual Scene Understanding

Sep 26, 2023

Christina Kassab, Matias Mattamala, Lintong Zhang, Maurice Fallon

Figure 1 for Language-EXtended Indoor SLAM (LEXIS): A Versatile System for Real-time Visual Scene Understanding

Figure 2 for Language-EXtended Indoor SLAM (LEXIS): A Versatile System for Real-time Visual Scene Understanding

Figure 3 for Language-EXtended Indoor SLAM (LEXIS): A Versatile System for Real-time Visual Scene Understanding

Figure 4 for Language-EXtended Indoor SLAM (LEXIS): A Versatile System for Real-time Visual Scene Understanding

Abstract:Versatile and adaptive semantic understanding would enable autonomous systems to comprehend and interact with their surroundings. Existing fixed-class models limit the adaptability of indoor mobile and assistive autonomous systems. In this work, we introduce LEXIS, a real-time indoor Simultaneous Localization and Mapping (SLAM) system that harnesses the open-vocabulary nature of Large Language Models (LLMs) to create a unified approach to scene understanding and place recognition. The approach first builds a topological SLAM graph of the environment (using visual-inertial odometry) and embeds Contrastive Language-Image Pretraining (CLIP) features in the graph nodes. We use this representation for flexible room classification and segmentation, serving as a basis for room-centric place recognition. This allows loop closure searches to be directed towards semantically relevant places. Our proposed system is evaluated using both public, simulated data and real-world data, covering office and home environments. It successfully categorizes rooms with varying layouts and dimensions and outperforms the state-of-the-art (SOTA). For place recognition and trajectory estimation tasks we achieve equivalent performance to the SOTA, all also utilizing the same pre-trained model. Lastly, we demonstrate the system's potential for planning.

Via

Access Paper or Ask Questions

UAV-assisted Visual SLAM Generating Reconstructed 3D Scene Graphs in GPS-denied Environments

Feb 12, 2024

Ahmed Radwan, Ali Tourani, Hriday Bavle, Holger Voos, Jose Luis Sanchez-Lopez

Figure 1 for UAV-assisted Visual SLAM Generating Reconstructed 3D Scene Graphs in GPS-denied Environments

Figure 2 for UAV-assisted Visual SLAM Generating Reconstructed 3D Scene Graphs in GPS-denied Environments

Figure 3 for UAV-assisted Visual SLAM Generating Reconstructed 3D Scene Graphs in GPS-denied Environments

Figure 4 for UAV-assisted Visual SLAM Generating Reconstructed 3D Scene Graphs in GPS-denied Environments

Abstract:Aerial robots play a vital role in various applications where the situational awareness of the robots concerning the environment is a fundamental demand. As one such use case, drones in GPS-denied environments require equipping with different sensors (e.g., vision sensors) that provide reliable sensing results while performing pose estimation and localization. In this paper, reconstructing the maps of indoor environments alongside generating 3D scene graphs for a high-level representation using a camera mounted on a drone is targeted. Accordingly, an aerial robot equipped with a companion computer and an RGB-D camera was built and employed to be appropriately integrated with a Visual Simultaneous Localization and Mapping (VSLAM) framework proposed by the authors. To enhance the situational awareness of the robot while reconstructing maps, various structural elements, including doors and walls, were labeled with printed fiducial markers, and a dictionary of the topological relations among them was fed to the system. The VSLAM system detects markers and reconstructs the map of the indoor areas enriched with higher-level semantic entities, including corridors and rooms. Another achievement is generating multi-layered vision-based situational graphs containing enhanced hierarchical representations of the indoor environment. In this regard, integrating VSLAM into the employed drone is the primary target of this paper to provide an end-to-end robot application for GPS-denied environments. To show the practicality of the system, various real-world condition experiments have been conducted in indoor scenarios with dissimilar structural layouts. Evaluations show the proposed drone application can perform adequately w.r.t. the ground-truth data and its baseline.

* 8 pages, 7 figures, 3 tables

Via

Access Paper or Ask Questions

Topic:Room Layout Estimation

Papers and Code