Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yasutaka Furukawa

Hierarchical Neural Memory Network for Low Latency Event Processing

May 29, 2023

Ryuhei Hamaguchi, Yasutaka Furukawa, Masaki Onishi, Ken Sakurada

Figure 1 for Hierarchical Neural Memory Network for Low Latency Event Processing

Figure 2 for Hierarchical Neural Memory Network for Low Latency Event Processing

Figure 3 for Hierarchical Neural Memory Network for Low Latency Event Processing

Figure 4 for Hierarchical Neural Memory Network for Low Latency Event Processing

Abstract:This paper proposes a low latency neural network architecture for event-based dense prediction tasks. Conventional architectures encode entire scene contents at a fixed rate regardless of their temporal characteristics. Instead, the proposed network encodes contents at a proper temporal scale depending on its movement speed. We achieve this by constructing temporal hierarchy using stacked latent memories that operate at different rates. Given low latency event steams, the multi-level memories gradually extract dynamic to static scene contents by propagating information from the fast to the slow memory modules. The architecture not only reduces the redundancy of conventional architectures but also exploits long-term dependencies. Furthermore, an attention-based event representation efficiently encodes sparse event streams into the memory cells. We conduct extensive evaluations on three event-based dense prediction tasks, where the proposed approach outperforms the existing methods on accuracy and latency, while demonstrating effective event and image fusion capabilities. The code is available at https://hamarh.github.io/hmnet/

* Accepted to CVPR 2023

Via

Access Paper or Ask Questions

JigsawPlan: Room Layout Jigsaw Puzzle Extreme Structure from Motion using Diffusion Models

Nov 24, 2022

Sepidehsadat Hosseini, Mohammad Amin Shabani, Saghar Irandoust, Yasutaka Furukawa

Figure 1 for JigsawPlan: Room Layout Jigsaw Puzzle Extreme Structure from Motion using Diffusion Models

Figure 2 for JigsawPlan: Room Layout Jigsaw Puzzle Extreme Structure from Motion using Diffusion Models

Figure 3 for JigsawPlan: Room Layout Jigsaw Puzzle Extreme Structure from Motion using Diffusion Models

Figure 4 for JigsawPlan: Room Layout Jigsaw Puzzle Extreme Structure from Motion using Diffusion Models

Abstract:This paper presents a novel approach to the Extreme Structure from Motion (E-SfM) problem, which takes a set of room layouts as polygonal curves in the top-down view, and aligns the room layout pieces by estimating their 2D translations and rotations, akin to solving the jigsaw puzzle of room layouts. The biggest discovery and surprise of the paper is that the simple use of a Diffusion Model solves this challenging registration problem as a conditional generation process. The paper presents a new dataset of room layouts and floorplans for 98,780 houses. The qualitative and quantitative evaluations demonstrate that the proposed approach outperforms the competing methods by significant margins.

Via

Access Paper or Ask Questions

HouseDiffusion: Vector Floorplan Generation via a Diffusion Model with Discrete and Continuous Denoising

Nov 23, 2022

Mohammad Amin Shabani, Sepidehsadat Hosseini, Yasutaka Furukawa

Figure 1 for HouseDiffusion: Vector Floorplan Generation via a Diffusion Model with Discrete and Continuous Denoising

Figure 2 for HouseDiffusion: Vector Floorplan Generation via a Diffusion Model with Discrete and Continuous Denoising

Figure 3 for HouseDiffusion: Vector Floorplan Generation via a Diffusion Model with Discrete and Continuous Denoising

Figure 4 for HouseDiffusion: Vector Floorplan Generation via a Diffusion Model with Discrete and Continuous Denoising

Abstract:The paper presents a novel approach for vector-floorplan generation via a diffusion model, which denoises 2D coordinates of room/door corners with two inference objectives: 1) a single-step noise as the continuous quantity to precisely invert the continuous forward process; and 2) the final 2D coordinate as the discrete quantity to establish geometric incident relationships such as parallelism, orthogonality, and corner-sharing. Our task is graph-conditioned floorplan generation, a common workflow in floorplan design. We represent a floorplan as 1D polygonal loops, each of which corresponds to a room or a door. Our diffusion model employs a Transformer architecture at the core, which controls the attention masks based on the input graph-constraint and directly generates vector-graphics floorplans via a discrete and continuous denoising process. We have evaluated our approach on RPLAN dataset. The proposed approach makes significant improvements in all the metrics against the state-of-the-art with significant margins, while being capable of generating non-Manhattan structures and controlling the exact number of corners per room. A project website with supplementary video and document is here https://aminshabani.github.io/housediffusion.

Via

Access Paper or Ask Questions

NeuMap: Neural Coordinate Mapping by Auto-Transdecoder for Camera Localization

Nov 21, 2022

Shitao Tang, Sicong Tang, Andrea Tagliasacchi, Ping Tan, Yasutaka Furukawa

Figure 1 for NeuMap: Neural Coordinate Mapping by Auto-Transdecoder for Camera Localization

Figure 2 for NeuMap: Neural Coordinate Mapping by Auto-Transdecoder for Camera Localization

Figure 3 for NeuMap: Neural Coordinate Mapping by Auto-Transdecoder for Camera Localization

Figure 4 for NeuMap: Neural Coordinate Mapping by Auto-Transdecoder for Camera Localization

Abstract:This paper presents an end-to-end neural mapping method for camera localization, encoding a whole scene into a grid of latent codes, with which a Transformer-based auto-decoder regresses 3D coordinates of query pixels. State-of-the-art camera localization methods require each scene to be stored as a 3D point cloud with per-point features, which takes several gigabytes of storage per scene. While compression is possible, the performance drops significantly at high compression rates. NeuMap achieves extremely high compression rates with minimal performance drop by using 1) learnable latent codes to store scene information and 2) a scene-agnostic Transformer-based auto-decoder to infer coordinates for a query pixel. The scene-agnostic network design also learns robust matching priors by training with large-scale data, and further allows us to just optimize the codes quickly for a new scene while fixing the network weights. Extensive evaluations with five benchmarks show that NeuMap outperforms all the other coordinate regression methods significantly and reaches similar performance as the feature matching methods while having a much smaller scene representation size. For example, NeuMap achieves 39.1% accuracy in Aachen night benchmark with only 6MB of data, while other compelling methods require 100MB or a few gigabytes and fail completely under high compression settings. The codes are available at https://github.com/Tangshitao/NeuMap.

Via

Access Paper or Ask Questions

SkexGen: Autoregressive Generation of CAD Construction Sequences with Disentangled Codebooks

Jul 11, 2022

Xiang Xu, Karl D. D. Willis, Joseph G. Lambourne, Chin-Yi Cheng, Pradeep Kumar Jayaraman, Yasutaka Furukawa

Figure 1 for SkexGen: Autoregressive Generation of CAD Construction Sequences with Disentangled Codebooks

Figure 2 for SkexGen: Autoregressive Generation of CAD Construction Sequences with Disentangled Codebooks

Figure 3 for SkexGen: Autoregressive Generation of CAD Construction Sequences with Disentangled Codebooks

Figure 4 for SkexGen: Autoregressive Generation of CAD Construction Sequences with Disentangled Codebooks

Abstract:We present SkexGen, a novel autoregressive generative model for computer-aided design (CAD) construction sequences containing sketch-and-extrude modeling operations. Our model utilizes distinct Transformer architectures to encode topological, geometric, and extrusion variations of construction sequences into disentangled codebooks. Autoregressive Transformer decoders generate CAD construction sequences sharing certain properties specified by the codebook vectors. Extensive experiments demonstrate that our disentangled codebook representation generates diverse and high-quality CAD models, enhances user control, and enables efficient exploration of the design space. The code is available at https://samxuxiang.github.io/skexgen.

* Accepted to ICML 2022

Via

Access Paper or Ask Questions

Extreme Floorplan Reconstruction by Structure-Hallucinating Transformer Cascades

Jun 01, 2022

Sepidehsadat Hosseini, Yasutaka Furukawa

Figure 1 for Extreme Floorplan Reconstruction by Structure-Hallucinating Transformer Cascades

Figure 2 for Extreme Floorplan Reconstruction by Structure-Hallucinating Transformer Cascades

Figure 3 for Extreme Floorplan Reconstruction by Structure-Hallucinating Transformer Cascades

Figure 4 for Extreme Floorplan Reconstruction by Structure-Hallucinating Transformer Cascades

Abstract:This paper presents an extreme floorplan reconstruction task, a new benchmark for the task, and a neural architecture as a solution. Given a partial floorplan reconstruction inferred or curated from panorama images, the task is to reconstruct a complete floorplan including invisible architectural structures. The proposed neural network 1) encodes an input partial floorplan into a set of latent vectors by convolutional neural networks and a Transformer; and 2) reconstructs an entire floorplan while hallucinating invisible rooms and doors by cascading Transformer decoders. Qualitative and quantitative evaluations demonstrate effectiveness of our approach over the benchmark of 701 houses, outperforming the state-of-the-art reconstruction techniques. We will share our code, models, and data.

Via

Access Paper or Ask Questions

Neural Inertial Localization

Mar 29, 2022

Sachini Herath, David Caruso, Chen Liu, Yufan Chen, Yasutaka Furukawa

Figure 1 for Neural Inertial Localization

Figure 2 for Neural Inertial Localization

Abstract:This paper proposes the inertial localization problem, the task of estimating the absolute location from a sequence of inertial sensor measurements. This is an exciting and unexplored area of indoor localization research, where we present a rich dataset with 53 hours of inertial sensor data and the associated ground truth locations. We developed a solution, dubbed neural inertial localization (NILoc) which 1) uses a neural inertial navigation technique to turn inertial sensor history to a sequence of velocity vectors; then 2) employs a transformer-based neural architecture to find the device location from the sequence of velocities. We only use an IMU sensor, which is energy efficient and privacy preserving compared to WiFi, cameras, and other data sources. Our approach is significantly faster and achieves competitive results even compared with state-of-the-art methods that require a floorplan and run 20 to 30 times slower. We share our code, model and data at https://sachini.github.io/niloc.

Via

Access Paper or Ask Questions

HEAT: Holistic Edge Attention Transformer for Structured Reconstruction

Nov 30, 2021

Jiacheng Chen, Yiming Qian, Yasutaka Furukawa

Figure 1 for HEAT: Holistic Edge Attention Transformer for Structured Reconstruction

Figure 2 for HEAT: Holistic Edge Attention Transformer for Structured Reconstruction

Figure 3 for HEAT: Holistic Edge Attention Transformer for Structured Reconstruction

Figure 4 for HEAT: Holistic Edge Attention Transformer for Structured Reconstruction

Abstract:This paper presents a novel attention-based neural network for structured reconstruction, which takes a 2D raster image as an input and reconstructs a planar graph depicting an underlying geometric structure. The approach detects corners and classifies edge candidates between corners in an end-to-end manner. Our contribution is a holistic edge classification architecture, which 1) initializes the feature of an edge candidate by a trigonometric positional encoding of its end-points; 2) fuses image feature to each edge candidate by deformable attention; 3) employs two weight-sharing Transformer decoders to learn holistic structural patterns over the graph edge candidates; and 4) is trained with a masked learning strategy. The corner detector is a variant of the edge classification architecture, adapted to operate on pixels as corner candidates. We conduct experiments on two structured reconstruction tasks: outdoor building architecture and indoor floorplan planar graph reconstruction. Extensive qualitative and quantitative evaluations demonstrate the superiority of our approach over the state of the art. We will share code and models.

Via

Access Paper or Ask Questions

Structured Outdoor Architecture Reconstruction by Exploration and Classification

Aug 18, 2021

Fuyang Zhang, Xiang Xu, Nelson Nauata, Yasutaka Furukawa

Figure 1 for Structured Outdoor Architecture Reconstruction by Exploration and Classification

Figure 2 for Structured Outdoor Architecture Reconstruction by Exploration and Classification

Figure 3 for Structured Outdoor Architecture Reconstruction by Exploration and Classification

Figure 4 for Structured Outdoor Architecture Reconstruction by Exploration and Classification

Abstract:This paper presents an explore-and-classify framework for structured architectural reconstruction from an aerial image. Starting from a potentially imperfect building reconstruction by an existing algorithm, our approach 1) explores the space of building models by modifying the reconstruction via heuristic actions; 2) learns to classify the correctness of building models while generating classification labels based on the ground-truth, and 3) repeat. At test time, we iterate exploration and classification, seeking for a result with the best classification score. We evaluate the approach using initial reconstructions by two baselines and two state-of-the-art reconstruction algorithms. Qualitative and quantitative evaluations demonstrate that our approach consistently improves the reconstruction quality from every initial reconstruction.

* 2021 International Conference on Computer Vision (ICCV 2021)

Via

Access Paper or Ask Questions

Plan2Scene: Converting Floorplans to 3D Scenes

Jun 09, 2021

Madhawa Vidanapathirana, Qirui Wu, Yasutaka Furukawa, Angel X. Chang, Manolis Savva

Figure 1 for Plan2Scene: Converting Floorplans to 3D Scenes

Figure 2 for Plan2Scene: Converting Floorplans to 3D Scenes

Figure 3 for Plan2Scene: Converting Floorplans to 3D Scenes

Figure 4 for Plan2Scene: Converting Floorplans to 3D Scenes

Abstract:We address the task of converting a floorplan and a set of associated photos of a residence into a textured 3D mesh model, a task which we call Plan2Scene. Our system 1) lifts a floorplan image to a 3D mesh model; 2) synthesizes surface textures based on the input photos; and 3) infers textures for unobserved surfaces using a graph neural network architecture. To train and evaluate our system we create indoor surface texture datasets, and augment a dataset of floorplans and photos from prior work with rectified surface crops and additional annotations. Our approach handles the challenge of producing tileable textures for dominant surfaces such as floors, walls, and ceilings from a sparse set of unaligned photos that only partially cover the residence. Qualitative and quantitative evaluations show that our system produces realistic 3D interior models, outperforming baseline approaches on a suite of texture quality metrics and as measured by a holistic user study.

* This paper is accepted to CVPR 2021. For code, data and pretrained models, see https://3dlg-hcvc.github.io/plan2scene/

Via

Access Paper or Ask Questions