Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Songyou Peng

DepthSplat: Connecting Gaussian Splatting and Depth

Oct 17, 2024

Haofei Xu, Songyou Peng, Fangjinhua Wang, Hermann Blum, Daniel Barath, Andreas Geiger, Marc Pollefeys

Figure 1 for DepthSplat: Connecting Gaussian Splatting and Depth

Figure 2 for DepthSplat: Connecting Gaussian Splatting and Depth

Figure 3 for DepthSplat: Connecting Gaussian Splatting and Depth

Figure 4 for DepthSplat: Connecting Gaussian Splatting and Depth

Abstract:Gaussian splatting and single/multi-view depth estimation are typically studied in isolation. In this paper, we present DepthSplat to connect Gaussian splatting and depth estimation and study their interactions. More specifically, we first contribute a robust multi-view depth model by leveraging pre-trained monocular depth features, leading to high-quality feed-forward 3D Gaussian splatting reconstructions. We also show that Gaussian splatting can serve as an unsupervised pre-training objective for learning powerful depth models from large-scale unlabelled datasets. We validate the synergy between Gaussian splatting and depth estimation through extensive ablation and cross-task transfer experiments. Our DepthSplat achieves state-of-the-art performance on ScanNet, RealEstate10K and DL3DV datasets in terms of both depth estimation and novel view synthesis, demonstrating the mutual benefits of connecting both tasks. Our code, models, and video results are available at https://haofeixu.github.io/depthsplat/.

* Project page: https://haofeixu.github.io/depthsplat/

Via

Access Paper or Ask Questions

WildGaussians: 3D Gaussian Splatting in the Wild

Jul 11, 2024

Jonas Kulhanek, Songyou Peng, Zuzana Kukelova, Marc Pollefeys, Torsten Sattler

Figure 1 for WildGaussians: 3D Gaussian Splatting in the Wild

Figure 2 for WildGaussians: 3D Gaussian Splatting in the Wild

Figure 3 for WildGaussians: 3D Gaussian Splatting in the Wild

Figure 4 for WildGaussians: 3D Gaussian Splatting in the Wild

Abstract:While the field of 3D scene reconstruction is dominated by NeRFs due to their photorealistic quality, 3D Gaussian Splatting (3DGS) has recently emerged, offering similar quality with real-time rendering speeds. However, both methods primarily excel with well-controlled 3D scenes, while in-the-wild data - characterized by occlusions, dynamic objects, and varying illumination - remains challenging. NeRFs can adapt to such conditions easily through per-image embedding vectors, but 3DGS struggles due to its explicit representation and lack of shared parameters. To address this, we introduce WildGaussians, a novel approach to handle occlusions and appearance changes with 3DGS. By leveraging robust DINO features and integrating an appearance modeling module within 3DGS, our method achieves state-of-the-art results. We demonstrate that WildGaussians matches the real-time rendering speed of 3DGS while surpassing both 3DGS and NeRF baselines in handling in-the-wild data, all within a simple architectural framework.

* https://wild-gaussians.github.io/

Via

Access Paper or Ask Questions

OpenDAS: Domain Adaptation for Open-Vocabulary Segmentation

May 30, 2024

Gonca Yilmaz, Songyou Peng, Francis Engelmann, Marc Pollefeys, Hermann Blum

Figure 1 for OpenDAS: Domain Adaptation for Open-Vocabulary Segmentation

Figure 2 for OpenDAS: Domain Adaptation for Open-Vocabulary Segmentation

Figure 3 for OpenDAS: Domain Adaptation for Open-Vocabulary Segmentation

Figure 4 for OpenDAS: Domain Adaptation for Open-Vocabulary Segmentation

Abstract:The advent of Vision Language Models (VLMs) transformed image understanding from closed-set classifications to dynamic image-language interactions, enabling open-vocabulary segmentation. Despite this flexibility, VLMs often fall behind closed-set classifiers in accuracy due to their reliance on ambiguous image captions and lack of domain-specific knowledge. We, therefore, introduce a new task domain adaptation for open-vocabulary segmentation, enhancing VLMs with domain-specific priors while preserving their open-vocabulary nature. Existing adaptation methods, when applied to segmentation tasks, improve performance on training queries but can reduce VLM performance on zero-shot text inputs. To address this shortcoming, we propose an approach that combines parameter-efficient prompt tuning with a triplet-loss-based training strategy. This strategy is designed to enhance open-vocabulary generalization while adapting to the visual domain. Our results outperform other parameter-efficient adaptation strategies in open-vocabulary segment classification tasks across indoor and outdoor datasets. Notably, our approach is the only one that consistently surpasses the original VLM on zero-shot queries. Our adapted VLMs can be plug-and-play integrated into existing open-vocabulary segmentation pipelines, improving OV-Seg by +6.0% mIoU on ADE20K, and OpenMask3D by +4.1% AP on ScanNet++ Offices without any changes to the methods.

Via

Access Paper or Ask Questions

3D Neural Edge Reconstruction

May 29, 2024

Lei Li, Songyou Peng, Zehao Yu, Shaohui Liu, Rémi Pautrat, Xiaochuan Yin, Marc Pollefeys

Abstract:Real-world objects and environments are predominantly composed of edge features, including straight lines and curves. Such edges are crucial elements for various applications, such as CAD modeling, surface meshing, lane mapping, etc. However, existing traditional methods only prioritize lines over curves for simplicity in geometric modeling. To this end, we introduce EMAP, a new method for learning 3D edge representations with a focus on both lines and curves. Our method implicitly encodes 3D edge distance and direction in Unsigned Distance Functions (UDF) from multi-view edge maps. On top of this neural representation, we propose an edge extraction algorithm that robustly abstracts parametric 3D edges from the inferred edge points and their directions. Comprehensive evaluations demonstrate that our method achieves better 3D edge reconstruction on multiple challenging datasets. We further show that our learned UDF field enhances neural surface reconstruction by capturing more details.

* Project page: https://neural-edge-map.github.io

Via

Access Paper or Ask Questions

NeRF On-the-go: Exploiting Uncertainty for Distractor-free NeRFs in the Wild

May 29, 2024

Weining Ren, Zihan Zhu, Boyang Sun, Jiaqi Chen, Marc Pollefeys, Songyou Peng

Figure 1 for NeRF On-the-go: Exploiting Uncertainty for Distractor-free NeRFs in the Wild

Figure 2 for NeRF On-the-go: Exploiting Uncertainty for Distractor-free NeRFs in the Wild

Figure 3 for NeRF On-the-go: Exploiting Uncertainty for Distractor-free NeRFs in the Wild

Figure 4 for NeRF On-the-go: Exploiting Uncertainty for Distractor-free NeRFs in the Wild

Abstract:Neural Radiance Fields (NeRFs) have shown remarkable success in synthesizing photorealistic views from multi-view images of static scenes, but face challenges in dynamic, real-world environments with distractors like moving objects, shadows, and lighting changes. Existing methods manage controlled environments and low occlusion ratios but fall short in render quality, especially under high occlusion scenarios. In this paper, we introduce NeRF On-the-go, a simple yet effective approach that enables the robust synthesis of novel views in complex, in-the-wild scenes from only casually captured image sequences. Delving into uncertainty, our method not only efficiently eliminates distractors, even when they are predominant in captures, but also achieves a notably faster convergence speed. Through comprehensive experiments on various scenes, our method demonstrates a significant improvement over state-of-the-art techniques. This advancement opens new avenues for NeRF in diverse and dynamic real-world applications.

* CVPR 2024, first two authors contributed equally. Project Page: https://nerf-on-the-go.github.io

Via

Access Paper or Ask Questions

When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models

May 16, 2024

Xianzheng Ma, Yash Bhalgat, Brandon Smart, Shuai Chen, Xinghui Li, Jian Ding, Jindong Gu, Dave Zhenyu Chen, Songyou Peng, Jia-Wang Bian(+7 more)

Figure 1 for When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models

Figure 2 for When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models

Figure 3 for When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models

Figure 4 for When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models

Abstract:As large language models (LLMs) evolve, their integration with 3D spatial data (3D-LLMs) has seen rapid progress, offering unprecedented capabilities for understanding and interacting with physical spaces. This survey provides a comprehensive overview of the methodologies enabling LLMs to process, understand, and generate 3D data. Highlighting the unique advantages of LLMs, such as in-context learning, step-by-step reasoning, open-vocabulary capabilities, and extensive world knowledge, we underscore their potential to significantly advance spatial comprehension and interaction within embodied Artificial Intelligence (AI) systems. Our investigation spans various 3D data representations, from point clouds to Neural Radiance Fields (NeRFs). It examines their integration with LLMs for tasks such as 3D scene understanding, captioning, question-answering, and dialogue, as well as LLM-based agents for spatial reasoning, planning, and navigation. The paper also includes a brief review of other methods that integrate 3D and language. The meta-analysis presented in this paper reveals significant progress yet underscores the necessity for novel approaches to harness the full potential of 3D-LLMs. Hence, with this paper, we aim to chart a course for future research that explores and expands the capabilities of 3D-LLMs in understanding and interacting with the complex 3D world. To support this survey, we have established a project page where papers related to our topic are organized and listed: https://github.com/ActiveVisionLab/Awesome-LLM-3D.

Via

Access Paper or Ask Questions

NeRF in Robotics: A Survey

May 02, 2024

Guangming Wang, Lei Pan, Songyou Peng, Shaohui Liu, Chenfeng Xu, Yanzi Miao, Wei Zhan, Masayoshi Tomizuka, Marc Pollefeys, Hesheng Wang

Abstract:Meticulous 3D environment representations have been a longstanding goal in computer vision and robotics fields. The recent emergence of neural implicit representations has introduced radical innovation to this field as implicit representations enable numerous capabilities. Among these, the Neural Radiance Field (NeRF) has sparked a trend because of the huge representational advantages, such as simplified mathematical models, compact environment storage, and continuous scene representations. Apart from computer vision, NeRF has also shown tremendous potential in the field of robotics. Thus, we create this survey to provide a comprehensive understanding of NeRF in the field of robotics. By exploring the advantages and limitations of NeRF, as well as its current applications and future potential, we hope to shed light on this promising area of research. Our survey is divided into two main sections: \textit{The Application of NeRF in Robotics} and \textit{The Advance of NeRF in Robotics}, from the perspective of how NeRF enters the field of robotics. In the first section, we introduce and analyze some works that have been or could be used in the field of robotics from the perception and interaction perspectives. In the second section, we show some works related to improving NeRF's own properties, which are essential for deploying NeRF in the field of robotics. In the discussion section of the review, we summarize the existing challenges and provide some valuable future research directions for reference.

* 21 pages, 19 figures

Via

Access Paper or Ask Questions

Renovating Names in Open-Vocabulary Segmentation Benchmarks

Mar 14, 2024

Haiwen Huang, Songyou Peng, Dan Zhang, Andreas Geiger

Figure 1 for Renovating Names in Open-Vocabulary Segmentation Benchmarks

Figure 2 for Renovating Names in Open-Vocabulary Segmentation Benchmarks

Figure 3 for Renovating Names in Open-Vocabulary Segmentation Benchmarks

Figure 4 for Renovating Names in Open-Vocabulary Segmentation Benchmarks

Abstract:Names are essential to both human cognition and vision-language models. Open-vocabulary models utilize class names as text prompts to generalize to categories unseen during training. However, name qualities are often overlooked and lack sufficient precision in existing datasets. In this paper, we address this underexplored problem by presenting a framework for "renovating" names in open-vocabulary segmentation benchmarks (RENOVATE). Through human study, we demonstrate that the names generated by our model are more precise descriptions of the visual segments and hence enhance the quality of existing datasets by means of simple renaming. We further demonstrate that using our renovated names enables training of stronger open-vocabulary segmentation models. Using open-vocabulary segmentation for name quality evaluation, we show that our renovated names lead to up to 16% relative improvement from the original names on various benchmarks across various state-of-the-art models. We provide our code and relabelings for several popular segmentation datasets (ADE20K, Cityscapes, PASCAL Context) to the research community.

Via

Access Paper or Ask Questions

OpenSUN3D: 1st Workshop Challenge on Open-Vocabulary 3D Scene Understanding

Feb 23, 2024

Francis Engelmann, Ayca Takmaz, Jonas Schult, Elisabetta Fedele, Johanna Wald, Songyou Peng, Xi Wang, Or Litany, Siyu Tang, Federico Tombari(+18 more)

Figure 1 for OpenSUN3D: 1st Workshop Challenge on Open-Vocabulary 3D Scene Understanding

Figure 2 for OpenSUN3D: 1st Workshop Challenge on Open-Vocabulary 3D Scene Understanding

Figure 3 for OpenSUN3D: 1st Workshop Challenge on Open-Vocabulary 3D Scene Understanding

Figure 4 for OpenSUN3D: 1st Workshop Challenge on Open-Vocabulary 3D Scene Understanding

Abstract:This report provides an overview of the challenge hosted at the OpenSUN3D Workshop on Open-Vocabulary 3D Scene Understanding held in conjunction with ICCV 2023. The goal of this workshop series is to provide a platform for exploration and discussion of open-vocabulary 3D scene understanding tasks, including but not limited to segmentation, detection and mapping. We provide an overview of the challenge hosted at the workshop, present the challenge dataset, the evaluation methodology, and brief descriptions of the winning methods. For additional details, please see https://opensun3d.github.io/index_iccv23.html.

* Our OpenSUN3D workshop website for ICCV 2023: https://opensun3d.github.io/index_iccv23.html

Via

Access Paper or Ask Questions

Segment3D: Learning Fine-Grained Class-Agnostic 3D Segmentation without Manual Labels

Dec 28, 2023

Rui Huang, Songyou Peng, Ayca Takmaz, Federico Tombari, Marc Pollefeys, Shiji Song, Gao Huang, Francis Engelmann

Abstract:Current 3D scene segmentation methods are heavily dependent on manually annotated 3D training datasets. Such manual annotations are labor-intensive, and often lack fine-grained details. Importantly, models trained on this data typically struggle to recognize object classes beyond the annotated classes, i.e., they do not generalize well to unseen domains and require additional domain-specific annotations. In contrast, 2D foundation models demonstrate strong generalization and impressive zero-shot abilities, inspiring us to incorporate these characteristics from 2D models into 3D models. Therefore, we explore the use of image segmentation foundation models to automatically generate training labels for 3D segmentation. We propose Segment3D, a method for class-agnostic 3D scene segmentation that produces high-quality 3D segmentation masks. It improves over existing 3D segmentation models (especially on fine-grained masks), and enables easily adding new training data to further boost the segmentation performance -- all without the need for manual training labels.

* Project Page: http://segment3d.github.io

Via

Access Paper or Ask Questions