Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Topic:photo

SplatR : Experience Goal Visual Rearrangement with 3D Gaussian Splatting and Dense Feature Matching

Nov 21, 2024

Arjun P S, Andrew Melnik, Gora Chand Nandi

Figure 1 for SplatR : Experience Goal Visual Rearrangement with 3D Gaussian Splatting and Dense Feature Matching

Figure 2 for SplatR : Experience Goal Visual Rearrangement with 3D Gaussian Splatting and Dense Feature Matching

Figure 3 for SplatR : Experience Goal Visual Rearrangement with 3D Gaussian Splatting and Dense Feature Matching

Figure 4 for SplatR : Experience Goal Visual Rearrangement with 3D Gaussian Splatting and Dense Feature Matching

Abstract:Experience Goal Visual Rearrangement task stands as a foundational challenge within Embodied AI, requiring an agent to construct a robust world model that accurately captures the goal state. The agent uses this world model to restore a shuffled scene to its original configuration, making an accurate representation of the world essential for successfully completing the task. In this work, we present a novel framework that leverages on 3D Gaussian Splatting as a 3D scene representation for experience goal visual rearrangement task. Recent advances in volumetric scene representation like 3D Gaussian Splatting, offer fast rendering of high quality and photo-realistic novel views. Our approach enables the agent to have consistent views of the current and the goal setting of the rearrangement task, which enables the agent to directly compare the goal state and the shuffled state of the world in image space. To compare these views, we propose to use a dense feature matching method with visual features extracted from a foundation model, leveraging its advantages of a more universal feature representation, which facilitates robustness, and generalization. We validate our approach on the AI2-THOR rearrangement challenge benchmark and demonstrate improvements over the current state of the art methods

Via

Access Paper or Ask Questions

Generating 3D-Consistent Videos from Unposed Internet Photos

Nov 20, 2024

Gene Chou, Kai Zhang, Sai Bi, Hao Tan, Zexiang Xu, Fujun Luan, Bharath Hariharan, Noah Snavely

Figure 1 for Generating 3D-Consistent Videos from Unposed Internet Photos

Figure 2 for Generating 3D-Consistent Videos from Unposed Internet Photos

Figure 3 for Generating 3D-Consistent Videos from Unposed Internet Photos

Figure 4 for Generating 3D-Consistent Videos from Unposed Internet Photos

Abstract:We address the problem of generating videos from unposed internet photos. A handful of input images serve as keyframes, and our model interpolates between them to simulate a path moving between the cameras. Given random images, a model's ability to capture underlying geometry, recognize scene identity, and relate frames in terms of camera position and orientation reflects a fundamental understanding of 3D structure and scene layout. However, existing video models such as Luma Dream Machine fail at this task. We design a self-supervised method that takes advantage of the consistency of videos and variability of multiview internet photos to train a scalable, 3D-aware video model without any 3D annotations such as camera parameters. We validate that our method outperforms all baselines in terms of geometric and appearance consistency. We also show our model benefits applications that enable camera control, such as 3D Gaussian Splatting. Our results suggest that we can scale up scene-level 3D learning using only 2D data such as videos and multiview internet photos.

Via

Access Paper or Ask Questions

ID-Patch: Robust ID Association for Group Photo Personalization

Nov 20, 2024

Yimeng Zhang, Tiancheng Zhi, Jing Liu, Shen Sang, Liming Jiang, Qing Yan, Sijia Liu, Linjie Luo

Abstract:The ability to synthesize personalized group photos and specify the positions of each identity offers immense creative potential. While such imagery can be visually appealing, it presents significant challenges for existing technologies. A persistent issue is identity (ID) leakage, where injected facial features interfere with one another, resulting in low face resemblance, incorrect positioning, and visual artifacts. Existing methods suffer from limitations such as the reliance on segmentation models, increased runtime, or a high probability of ID leakage. To address these challenges, we propose ID-Patch, a novel method that provides robust association between identities and 2D positions. Our approach generates an ID patch and ID embeddings from the same facial features: the ID patch is positioned on the conditional image for precise spatial control, while the ID embeddings integrate with text embeddings to ensure high resemblance. Experimental results demonstrate that ID-Patch surpasses baseline methods across metrics, such as face ID resemblance, ID-position association accuracy, and generation efficiency. Project Page is: https://byteaigc.github.io/ID-Patch/

* Project Page is: https://byteaigc.github.io/ID-Patch/

Via

Access Paper or Ask Questions

MetricGold: Leveraging Text-To-Image Latent Diffusion Models for Metric Depth Estimation

Nov 16, 2024

Ansh Shah, K Madhava Krishna

Figure 1 for MetricGold: Leveraging Text-To-Image Latent Diffusion Models for Metric Depth Estimation

Abstract:Recovering metric depth from a single image remains a fundamental challenge in computer vision, requiring both scene understanding and accurate scaling. While deep learning has advanced monocular depth estimation, current models often struggle with unfamiliar scenes and layouts, particularly in zero-shot scenarios and when predicting scale-ergodic metric depth. We present MetricGold, a novel approach that harnesses generative diffusion model's rich priors to improve metric depth estimation. Building upon recent advances in MariGold, DDVM and Depth Anything V2 respectively, our method combines latent diffusion, log-scaled metric depth representation, and synthetic data training. MetricGold achieves efficient training on a single RTX 3090 within two days using photo-realistic synthetic data from HyperSIM, VirtualKitti, and TartanAir. Our experiments demonstrate robust generalization across diverse datasets, producing sharper and higher quality metric depth estimates compared to existing approaches.

* arXiv admin note: substantial text overlap with arXiv:2312.02145 by other authors

Via

Access Paper or Ask Questions

Adversarial Attacks Using Differentiable Rendering: A Survey

Nov 14, 2024

Matthew Hull, Chao Zhang, Zsolt Kira, Duen Horng Chau

Abstract:Differentiable rendering methods have emerged as a promising means for generating photo-realistic and physically plausible adversarial attacks by manipulating 3D objects and scenes that can deceive deep neural networks (DNNs). Recently, differentiable rendering capabilities have evolved significantly into a diverse landscape of libraries, such as Mitsuba, PyTorch3D, and methods like Neural Radiance Fields and 3D Gaussian Splatting for solving inverse rendering problems that share conceptually similar properties commonly used to attack DNNs, such as back-propagation and optimization. However, the adversarial machine learning research community has not yet fully explored or understood such capabilities for generating attacks. Some key reasons are that researchers often have different attack goals, such as misclassification or misdetection, and use different tasks to accomplish these goals by manipulating different representation in a scene, such as the mesh or texture of an object. This survey adopts a task-oriented unifying framework that systematically summarizes common tasks, such as manipulating textures, altering illumination, and modifying 3D meshes to exploit vulnerabilities in DNNs. Our framework enables easy comparison of existing works, reveals research gaps and spotlights exciting future research directions in this rapidly evolving field. Through focusing on how these tasks enable attacks on various DNNs such as image classification, facial recognition, object detection, optical flow and depth estimation, our survey helps researchers and practitioners better understand the vulnerabilities of computer vision systems against photorealistic adversarial attacks that could threaten real-world applications.

Via

Access Paper or Ask Questions

PICZL: Image-based Photometric Redshifts for AGN

Nov 13, 2024

William Roster, Mara Salvato, Sven Krippendorf, Aman Saxena, Raphael Shirley, Johannes Buchner, Julien Wolf, Tom Dwelly, Franz E. Bauer, James Aird(+7 more)

Figure 1 for PICZL: Image-based Photometric Redshifts for AGN

Figure 2 for PICZL: Image-based Photometric Redshifts for AGN

Figure 3 for PICZL: Image-based Photometric Redshifts for AGN

Figure 4 for PICZL: Image-based Photometric Redshifts for AGN

Abstract:Computing photo-z for AGN is challenging, primarily due to the interplay of relative emissions associated with the SMBH and its host galaxy. SED fitting methods, effective in pencil-beam surveys, face limitations in all-sky surveys with fewer bands available, lacking the ability to capture the AGN contribution to the SED accurately. This limitation affects the many 10s of millions of AGN clearly singled out and identified by SRG/eROSITA. Our goal is to significantly enhance photometric redshift performance for AGN in all-sky surveys while avoiding the need to merge multiple data sets. Instead, we employ readily available data products from the 10th Data Release of the Imaging Legacy Survey for DESI, covering > 20,000 deg$^{2}$ with deep images and catalog-based photometry in the grizW1-W4 bands. We introduce PICZL, a machine-learning algorithm leveraging an ensemble of CNNs. Utilizing a cross-channel approach, the algorithm integrates distinct SED features from images with those obtained from catalog-level data. Full probability distributions are achieved via the integration of Gaussian mixture models. On a validation sample of 8098 AGN, PICZL achieves a variance $\sigma_{\textrm{NMAD}}$ of 4.5% with an outlier fraction $\eta$ of 5.6%, outperforming previous attempts to compute accurate photo-z for AGN using ML. We highlight that the model's performance depends on many variables, predominantly the depth of the data. A thorough evaluation of these dependencies is presented in the paper. Our streamlined methodology maintains consistent performance across the entire survey area when accounting for differing data quality. The same approach can be adopted for future deep photometric surveys such as LSST and Euclid, showcasing its potential for wide-scale realisation. With this paper, we release updated photo-z (including errors) for the XMM-SERVS W-CDF-S, ELAIS-S1 and LSS fields.

* Accepted for publication in Astronomy & Astrophysics. 24 pages, 21 figures

Via

Access Paper or Ask Questions

MBA-SLAM: Motion Blur Aware Dense Visual SLAM with Radiance Fields Representation

Nov 13, 2024

Peng Wang, Lingzhe Zhao, Yin Zhang, Shiyu Zhao, Peidong Liu

Figure 1 for MBA-SLAM: Motion Blur Aware Dense Visual SLAM with Radiance Fields Representation

Figure 2 for MBA-SLAM: Motion Blur Aware Dense Visual SLAM with Radiance Fields Representation

Figure 3 for MBA-SLAM: Motion Blur Aware Dense Visual SLAM with Radiance Fields Representation

Figure 4 for MBA-SLAM: Motion Blur Aware Dense Visual SLAM with Radiance Fields Representation

Abstract:Emerging 3D scene representations, such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), have demonstrated their effectiveness in Simultaneous Localization and Mapping (SLAM) for photo-realistic rendering, particularly when using high-quality video sequences as input. However, existing methods struggle with motion-blurred frames, which are common in real-world scenarios like low-light or long-exposure conditions. This often results in a significant reduction in both camera localization accuracy and map reconstruction quality. To address this challenge, we propose a dense visual SLAM pipeline (i.e. MBA-SLAM) to handle severe motion-blurred inputs. Our approach integrates an efficient motion blur-aware tracker with either neural radiance fields or Gaussian Splatting based mapper. By accurately modeling the physical image formation process of motion-blurred images, our method simultaneously learns 3D scene representation and estimates the cameras' local trajectory during exposure time, enabling proactive compensation for motion blur caused by camera movement. In our experiments, we demonstrate that MBA-SLAM surpasses previous state-of-the-art methods in both camera localization and map reconstruction, showcasing superior performance across a range of datasets, including synthetic and real datasets featuring sharp images as well as those affected by motion blur, highlighting the versatility and robustness of our approach. Code is available at https://github.com/WU-CVGL/MBA-SLAM.

Via

Access Paper or Ask Questions

Extreme Rotation Estimation in the Wild

Nov 12, 2024

Hana Bezalel, Dotan Ankri, Ruojin Cai, Hadar Averbuch-Elor

Abstract:We present a technique and benchmark dataset for estimating the relative 3D orientation between a pair of Internet images captured in an extreme setting, where the images have limited or non-overlapping field of views. Prior work targeting extreme rotation estimation assume constrained 3D environments and emulate perspective images by cropping regions from panoramic views. However, real images captured in the wild are highly diverse, exhibiting variation in both appearance and camera intrinsics. In this work, we propose a Transformer-based method for estimating relative rotations in extreme real-world settings, and contribute the ExtremeLandmarkPairs dataset, assembled from scene-level Internet photo collections. Our evaluation demonstrates that our approach succeeds in estimating the relative rotations in a wide variety of extreme-view Internet image pairs, outperforming various baselines, including dedicated rotation estimation techniques and contemporary 3D reconstruction methods.

* Project webpage: https://tau-vailab.github.io/ExtremeRotationsInTheWild/

Via

Access Paper or Ask Questions

GPU-Accelerated Inverse Lithography Towards High Quality Curvy Mask Generation

Nov 11, 2024

Haoyu Yang, Haoxing Ren

Figure 1 for GPU-Accelerated Inverse Lithography Towards High Quality Curvy Mask Generation

Figure 2 for GPU-Accelerated Inverse Lithography Towards High Quality Curvy Mask Generation

Figure 3 for GPU-Accelerated Inverse Lithography Towards High Quality Curvy Mask Generation

Figure 4 for GPU-Accelerated Inverse Lithography Towards High Quality Curvy Mask Generation

Abstract:Inverse Lithography Technology (ILT) has emerged as a promising solution for photo mask design and optimization. Relying on multi-beam mask writers, ILT enables the creation of free-form curvilinear mask shapes that enhance printed wafer image quality and process window. However, a major challenge in implementing curvilinear ILT for large-scale production is mask rule checking, an area currently under development by foundries and EDA vendors. Although recent research has incorporated mask complexity into the optimization process, much of it focuses on reducing e-beam shots, which does not align with the goals of curvilinear ILT. In this paper, we introduce a GPU-accelerated ILT algorithm that improves not only contour quality and process window but also the precision of curvilinear mask shapes. Our experiments on open benchmarks demonstrate a significant advantage of our algorithm over leading academic ILT engines.

* 10 pages, 5 figures, Accepted by International Symposium on Physical Design (ISPD), 2025, Austin TX

Via

Access Paper or Ask Questions

Instance Performance Difference: A Metric to Measure the Sim-To-Real Gap in Camera Simulation

Nov 11, 2024

Bo-Hsun Chen, Dan Negrut

Figure 1 for Instance Performance Difference: A Metric to Measure the Sim-To-Real Gap in Camera Simulation

Figure 2 for Instance Performance Difference: A Metric to Measure the Sim-To-Real Gap in Camera Simulation

Figure 3 for Instance Performance Difference: A Metric to Measure the Sim-To-Real Gap in Camera Simulation

Figure 4 for Instance Performance Difference: A Metric to Measure the Sim-To-Real Gap in Camera Simulation

Abstract:In this contribution, we introduce the concept of Instance Performance Difference (IPD), a metric designed to measure the gap in performance that a robotics perception task experiences when working with real vs. synthetic pictures. By pairing synthetic and real instances in the pictures and evaluating their performance similarity using perception algorithms, IPD provides a targeted metric that closely aligns with the needs of real-world applications. We explain and demonstrate this metric through a rock detection task in lunar terrain images, highlighting the IPD's effectiveness in identifying the most realistic image synthesis method. The metric is thus instrumental in creating synthetic image datasets that perform in perception tasks like real-world photo counterparts. In turn, this supports robust sim-to-real transfer for perception algorithms in real-world robotics applications.

* 4 pages, 3 figures, 1 table

Via

Access Paper or Ask Questions

Topic:photo

Papers and Code