Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Kostas Daniilidis

Track Everything Everywhere Fast and Robustly

Mar 26, 2024

Yunzhou Song, Jiahui Lei, Ziyun Wang, Lingjie Liu, Kostas Daniilidis

Figure 1 for Track Everything Everywhere Fast and Robustly

Figure 2 for Track Everything Everywhere Fast and Robustly

Figure 3 for Track Everything Everywhere Fast and Robustly

Figure 4 for Track Everything Everywhere Fast and Robustly

Abstract:We propose a novel test-time optimization approach for efficiently and robustly tracking any pixel at any time in a video. The latest state-of-the-art optimization-based tracking technique, OmniMotion, requires a prohibitively long optimization time, rendering it impractical for downstream applications. OmniMotion is sensitive to the choice of random seeds, leading to unstable convergence. To improve efficiency and robustness, we introduce a novel invertible deformation network, CaDeX++, which factorizes the function representation into a local spatial-temporal feature grid and enhances the expressivity of the coupling blocks with non-linear functions. While CaDeX++ incorporates a stronger geometric bias within its architectural design, it also takes advantage of the inductive bias provided by the vision foundation models. Our system utilizes monocular depth estimation to represent scene geometry and enhances the objective by incorporating DINOv2 long-term semantics to regulate the optimization process. Our experiments demonstrate a substantial improvement in training speed (more than \textbf{10 times} faster), robustness, and accuracy in tracking over the SoTA optimization-based method OmniMotion.

* project page: https://timsong412.github.io/FastOmniTrack/

Via

Access Paper or Ask Questions

TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos

Mar 26, 2024

Yufu Wang, Ziyun Wang, Lingjie Liu, Kostas Daniilidis

Figure 1 for TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos

Figure 2 for TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos

Figure 3 for TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos

Figure 4 for TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos

Abstract:We propose TRAM, a two-stage method to reconstruct a human's global trajectory and motion from in-the-wild videos. TRAM robustifies SLAM to recover the camera motion in the presence of dynamic humans and uses the scene background to derive the motion scale. Using the recovered camera as a metric-scale reference frame, we introduce a video transformer model (VIMO) to regress the kinematic body motion of a human. By composing the two motions, we achieve accurate recovery of 3D humans in the world space, reducing global motion errors by 60% from prior work. https://yufu-wang.github.io/tram4d/

* The project website: https://yufu-wang.github.io/tram4d/

Via

Access Paper or Ask Questions

Beyond Uncertainty: Risk-Aware Active View Acquisition for Safe Robot Navigation and 3D Scene Understanding with FisherRF

Mar 18, 2024

Guangyi Liu, Wen Jiang, Boshu Lei, Vivek Pandey, Kostas Daniilidis, Nader Motee

Figure 1 for Beyond Uncertainty: Risk-Aware Active View Acquisition for Safe Robot Navigation and 3D Scene Understanding with FisherRF

Figure 2 for Beyond Uncertainty: Risk-Aware Active View Acquisition for Safe Robot Navigation and 3D Scene Understanding with FisherRF

Figure 3 for Beyond Uncertainty: Risk-Aware Active View Acquisition for Safe Robot Navigation and 3D Scene Understanding with FisherRF

Figure 4 for Beyond Uncertainty: Risk-Aware Active View Acquisition for Safe Robot Navigation and 3D Scene Understanding with FisherRF

Abstract:This work proposes a novel approach to bolster both the robot's risk assessment and safety measures while deepening its understanding of 3D scenes, which is achieved by leveraging Radiance Field (RF) models and 3D Gaussian Splatting. To further enhance these capabilities, we incorporate additional sampled views from the environment with the RF model. One of our key contributions is the introduction of Risk-aware Environment Masking (RaEM), which prioritizes crucial information by selecting the next-best-view that maximizes the expected information gain. This targeted approach aims to minimize uncertainties surrounding the robot's path and enhance the safety of its navigation. Our method offers a dual benefit: improved robot safety and increased efficiency in risk-aware 3D scene reconstruction and understanding. Extensive experiments in real-world scenarios demonstrate the effectiveness of our proposed approach, highlighting its potential to establish a robust and safety-focused framework for active robot exploration and 3D scene understanding.

Via

Access Paper or Ask Questions

Scalable Networked Feature Selection with Randomized Algorithm for Robot Navigation

Mar 18, 2024

Vivek Pandey, Arash Amini, Guangyi Liu, Ufuk Topcu, Qiyu Sun, Kostas Daniilidis, Nader Motee

Figure 1 for Scalable Networked Feature Selection with Randomized Algorithm for Robot Navigation

Figure 2 for Scalable Networked Feature Selection with Randomized Algorithm for Robot Navigation

Figure 3 for Scalable Networked Feature Selection with Randomized Algorithm for Robot Navigation

Figure 4 for Scalable Networked Feature Selection with Randomized Algorithm for Robot Navigation

Abstract:We address the problem of sparse selection of visual features for localizing a team of robots navigating an unknown environment, where robots can exchange relative position measurements with neighbors. We select a set of the most informative features by anticipating their importance in robots localization by simulating trajectories of robots over a prediction horizon. Through theoretical proofs, we establish a crucial connection between graph Laplacian and the importance of features. We show that strong network connectivity translates to uniformity in feature importance, which enables uniform random sampling of features and reduces the overall computational complexity. We leverage a scalable randomized algorithm for sparse sums of positive semidefinite matrices to efficiently select the set of the most informative features and significantly improve the probabilistic performance bounds. Finally, we support our findings with extensive simulations.

Via

Access Paper or Ask Questions

Event-based Continuous Color Video Decompression from Single Frames

Nov 30, 2023

Ziyun Wang, Friedhelm Hamann, Kenneth Chaney, Wen Jiang, Guillermo Gallego, Kostas Daniilidis

Figure 1 for Event-based Continuous Color Video Decompression from Single Frames

Figure 2 for Event-based Continuous Color Video Decompression from Single Frames

Figure 3 for Event-based Continuous Color Video Decompression from Single Frames

Figure 4 for Event-based Continuous Color Video Decompression from Single Frames

Abstract:We present ContinuityCam, a novel approach to generate a continuous video from a single static RGB image, using an event camera. Conventional cameras struggle with high-speed motion capture due to bandwidth and dynamic range limitations. Event cameras are ideal sensors to solve this problem because they encode compressed change information at high temporal resolution. In this work, we propose a novel task called event-based continuous color video decompression, pairing single static color frames and events to reconstruct temporally continuous videos. Our approach combines continuous long-range motion modeling with a feature-plane-based synthesis neural integration model, enabling frame prediction at arbitrary times within the events. Our method does not rely on additional frames except for the initial image, increasing, thus, the robustness to sudden light changes, minimizing the prediction latency, and decreasing the bandwidth requirement. We introduce a novel single objective beamsplitter setup that acquires aligned images and events and a novel and challenging Event Extreme Decompression Dataset (E2D2) that tests the method in various lighting and motion profiles. We thoroughly evaluate our method through benchmarking reconstruction as well as various downstream tasks. Our approach significantly outperforms the event- and image- based baselines in the proposed task.

Via

Access Paper or Ask Questions

DynMF: Neural Motion Factorization for Real-time Dynamic View Synthesis with 3D Gaussian Splatting

Nov 30, 2023

Agelos Kratimenos, Jiahui Lei, Kostas Daniilidis

Abstract:Accurately and efficiently modeling dynamic scenes and motions is considered so challenging a task due to temporal dynamics and motion complexity. To address these challenges, we propose DynMF, a compact and efficient representation that decomposes a dynamic scene into a few neural trajectories. We argue that the per-point motions of a dynamic scene can be decomposed into a small set of explicit or learned trajectories. Our carefully designed neural framework consisting of a tiny set of learned basis queried only in time allows for rendering speed similar to 3D Gaussian Splatting, surpassing 120 FPS, while at the same time, requiring only double the storage compared to static scenes. Our neural representation adequately constrains the inherently underconstrained motion field of a dynamic scene leading to effective and fast optimization. This is done by biding each point to motion coefficients that enforce the per-point sharing of basis trajectories. By carefully applying a sparsity loss to the motion coefficients, we are able to disentangle the motions that comprise the scene, independently control them, and generate novel motion combinations that have never been seen before. We can reach state-of-the-art render quality within just 5 minutes of training and in less than half an hour, we can synthesize novel views of dynamic scenes with superior photorealistic quality. Our representation is interpretable, efficient, and expressive enough to offer real-time view synthesis of complex dynamic scene motions, in monocular and multi-view scenarios.

* Project page: https://agelosk.github.io/dynmf/

Via

Access Paper or Ask Questions

Un-EvMoSeg: Unsupervised Event-based Independent Motion Segmentation

Nov 30, 2023

Ziyun Wang, Jinyuan Guo, Kostas Daniilidis

Figure 1 for Un-EvMoSeg: Unsupervised Event-based Independent Motion Segmentation

Figure 2 for Un-EvMoSeg: Unsupervised Event-based Independent Motion Segmentation

Figure 3 for Un-EvMoSeg: Unsupervised Event-based Independent Motion Segmentation

Figure 4 for Un-EvMoSeg: Unsupervised Event-based Independent Motion Segmentation

Abstract:Event cameras are a novel type of biologically inspired vision sensor known for their high temporal resolution, high dynamic range, and low power consumption. Because of these properties, they are well-suited for processing fast motions that require rapid reactions. Although event cameras have recently shown competitive performance in unsupervised optical flow estimation, performance in detecting independently moving objects (IMOs) is lacking behind, although event-based methods would be suited for this task based on their low latency and HDR properties. Previous approaches to event-based IMO segmentation have been heavily dependent on labeled data. However, biological vision systems have developed the ability to avoid moving objects through daily tasks without being given explicit labels. In this work, we propose the first event framework that generates IMO pseudo-labels using geometric constraints. Due to its unsupervised nature, our method can handle an arbitrary number of not predetermined objects and is easily scalable to datasets where expensive IMO labels are not readily available. We evaluate our approach on the EVIMO dataset and show that it performs competitively with supervised methods, both quantitatively and qualitatively.

Via

Access Paper or Ask Questions

FisherRF: Active View Selection and Uncertainty Quantification for Radiance Fields using Fisher Information

Nov 29, 2023

Wen Jiang, Boshu Lei, Kostas Daniilidis

Abstract:This study addresses the challenging problem of active view selection and uncertainty quantification within the domain of Radiance Fields. Neural Radiance Fields (NeRF) have greatly advanced image rendering and reconstruction, but the limited availability of 2D images poses uncertainties stemming from occlusions, depth ambiguities, and imaging errors. Efficiently selecting informative views becomes crucial, and quantifying NeRF model uncertainty presents intricate challenges. Existing approaches either depend on model architecture or are based on assumptions regarding density distributions that are not generally applicable. By leveraging Fisher Information, we efficiently quantify observed information within Radiance Fields without ground truth data. This can be used for the next best view selection and pixel-wise uncertainty quantification. Our method overcomes existing limitations on model architecture and effectiveness, achieving state-of-the-art results in both view selection and uncertainty quantification, demonstrating its potential to advance the field of Radiance Fields. Our method with the 3D Gaussian Splatting backend could perform view selections at 70 fps.

* Project page: https://jiangwenpl.github.io/FisherRF/

Via

Access Paper or Ask Questions

GART: Gaussian Articulated Template Models

Nov 27, 2023

Jiahui Lei, Yufu Wang, Georgios Pavlakos, Lingjie Liu, Kostas Daniilidis

Abstract:We introduce Gaussian Articulated Template Model GART, an explicit, efficient, and expressive representation for non-rigid articulated subject capturing and rendering from monocular videos. GART utilizes a mixture of moving 3D Gaussians to explicitly approximate a deformable subject's geometry and appearance. It takes advantage of a categorical template model prior (SMPL, SMAL, etc.) with learnable forward skinning while further generalizing to more complex non-rigid deformations with novel latent bones. GART can be reconstructed via differentiable rendering from monocular videos in seconds or minutes and rendered in novel poses faster than 150fps.

* 13 pages, code available at https://www.cis.upenn.edu/~leijh/projects/gart/

Via

Access Paper or Ask Questions

EvDNeRF: Reconstructing Event Data with Dynamic Neural Radiance Fields

Oct 03, 2023

Anish Bhattacharya, Ratnesh Madaan, Fernando Cladera, Sai Vemprala, Rogerio Bonatti, Kostas Daniilidis, Ashish Kapoor, Vijay Kumar, Nikolai Matni, Jayesh K. Gupta

Figure 1 for EvDNeRF: Reconstructing Event Data with Dynamic Neural Radiance Fields

Figure 2 for EvDNeRF: Reconstructing Event Data with Dynamic Neural Radiance Fields

Figure 3 for EvDNeRF: Reconstructing Event Data with Dynamic Neural Radiance Fields

Figure 4 for EvDNeRF: Reconstructing Event Data with Dynamic Neural Radiance Fields

Abstract:We present EvDNeRF, a pipeline for generating event data and training an event-based dynamic NeRF, for the purpose of faithfully reconstructing eventstreams on scenes with rigid and non-rigid deformations that may be too fast to capture with a standard camera. Event cameras register asynchronous per-pixel brightness changes at MHz rates with high dynamic range, making them ideal for observing fast motion with almost no motion blur. Neural radiance fields (NeRFs) offer visual-quality geometric-based learnable rendering, but prior work with events has only considered reconstruction of static scenes. Our EvDNeRF can predict eventstreams of dynamic scenes from a static or moving viewpoint between any desired timestamps, thereby allowing it to be used as an event-based simulator for a given scene. We show that by training on varied batch sizes of events, we can improve test-time predictions of events at fine time resolutions, outperforming baselines that pair standard dynamic NeRFs with event simulators. We release our simulated and real datasets, as well as code for both event-based data generation and the training of event-based dynamic NeRF models (https://github.com/anish-bhattacharya/EvDNeRF).

* 17 pages, 20 figures, 2 tables

Via

Access Paper or Ask Questions