Angjoo Kanazawa

Decoupling Human and Camera Motion from Videos in the Wild

Mar 20, 2023
Vickie Ye, Georgios Pavlakos, Jitendra Malik, Angjoo Kanazawa

We propose a method to reconstruct global human trajectories from videos in the wild. Our optimization method decouples the camera and human motion, which allows us to place people in the same world coordinate frame. Most existing methods do not model the camera motion; methods that rely on the background pixels to infer 3D human motion usually require a full scene reconstruction, which is often not possible for in-the-wild videos. However, even when existing SLAM systems cannot recover accurate scene reconstructions, the background pixel motion still provides enough signal to constrain the camera motion. We show that relative camera estimates along with data-driven human motion priors can resolve the scene scale ambiguity and recover global human trajectories. Our method robustly recovers the global 3D trajectories of people in challenging in-the-wild videos, such as PoseTrack. We quantify our improvement over existing methods on the 3D human dataset Egobody. We further demonstrate that our recovered camera scale allows us to reason about the motion of multiple people in a shared coordinate frame, which improves the performance of downstream tracking in PoseTrack. Code and video results can be found at https://vye16.github.io/slahmr.

* Project site: https://vye16.github.io/slahmr. CVPR 2023
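
The scale ambiguity mentioned in the abstract can be illustrated with a minimal sketch. It assumes a SLAM system supplies a world-from-camera rotation R and a translation t that is known only up to an unknown scale alpha. The function name, argument names, and toy values are hypothetical and are not taken from the authors' code.

```python
def camera_to_world(p_cam, R, t, alpha):
    """Map a camera-frame point to the world frame.

    Assumed model (illustrative only): p_world = R @ p_cam + alpha * t,
    where t comes from SLAM and alpha is the unknown scene scale.
    """
    return [
        sum(R[i][j] * p_cam[j] for j in range(3)) + alpha * t[i]
        for i in range(3)
    ]


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# A person 2 units in front of a camera that has moved along x.
p_cam = [0.0, 0.0, 2.0]
t = [1.0, 0.0, 0.0]

# The same image evidence yields different world positions per scale.
for alpha in (0.5, 1.0, 2.0):
    print(alpha, camera_to_world(p_cam, IDENTITY, t, alpha))
```

Each value of alpha places the person at a different world position, which is why an additional signal (in the abstract, a data-driven human motion prior) is needed to pick the scale.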

LERF: Language Embedded Radiance Fields

Mar 16, 2023
Justin Kerr, Chung Min Kim, Ken Goldberg, Angjoo Kanazawa, Matthew Tancik

Humans describe the physical world using natural language to refer to specific 3D locations based on a vast range of properties: visual appearance, semantics, abstract associations, or actionable affordances. In this work we propose Language Embedded Radiance Fields (LERFs), a method for grounding language embeddings from off-the-shelf models like CLIP into NeRF, which enables these types of open-ended language queries in 3D. LERF learns a dense, multi-scale language field inside NeRF by volume rendering CLIP embeddings along training rays, supervising these embeddings across training views to provide multi-view consistency and smooth the underlying language field. After optimization, LERF can extract 3D relevancy maps for a broad range of language prompts interactively in real time, which has potential use cases in robotics, understanding vision-language models, and interacting with 3D scenes. LERF enables pixel-aligned, zero-shot queries on the distilled 3D CLIP embeddings without relying on region proposals or masks, supporting long-tail open-vocabulary queries hierarchically across the volume. The project website can be found at https://lerf.io .

* Project website can be found at https://lerf.io
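
The idea of volume rendering embeddings along a ray can be sketched with the standard NeRF compositing weights. This is a simplified illustration, not the LERF implementation: the function names are hypothetical, the embeddings are toy 2D vectors rather than CLIP features, the unit-normalization step is an assumption, and plain cosine similarity stands in for the paper's relevancy score.

```python
import math


def render_weights(densities, deltas):
    """Standard volume-rendering weights: w_i = T_i * (1 - exp(-sigma_i * delta_i))."""
    weights = []
    transmittance = 1.0  # fraction of light not yet absorbed along the ray
    for sigma, delta in zip(densities, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    return weights


def render_embedding(densities, deltas, embeddings):
    """Composite per-sample embeddings along a ray, then normalize to unit length."""
    weights = render_weights(densities, deltas)
    dim = len(embeddings[0])
    rendered = [
        sum(w * e[k] for w, e in zip(weights, embeddings)) for k in range(dim)
    ]
    norm = math.sqrt(sum(v * v for v in rendered))
    return [v / norm for v in rendered] if norm > 0 else rendered


def cosine_similarity(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


# Toy ray: empty space first, then an opaque surface with embedding [0, 1].
ray_embedding = render_embedding([0.0, 50.0], [1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]])
print(cosine_similarity(ray_embedding, [0.0, 1.0]))
```

A text query embedding would be compared against the rendered embedding of each pixel's ray to form a relevancy map; here the query is just the toy vector `[0.0, 1.0]`.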

Nerfstudio: A Modular Framework for Neural Radiance Field Development

Feb 08, 2023
Matthew Tancik, Ethan Weber, Evonne Ng, Ruilong Li, Brent Yi, Justin Kerr, Terrance Wang, Alexander Kristoffersen, Jake Austin, Kamyar Salahi, Abhik Ahuja, David McAllister, Angjoo Kanazawa

Neural Radiance Fields (NeRF) are a rapidly growing area of research with wide-ranging applications in computer vision, graphics, robotics, and more. In order to streamline the development and deployment of NeRF research, we propose a modular PyTorch framework, Nerfstudio. Our framework includes plug-and-play components for implementing NeRF-based methods, which make it easy for researchers and practitioners to incorporate NeRF into their projects. Additionally, the modular design enables support for extensive real-time visualization tools, streamlined pipelines for importing captured in-the-wild data, and tools for exporting to video, point cloud and mesh representations. The modularity of Nerfstudio enables the development of Nerfacto, our method that combines components from recent papers to achieve a balance between speed and quality, while also remaining flexible to future modifications. To promote community-driven development, all associated code and data are made publicly available with open-source licensing at https://nerf.studio.

* Project page at https://nerf.studio

K-Planes: Explicit Radiance Fields in Space, Time, and Appearance

Jan 24, 2023
Sara Fridovich-Keil, Giacomo Meanti, Frederik Warburg, Benjamin Recht, Angjoo Kanazawa

We introduce k-planes, a white-box model for radiance fields in arbitrary dimensions. Our model uses d choose 2 planes to represent a d-dimensional scene, providing a seamless way to go from static (d=3) to dynamic (d=4) scenes. This planar factorization makes adding dimension-specific priors easy, e.g., temporal smoothness and multi-resolution spatial structure, and induces a natural decomposition of static and dynamic components of a scene. We use a linear feature decoder with a learned color basis that yields performance similar to that of a nonlinear black-box MLP decoder. Across a range of synthetic and real, static and dynamic, fixed and varying appearance scenes, k-planes yields competitive and often state-of-the-art reconstruction fidelity with low memory usage, achieving 1000x compression over a full 4D grid, and fast optimization with a pure PyTorch implementation. For video results and code, please see sarafridov.github.io/K-Planes.

* Project page https://sarafridov.github.io/K-Planes/
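
The "d choose 2" plane count and the static/dynamic split can be checked with a short sketch. It only enumerates axis pairs and projects a point onto one of them; the function names and the convention that the last axis of a 4D scene is time are assumptions for illustration, and no feature grids or interpolation are modeled.

```python
from itertools import combinations
from math import comb


def plane_axes(d):
    """Return every pair of axes (one pair per plane) for a d-dimensional scene."""
    return list(combinations(range(d), 2))


def project(point, axes):
    """Project a d-dimensional point onto the 2D plane spanned by `axes`."""
    return tuple(point[a] for a in axes)


# Static scene (x, y, z): 3 planes. Dynamic scene (x, y, z, t): 6 planes.
for d in (3, 4):
    print(d, plane_axes(d), comb(d, 2))

# With time as axis 3, planes split into space-only and space-time groups.
TIME_AXIS = 3
space_only = [p for p in plane_axes(4) if TIME_AXIS not in p]
space_time = [p for p in plane_axes(4) if TIME_AXIS in p]
print(space_only, space_time)
```

For d=4 the three space-only planes and three space-time planes correspond to the static and dynamic components that the abstract says the factorization separates.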

Monocular Dynamic View Synthesis: A Reality Check

Oct 24, 2022
Hang Gao, Ruilong Li, Shubham Tulsiani, Bryan Russell, Angjoo Kanazawa

We study the recent progress on dynamic view synthesis (DVS) from monocular video. Though existing approaches have demonstrated impressive results, we show a discrepancy between the practical capture process and the existing experimental protocols, which effectively leaks in multi-view signals during training. We define effective multi-view factors (EMFs) to quantify the amount of multi-view signal present in the input capture sequence based on the relative camera-scene motion. We introduce two new metrics: co-visibility masked image metrics and correspondence accuracy, which overcome the issue in existing protocols. We also propose a new iPhone dataset that includes more diverse real-life deformation sequences. Using our proposed experimental protocol, we show that the state-of-the-art approaches observe a 1-2 dB drop in masked PSNR in the absence of multi-view cues and a 4-5 dB drop when modeling complex motion. Code and data can be found at https://hangg7.com/dycheck.

* NeurIPS 2022. Project page: https://hangg7.com/dycheck. Code: https://github.com/KAIR-BAIR/dycheck
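
A masked PSNR of the kind the abstract reports can be sketched as follows. This assumes pixel values in [0, 1] and a precomputed binary mask; how the paper derives its co-visibility mask is not shown here, and the function name and flat-list image layout are hypothetical.

```python
import math


def masked_psnr(pred, target, mask):
    """PSNR in dB computed only over pixels where mask is truthy.

    Assumes pixel values lie in [0, 1], so PSNR = -10 * log10(MSE).
    """
    sq_err = 0.0
    count = 0
    for p, t, m in zip(pred, target, mask):
        if m:
            sq_err += (p - t) ** 2
            count += 1
    if count == 0:
        raise ValueError("mask selects no pixels")
    mse = sq_err / count
    if mse == 0.0:
        return math.inf  # identical on all masked pixels
    return -10.0 * math.log10(mse)


# The second pixel is badly wrong but excluded by the mask.
print(masked_psnr([0.5, 0.0], [0.6, 1.0], [1, 0]))
```

Excluding pixels that the evaluation protocol deems unobservable keeps large errors in those regions from dominating the score.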

NerfAcc: A General NeRF Acceleration Toolbox

Oct 10, 2022
Ruilong Li, Matthew Tancik, Angjoo Kanazawa

We propose NerfAcc, a toolbox for efficient volumetric rendering of radiance fields. We build on the techniques proposed in Instant-NGP and extend them to support not only bounded static scenes but also dynamic and unbounded scenes. NerfAcc comes with a user-friendly Python API, and is ready for plug-and-play acceleration of most NeRFs. Various examples are provided to show how to use this toolbox. Code can be found here: https://github.com/KAIR-BAIR/nerfacc.

* Webpage: https://www.nerfacc.com/

Studying Bias in GANs through the Lens of Race

Sep 15, 2022
Vongani H. Maluleke, Neerja Thakkar, Tim Brooks, Ethan Weber, Trevor Darrell, Alexei A. Efros, Angjoo Kanazawa, Devin Guillory

In this work, we study how the performance and evaluation of generative image models are impacted by the racial composition of their training datasets. By examining and controlling the racial distributions in various training datasets, we are able to observe the impacts of different training distributions on generated image quality and the racial distributions of the generated images. Our results show that the racial composition of generated images successfully preserves that of the training data. However, we observe that truncation, a technique used to generate higher quality images during inference, exacerbates racial imbalances in the data. Lastly, when examining the relationship between image quality and race, we find that the images of a given race with the highest perceived visual quality come from a distribution where that race is well-represented, and that annotators consistently prefer generated images of white people over those of Black people.

* ECCV 2022. Project Page: https://neerja.me/bias-gans/
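
Truncation, as commonly used in StyleGAN-style generators, pulls each latent code toward the mean latent. The sketch below assumes that this is the technique the abstract refers to; the function name, toy latents, and psi values are illustrative only.

```python
def truncate(w, w_mean, psi):
    """Truncation trick: interpolate a latent toward the mean latent.

    psi = 1.0 leaves w unchanged; psi = 0.0 collapses it onto w_mean.
    """
    return [m + psi * (x - m) for x, m in zip(w, w_mean)]


w_mean = [0.0, 0.0]
samples = [[2.0, -2.0], [-1.0, 3.0]]

# Smaller psi trades diversity for samples closer to the average latent.
for psi in (1.0, 0.5, 0.0):
    print(psi, [truncate(w, w_mean, psi) for w in samples])
```

Because every sample moves toward the same mean, modes that are far from it become rarer, which is consistent with the abstract's observation that truncation can exacerbate imbalances.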

The One Where They Reconstructed 3D Humans and Environments in TV Shows

Jul 28, 2022
Georgios Pavlakos, Ethan Weber, Matthew Tancik, Angjoo Kanazawa

TV shows depict a wide variety of human behaviors and have been studied extensively for their potential to be a rich source of data for many applications. However, the majority of the existing work focuses on 2D recognition tasks. In this paper, we make the observation that there is a certain persistence in TV shows, i.e., repetition of the environments and the humans, which makes possible the 3D reconstruction of this content. Building on this insight, we propose an automatic approach that operates on an entire season of a TV show and aggregates information in 3D; we build a 3D model of the environment, compute camera information, static 3D scene structure and body scale information. Then, we demonstrate how this information acts as rich 3D context that can guide and improve the recovery of 3D human pose and position in these environments. Moreover, we show that reasoning about humans and their environment in 3D enables a broad range of downstream applications: re-identification, gaze estimation, cinematography and image editing. We apply our approach on environments from seven iconic TV shows and perform an extensive evaluation of the proposed system.

* ECCV 2022. Project page: http://ethanweber.me/sitcoms3D/
