David R Bull

Splatography: Sparse multi-view dynamic Gaussian Splatting for filmmaking challenges

Nov 07, 2025

Adrian Azzarelli, Nantheera Anantrasirichai, David R Bull

Abstract: Deformable Gaussian Splatting (GS) accomplishes photorealistic dynamic 3-D reconstruction from dense multi-view video (MVV) by learning to deform a canonical GS representation. However, in filmmaking, tight budgets can result in sparse camera configurations, which limits state-of-the-art (SotA) methods when capturing complex dynamic features. To address this issue, we introduce an approach that splits the canonical Gaussians and deformation field into foreground and background components using a sparse set of masks for frames at t=0. Each representation is separately trained on different loss functions during canonical pre-training. Then, during dynamic training, different parameters are modeled for each deformation field following common filmmaking practices. The foreground stage contains diverse dynamic features, so changes in color, position and rotation are learned. The background, which contains film crew and equipment, is typically dimmer and less dynamic, so only changes in point position are learned. Experiments on 3-D and 2.5-D entertainment datasets show that our method produces SotA qualitative and quantitative results; up to 3 PSNR higher with half the model size on 3-D scenes. Unlike the SotA and without the need for dense mask supervision, our method also produces segmented dynamic reconstructions including transparent and dynamic textures. Code and video comparisons are available online: https://interims-git.github.io/
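The foreground/background split described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the `Gaussian` and `Delta` types, the `deform` function and the additive offsets are all hypothetical names and simplifications introduced here, and the `is_foreground` flag is assumed to have been derived from the t=0 masks. The sketch only shows the stated idea that foreground Gaussians may change color, position and rotation, while background Gaussians may change position only.

```python
from dataclasses import dataclass, replace
from typing import Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]


@dataclass(frozen=True)
class Gaussian:
    """Hypothetical canonical Gaussian; fields are illustrative only."""
    position: Vec3
    rotation: Quat          # quaternion (w, x, y, z)
    color: Vec3
    is_foreground: bool     # assumed to come from the sparse t=0 masks


@dataclass(frozen=True)
class Delta:
    """Hypothetical per-Gaussian output of a deformation field at one time step."""
    d_position: Vec3 = (0.0, 0.0, 0.0)
    d_rotation: Quat = (0.0, 0.0, 0.0, 0.0)
    d_color: Vec3 = (0.0, 0.0, 0.0)


def add(a: Tuple[float, ...], b: Tuple[float, ...]) -> Tuple[float, ...]:
    """Element-wise sum of two equal-length tuples."""
    return tuple(x + y for x, y in zip(a, b))


def deform(g: Gaussian, delta: Delta) -> Gaussian:
    """Apply a deformation, restricting which attributes may change.

    Assumption: offsets are simply added (the quaternion is not
    re-normalised here), which is a simplification for illustration.
    """
    new_position = add(g.position, delta.d_position)
    if not g.is_foreground:
        # Background: only the point position is allowed to change.
        return replace(g, position=new_position)
    # Foreground: position, rotation and color may all change.
    return replace(
        g,
        position=new_position,
        rotation=add(g.rotation, delta.d_rotation),
        color=add(g.color, delta.d_color),
    )
```

Applying the same `Delta` to a foreground and a background Gaussian moves both, but leaves the background's color and rotation untouched.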

Exploring Dynamic Novel View Synthesis Technologies for Cinematography

Dec 23, 2024

Adrian Azzarelli, Nantheera Anantrasirichai, David R Bull

Abstract: Novel view synthesis (NVS) has shown significant promise for applications in cinematographic production, particularly through the exploitation of Neural Radiance Fields (NeRF) and Gaussian Splatting (GS). These methods model real 3D scenes, enabling the creation of new shots that are challenging to capture in the real world due to set topology or expensive equipment requirements. This innovation also offers cinematographic advantages such as smooth camera movements, virtual re-shoots and slow-motion effects. This paper explores dynamic NVS with the aim of facilitating the model selection process. We showcase its potential through a short montage filmed using various NVS models.

BVI-RLV: A Fully Registered Dataset and Benchmarks for Low-Light Video Enhancement

Jul 03, 2024

Ruirui Lin, Nantheera Anantrasirichai, Guoxi Huang, Joanne Lin, Qi Sun, Alexandra Malyugina, David R Bull

Abstract: Low-light videos often exhibit spatiotemporally incoherent noise, compromising visibility and performance in computer vision applications. One significant challenge in enhancing such content using deep learning is the scarcity of training data. This paper introduces a novel low-light video dataset, consisting of 40 scenes with various motion scenarios under two distinct low-lighting conditions, incorporating genuine noise and temporal artifacts. We provide fully registered ground truth data captured in normal light using a programmable motorized dolly and refine it via an image-based approach for pixel-wise frame alignment across different light levels. We provide benchmarks based on four different technologies: convolutional neural networks, transformers, diffusion models, and state space models (Mamba). Our experimental results demonstrate the significance of fully registered video pairs for low-light video enhancement (LLVE), and the comprehensive evaluation shows that the models trained with our dataset outperform those trained with the existing datasets. Our dataset and links to benchmarks are publicly available at https://doi.org/10.21227/mzny-8c77.

* arXiv admin note: substantial text overlap with arXiv:2402.01970

Reviewing Intelligent Cinematography: AI research for camera-based video production

May 08, 2024

Adrian Azzarelli, Nantheera Anantrasirichai, David R Bull

Abstract: This paper offers a comprehensive review of artificial intelligence (AI) research in the context of real camera content acquisition for entertainment purposes and is aimed at both researchers and cinematographers. Considering the breadth of computer vision research and the lack of review papers tied to intelligent cinematography (IC), this review introduces a holistic view of the IC landscape while providing technical insight for experts across disciplines. We preface the main discussion with technical background on generative AI, object detection, automated camera calibration and 3-D content acquisition, and link explanatory articles to assist non-technical readers. The main discussion categorizes work by four production types: General Production, Virtual Production, Live Production and Aerial Production. Note that for Virtual Production we do not discuss research relating to virtual content acquisition, including work on automated video generation, like Stable Diffusion. Within each section, we (1) sub-classify work by the technical field of research, as reflected by the subsections, and (2) evaluate the trends and challenges with respect to each type of production. In the final chapter, we present our concluding remarks on the greater scope of IC research and outline work that we believe has significant potential to influence the whole industry. We find that work relating to virtual production has the greatest potential to impact other mediums of production, driven by the growing interest in LED volumes/stages for in-camera virtual effects (ICVFX) and automated 3-D capture for virtual modelling of real-world scenes and actors. This is the first piece of literature to offer a structured and comprehensive examination of IC research. Consequently, we address ethical and legal concerns regarding the use of creative AI involving artists, actors and the general public, in the...

* For researchers and cinematographers. 43 pages including Table of Contents, List of Figures and Tables. We obtained permission to use Figures 5 and 11. All other Figures have been drawn by us

WavePlanes: A compact Wavelet representation for Dynamic Neural Radiance Fields

Dec 03, 2023

Adrian Azzarelli, Nantheera Anantrasirichai, David R Bull

Abstract: Dynamic Neural Radiance Fields (Dynamic NeRF) enhance NeRF technology to model moving scenes. However, they are resource intensive and challenging to compress. To address this issue, this paper presents WavePlanes, a fast and more compact explicit model. We propose a multi-scale space and space-time feature plane representation using N-level 2-D wavelet coefficients. The inverse discrete wavelet transform reconstructs N feature signals at varying detail, which are linearly decoded to approximate the color and density of volumes in a 4-D grid. Exploiting the sparsity of wavelet coefficients, we compress a Hash Map containing only non-zero coefficients and their locations on each plane. This results in a compressed model size of ~12 MB. Compared with state-of-the-art plane-based models, WavePlanes is up to 15x smaller, less computationally demanding and achieves comparable results in as little as one hour of training, without requiring custom CUDA code or high-performance computing resources. Additionally, we propose new feature fusion schemes that work as well as previously proposed schemes while providing greater interpretability. Our code is available at: https://github.com/azzarelli/waveplanes/
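To make the idea of storing only non-zero wavelet coefficients concrete, here is a minimal pure-Python sketch. It is not taken from the WavePlanes code: it assumes a single-level 2-D Haar wavelet (the abstract describes N levels and does not name the wavelet family), and the function names `haar_dwt2`, `sparsify` and `haar_idwt2` are hypothetical. In the paper the coefficients are part of the learned representation; here they are computed from a small example plane purely to show the sparse dictionary and the inverse transform.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, int, int]  # (sub-band name, row, column)


def haar_dwt2(plane: List[List[float]]) -> Dict[Key, float]:
    """Single-level 2-D Haar transform of a plane with even height and width."""
    coeffs: Dict[Key, float] = {}
    for i in range(0, len(plane), 2):
        for j in range(0, len(plane[0]), 2):
            a, b = plane[i][j], plane[i][j + 1]
            c, d = plane[i + 1][j], plane[i + 1][j + 1]
            r, s = i // 2, j // 2
            coeffs[("LL", r, s)] = (a + b + c + d) / 2  # local average
            coeffs[("LH", r, s)] = (a - b + c - d) / 2  # horizontal detail
            coeffs[("HL", r, s)] = (a + b - c - d) / 2  # vertical detail
            coeffs[("HH", r, s)] = (a - b - c + d) / 2  # diagonal detail
    return coeffs


def sparsify(coeffs: Dict[Key, float], threshold: float = 0.0) -> Dict[Key, float]:
    """Keep only coefficients whose magnitude exceeds the threshold."""
    return {k: v for k, v in coeffs.items() if abs(v) > threshold}


def haar_idwt2(sparse: Dict[Key, float], height: int, width: int) -> List[List[float]]:
    """Inverse transform; coefficients missing from the dict are treated as zero."""
    plane = [[0.0] * width for _ in range(height)]
    for r in range(height // 2):
        for s in range(width // 2):
            ll = sparse.get(("LL", r, s), 0.0)
            lh = sparse.get(("LH", r, s), 0.0)
            hl = sparse.get(("HL", r, s), 0.0)
            hh = sparse.get(("HH", r, s), 0.0)
            i, j = 2 * r, 2 * s
            plane[i][j] = (ll + lh + hl + hh) / 2
            plane[i][j + 1] = (ll - lh + hl - hh) / 2
            plane[i + 1][j] = (ll + lh - hl - hh) / 2
            plane[i + 1][j + 1] = (ll - lh - hl + hh) / 2
    return plane
```

For a plane made of locally constant 2x2 blocks, every detail coefficient is zero, so the sparse dictionary holds a quarter of the entries while `haar_idwt2` still recovers the plane.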

Towards a Robust Framework for NeRF Evaluation

May 31, 2023

Adrian Azzarelli, Nantheera Anantrasirichai, David R Bull

Abstract: Neural Radiance Field (NeRF) research has attracted significant attention recently, with 3D modelling, virtual/augmented reality, and visual effects driving its application. While current NeRF implementations can produce high-quality visual results, there is a conspicuous lack of reliable methods for evaluating them. Conventional image quality assessment methods and analytical metrics (e.g. PSNR, SSIM, LPIPS) only provide approximate indicators of performance since they generalise the ability of the entire NeRF pipeline. Hence, in this paper, we propose a new test framework which isolates the neural rendering network from the NeRF pipeline and then performs a parametric evaluation by training and evaluating the NeRF on an explicit radiance field representation. We also introduce a configurable approach for generating representations specifically for evaluation purposes. This employs ray-casting to transform mesh models into explicit NeRF samples, as well as to "shade" these representations. Combining these two approaches, we demonstrate how different "tasks" (scenes with different visual effects or learning strategies) and types of networks (NeRFs and depth-wise implicit neural representations (INRs)) can be evaluated within this framework. Additionally, we propose a novel metric to measure task complexity of the framework which accounts for the visual parameters and the distribution of the spatial data. Our approach offers the potential to create a comparative objective evaluation framework for NeRF methods.
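For readers unfamiliar with the conventional metrics mentioned in the abstract, the sketch below computes PSNR using its standard definition, 10·log10(MAX² / MSE). It illustrates the conventional metric only, not the evaluation framework or the task-complexity metric proposed in the paper. The function name `psnr`, the flat-list image layout and the default `max_value` of 1.0 are assumptions made for this example.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], rendered: Sequence[float],
         max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists.

    Assumes pixel intensities lie in [0, max_value].
    """
    if len(reference) != len(rendered) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - t) ** 2 for r, t in zip(reference, rendered)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because the score is a single average over the rendered image, it reflects the output of the whole pipeline at once, which is the limitation the abstract points to.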

* 9 pages, 2 main experiments, 2 additional experiments
