Sanskar Agrawal

Rays as Pixels: Learning A Joint Distribution of Videos and Camera Trajectories

Apr 10, 2026

Wonbong Jang, Shikun Liu, Soubhik Sanyal, Juan Camilo Perez, Kam Woh Ng, Sanskar Agrawal, Juan-Manuel Perez-Rua, Yiannis Douratsos, Tao Xiang

Abstract: Recovering camera parameters from images and rendering scenes from novel viewpoints have long been treated as separate tasks in computer vision and graphics. This separation breaks down when image coverage is sparse or poses are ambiguous, since each task needs what the other produces. We propose Rays as Pixels, a Video Diffusion Model (VDM) that learns a joint distribution over videos and camera trajectories. We represent each camera as dense ray pixels (raxels) and denoise them jointly with video frames through a Decoupled Self-Cross Attention mechanism. A single trained model handles three tasks: predicting camera trajectories from video, jointly generating video and camera trajectory from input images, and generating video from input images along a target camera trajectory. Because the model can both predict trajectories from a video and generate views conditioned on its own predictions, we evaluate it through a closed-loop self-consistency test, demonstrating that its forward and inverse predictions agree. Notably, trajectory prediction requires far fewer denoising steps than video generation; even a few denoising steps suffice for self-consistency. We report results on pose estimation and camera-controlled video generation.

* 9 pages, 6 figures, 4 tables. Project page: https://wbjang.github.io/raysaspixels/
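
The abstract does not say how a raxel is encoded, so the following is only an illustrative sketch of one common way to turn a camera into a dense, image-shaped ray map: each pixel stores the unit direction and Plücker moment of its viewing ray. The function name `camera_to_ray_map`, the pinhole intrinsics `K`, the camera-to-world pose convention, and the 6-channel layout are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def camera_to_ray_map(K, cam_to_world, height, width):
    """Hypothetical helper: per-pixel rays for a pinhole camera.

    K            : (3, 3) intrinsics matrix (assumed pinhole model).
    cam_to_world : (4, 4) camera-to-world pose (assumed convention).
    Returns an (height, width, 6) array: unit direction + Pluecker moment.
    """
    # Pixel-center coordinates; both grids have shape (height, width).
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)

    # Back-project pixels to camera-space directions, then rotate to world.
    dirs_cam = pix @ np.linalg.inv(K).T
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    dirs_world = dirs_cam @ R.T
    dirs_world = dirs_world / np.linalg.norm(dirs_world, axis=-1, keepdims=True)

    # Every ray starts at the camera center; moment = origin x direction.
    origins = np.broadcast_to(t, dirs_world.shape)
    moments = np.cross(origins, dirs_world)
    return np.concatenate([dirs_world, moments], axis=-1)
```

Because the result has the same spatial layout as an image, a ray map of this kind can be stacked or denoised alongside video frames, which is the general idea the abstract describes; the paper's actual parameterization may differ.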

Objects With Lighting: A Real-World Dataset for Evaluating Reconstruction and Rendering for Object Relighting

Jan 17, 2024

Benjamin Ummenhofer, Sanskar Agrawal, Rene Sepulveda, Yixing Lao, Kai Zhang, Tianhang Cheng, Stephan Richter, Shenlong Wang, German Ros

Abstract: Reconstructing an object from photos and placing it virtually in a new environment goes beyond the standard novel view synthesis task, as the appearance of the object has to adapt not only to the novel viewpoint but also to the new lighting conditions. Yet evaluations of inverse rendering methods rely on novel view synthesis data or simplistic synthetic datasets for quantitative analysis. This work presents a real-world dataset for measuring the reconstruction and rendering of objects for relighting. To this end, we capture the environment lighting and ground truth images of the same objects in multiple environments, which allows reconstructing the objects from images taken in one environment and quantifying the quality of the rendered views for the unseen lighting environments. Further, we introduce a simple baseline composed of off-the-shelf methods, test several state-of-the-art methods on the relighting task, and show that novel view synthesis is not a reliable proxy to measure performance. Code and dataset are available at https://github.com/isl-org/objects-with-lighting.

* Accepted at 3DV 2024, Oral presentation. For the project page see https://github.com/isl-org/objects-with-lighting
