Chinmay Talegaonkar

Leveraging 6DoF Pose Foundation Models For Mapping Marine Sediment Burial

Jun 12, 2025

Jerry Yan, Chinmay Talegaonkar, Nicholas Antipa, Eric Terrill, Sophia Merrifield

Abstract: The burial state of anthropogenic objects on the seafloor provides insight into localized sedimentation dynamics and is also critical for assessing ecological risks, potential pollutant transport, and the viability of recovery or mitigation strategies for hazardous materials such as munitions. Accurate burial depth estimation from remote imagery remains difficult due to partial occlusion, poor visibility, and object degradation. This work introduces a computer vision pipeline, called PoseIDON, which combines deep foundation model features with multiview photogrammetry to estimate six degrees of freedom object pose and the orientation of the surrounding seafloor from ROV video. Burial depth is inferred by aligning CAD models of the objects with observed imagery and fitting a local planar approximation of the seafloor. The method is validated using footage of 54 objects, including barrels and munitions, recorded at a historic ocean dumpsite in the San Pedro Basin. The model achieves a mean burial depth error of approximately 10 centimeters and resolves spatial burial patterns that reflect underlying sediment transport processes. This approach enables scalable, non-invasive mapping of seafloor burial and supports environmental assessment at contaminated sites.
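
The abstract describes inferring burial depth from a posed CAD model and a local planar fit of the seafloor. The sketch below illustrates that geometric idea only and is not the PoseIDON implementation. It assumes a z-up coordinate frame, seafloor samples given as (x, y, z) tuples, and CAD vertices already transformed by the estimated pose; the function names `fit_plane` and `burial_depth` are hypothetical.

```python
import math


def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(points):
    """Least-squares fit of z = a*x + b*y + c to seafloor samples (assumed z-up)."""
    n = len(points)
    sxx = sum(x * x for x, _, _ in points)
    sxy = sum(x * y for x, y, _ in points)
    syy = sum(y * y for _, y, _ in points)
    sx = sum(x for x, _, _ in points)
    sy = sum(y for _, y, _ in points)
    sxz = sum(x * z for x, _, z in points)
    syz = sum(y * z for _, y, z in points)
    sz = sum(z for _, _, z in points)

    normal_matrix = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [sxz, syz, sz]
    det = _det3(normal_matrix)
    if abs(det) < 1e-12:
        raise ValueError("seafloor samples are degenerate (e.g. collinear)")

    # Cramer's rule: replace one column at a time with the right-hand side.
    coeffs = []
    for col in range(3):
        m = [row[:] for row in normal_matrix]
        for r in range(3):
            m[r][col] = rhs[r]
        coeffs.append(_det3(m) / det)
    return tuple(coeffs)  # (a, b, c)


def burial_depth(plane, posed_vertices):
    """Deepest extent of the posed model below the fitted plane (0 if none)."""
    a, b, c = plane
    norm = math.sqrt(a * a + b * b + 1.0)
    # Positive value means the vertex lies below the plane.
    depths = [(a * x + b * y + c - z) / norm for x, y, z in posed_vertices]
    return max(0.0, max(depths))


seafloor = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
model = [(0.5, 0.5, -0.3), (0.5, 0.5, 0.2)]  # hypothetical posed vertices
print(burial_depth(fit_plane(seafloor), model))
```

Measuring the distance along the plane normal, rather than along the vertical axis, keeps the estimate meaningful on a sloping seafloor.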

Repurposing Marigold for Zero-Shot Metric Depth Estimation via Defocus Blur Cues

May 23, 2025

Chinmay Talegaonkar, Nikhil Gandudi Suresh, Zachary Novack, Yash Belhe, Priyanka Nagasamudra, Nicholas Antipa

Abstract: Recent monocular metric depth estimation (MMDE) methods have made notable progress towards zero-shot generalization. However, they still exhibit a significant performance drop on out-of-distribution datasets. We address this limitation by injecting defocus blur cues at inference time into Marigold, a pre-trained diffusion model for zero-shot, scale-invariant monocular depth estimation (MDE). Our method effectively turns Marigold into a metric depth predictor in a training-free manner. To incorporate defocus cues, we capture two images with a small and a large aperture from the same viewpoint. To recover metric depth, we then optimize the metric depth scaling parameters and the noise latents of Marigold at inference time using gradients from a loss function based on the defocus-blur image formation model. We compare our method against existing state-of-the-art zero-shot MMDE methods on a self-collected real dataset, showing quantitative and qualitative improvements.
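
To illustrate why defocus blur carries metric information, the sketch below evaluates the standard thin-lens circle-of-confusion formula. It is not the paper's loss function, whose exact image formation model is not given in the abstract. The function name `blur_circle_diameter` and the example lens values are assumptions; all lengths are in metres.

```python
def blur_circle_diameter(aperture, focal_length, focus_dist, object_dist):
    """Thin-lens blur circle diameter on the sensor.

    aperture     -- aperture diameter A
    focal_length -- lens focal length f
    focus_dist   -- distance S1 at which the lens is focused (S1 > f)
    object_dist  -- actual distance S2 of the scene point
    """
    return (aperture * abs(object_dist - focus_dist) / object_dist
            * focal_length / (focus_dist - focal_length))


# Hypothetical 50 mm lens focused at 2 m, scene point at 4 m.
small = blur_circle_diameter(0.05 / 8, 0.05, 2.0, 4.0)  # f/8
large = blur_circle_diameter(0.05 / 2, 0.05, 2.0, 4.0)  # f/2
print(small, large)
```

Because the blur diameter depends on absolute distances and scales linearly with the aperture, comparing a small-aperture and a large-aperture image of the same view constrains depth in metric units, which a scale-invariant depth map alone cannot.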

Volumetrically Consistent 3D Gaussian Rasterization

Dec 04, 2024

Chinmay Talegaonkar, Yash Belhe, Ravi Ramamoorthi, Nicholas Antipa

Abstract: Recently, 3D Gaussian Splatting (3DGS) has enabled photorealistic view synthesis at high inference speeds. However, its splatting-based rendering model makes several approximations to the rendering equation, reducing physical accuracy. We show that splatting and its approximations are unnecessary, even within a rasterizer; we instead volumetrically integrate 3D Gaussians directly to compute the transmittance across them analytically. We use this analytic transmittance to derive more physically accurate alpha values than 3DGS, which can directly be used within their framework. The result is a method that more closely follows the volume rendering equation (similar to ray-tracing) while enjoying the speed benefits of rasterization. Our method represents opaque surfaces with higher accuracy and fewer points than 3DGS. This enables it to outperform 3DGS for view synthesis (measured in SSIM and LPIPS). Being volumetrically consistent also enables our method to work out of the box for tomography. We match the state-of-the-art 3DGS-based tomography method with fewer points.
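
The idea of integrating a Gaussian density along a ray in closed form can be sketched for the simplest case. The example below assumes an isotropic Gaussian with density `peak_density * exp(-r^2 / (2 * sigma^2))` and a ray that starts far outside it, so the integral can be taken over the whole line. This is a simplification for illustration, not the paper's formulation, and the name `gaussian_alpha` is hypothetical.

```python
import math


def gaussian_alpha(origin, direction, mean, sigma, peak_density):
    """Alpha of one isotropic 3D Gaussian along a ray, from its analytic transmittance."""
    norm = math.sqrt(sum(c * c for c in direction))
    d = [c / norm for c in direction]          # unit ray direction
    v = [m - o for m, o in zip(mean, origin)]  # origin -> Gaussian centre
    t_closest = sum(a * b for a, b in zip(v, d))
    # Squared perpendicular distance from the centre to the ray.
    b2 = max(sum(a * a for a in v) - t_closest ** 2, 0.0)
    # Line integral of the density over the full ray (1D Gaussian integral).
    optical_depth = (peak_density * sigma * math.sqrt(2.0 * math.pi)
                     * math.exp(-b2 / (2.0 * sigma ** 2)))
    transmittance = math.exp(-optical_depth)
    return 1.0 - transmittance


# Ray through the centre versus a ray passing one sigma away from it.
print(gaussian_alpha((0, 0, -10), (0, 0, 1), (0, 0, 0), 1.0, 1.0))
print(gaussian_alpha((0, 0, -10), (0, 0, 1), (1, 0, 0), 1.0, 1.0))
```

The alpha here comes from exponentiating an integrated density, so it saturates towards 1 as the density grows instead of being a Gaussian falloff multiplied by a fixed opacity.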

Pose Estimation of Buried Deep-Sea Objects using 3D Vision Deep Learning Models

Oct 01, 2024

Jerry Yan, Chinmay Talegaonkar, Nicholas Antipa, Eric Terrill, Sophia Merrifield

Abstract: We present an approach for pose and burial fraction estimation of debris field barrels found on the seabed in the Southern California San Pedro Basin. Our computational workflow leverages recent advances in foundation models for segmentation and a vision transformer-based approach to estimate the point cloud which defines the geometry of the barrel. We propose BarrelNet for estimating the 6-DOF pose and radius of buried barrels from the barrel point clouds as input. We train BarrelNet using synthetically generated barrel point clouds, and qualitatively demonstrate the potential of our approach using remotely operated vehicle (ROV) video footage of barrels found at a historic dump site. We compare our method to a traditional least squares fitting approach and show significant improvement according to our defined benchmarks.

* Submitted to OCEANS 2024 Halifax

Visual Physics: Discovering Physical Laws from Videos

Nov 27, 2019

Pradyumna Chari, Chinmay Talegaonkar, Yunhao Ba, Achuta Kadambi

Abstract: In this paper, we teach a machine to discover the laws of physics from video streams. We assume no prior knowledge of physics, beyond a temporal stream of bounding boxes. The problem is very difficult because a machine must learn not only a governing equation (e.g. projectile motion) but also the existence of governing parameters (e.g. velocities). We evaluate our ability to discover physical laws on videos of elementary physical phenomena, such as projectile motion or circular motion. These elementary tasks have textbook governing equations and enable ground truth verification of our approach.
