Defocus estimation is the task of measuring the amount of defocus blur in an image, typically so that its sharpness can be restored or so that the blur can be exploited as a depth cue.
Following the earlier verification of the Gaussian model in \cite{ASaa2026}, this paper introduces a zero-training, forward computational framework that realizes the model in real-time applications. The framework discretely evaluates the analytic expression relating the defocused image to the sharper one over the applicable range of Gaussian kernel standard deviations and selects the best matches. The analytic expression yields multiple solutions at certain image points; these are filtered down to a single solution using similarity measures over neighboring points. The framework also handles cases where the two given images are partially blurred versions of each other. Experimental evaluations on real images demonstrate that the proposed framework achieves a mean absolute error (MAE) below $1.7\%$ in estimating synthetic blur values. Furthermore, when the extracted defocus filters are applied to the less blurred images, the discrepancy between the actual blurred image intensities and their estimates remains under $2\%$.
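As a rough illustration of this brute-force matching idea, the minimal Python sketch below blurs the sharper image with a set of candidate Gaussian kernels and keeps, per pixel, the standard deviation whose result best matches the more blurred image over a small neighborhood; the candidate range, window size, and function names are illustrative assumptions, not the paper's implementation.

\begin{verbatim}
import numpy as np
from scipy.ndimage import gaussian_filter, uniform_filter

def estimate_sigma_map(sharp, blurred,
                       sigmas=np.linspace(0.5, 5.0, 19), win=7):
    """Per-pixel estimate of the Gaussian std linking the two images."""
    best_err = np.full(sharp.shape, np.inf)
    best_sigma = np.zeros(sharp.shape)
    for s in sigmas:
        candidate = gaussian_filter(sharp, s)                   # forward computation
        err = uniform_filter((candidate - blurred) ** 2, win)   # local SSD
        better = err < best_err
        best_err[better] = err[better]
        best_sigma[better] = s
    return best_sigma
\end{verbatim}

Here the local error window plays the role of the similarity measure over neighboring points that resolves pixels where several candidate blurs fit equally well.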
Autonomous aerial navigation in absolute darkness is crucial for post-disaster search and rescue operations, which often take place in darkness caused by disaster-zone power outages. Yet, due to resource constraints, tiny aerial robots, although perfectly suited for these operations, are unable to navigate safely in the dark to find survivors. In this paper, we present an autonomous aerial robot for navigation in the dark that combines an Infra-Red (IR) monocular camera with a large-aperture coded lens and structured light, without external infrastructure such as GPS or motion capture. Our approach obtains depth-dependent defocus cues (each structured-light point appears as a pattern that depends on depth), which act as a strong prior for our AsterNet deep depth estimation model. The model is trained in simulation on data generated with a simple optical model and transfers directly to the real world without any fine-tuning or retraining. AsterNet runs onboard the robot at 20 Hz on an NVIDIA Jetson Orin$^\text{TM}$ Nano. Furthermore, our network is robust to changes in the structured-light pattern and in the relative placement of the pattern emitter and IR camera, leading to simplified and cost-effective construction. We successfully evaluate and demonstrate our depth-based navigation approach, AsterNav, which uses depth from AsterNet, in many real-world experiments relying only on onboard sensing and computation, including dark matte obstacles and thin ropes (diameter 6.25 mm), achieving an overall success rate of 95.5% with unknown object shapes, locations and materials. To the best of our knowledge, this is the first work on monocular, structured-light-based quadrotor navigation in absolute darkness.
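A thin-lens model of the kind such a simulator might use can be sketched in a few lines of Python; the focal length, f-number, focus distance, pixel pitch, and the Gaussian spot shape below are placeholder assumptions, not the paper's hardware or coded-aperture pattern.

\begin{verbatim}
import numpy as np

def coc_radius_px(depth_m, f=0.008, N=1.4, focus_m=1.0, pitch=3e-6):
    """Circle-of-confusion radius (pixels) of a point source at depth_m."""
    A = f / N                              # aperture diameter
    v_f = f * focus_m / (focus_m - f)      # sensor distance for the focused depth
    v = f * depth_m / (depth_m - f)        # image distance of the point
    coc = A * abs(v_f - v) / v             # blur-circle diameter on the sensor
    return 0.5 * coc / pitch

def render_dot(depth_m, size=65):
    """One structured-light dot rendered as a spot whose width follows defocus."""
    r = max(coc_radius_px(depth_m), 0.5)
    y, x = np.mgrid[:size, :size] - size // 2
    return np.exp(-(x**2 + y**2) / (2.0 * r**2))
\end{verbatim}

Sweeping depth_m over the working range yields depth-labelled dot patterns of the kind a depth network can be trained on in simulation.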
Event vision sensors (neuromorphic cameras) output sparse, asynchronous ON/OFF events triggered by log-intensity threshold crossings, enabling microsecond-scale sensing with high dynamic range and low data bandwidth. Because this event representation is inherently nonlinear, it does not readily integrate with the linear forward models that underpin most computational imaging and optical system design. We present a physics-grounded processing pipeline that maps event streams to estimates of per-pixel log-intensity and intensity derivatives, and embeds these measurements in a dynamic linear systems model with a time-varying point spread function. This enables inverse filtering directly from event data, using frequency-domain Wiener deconvolution with a known (or parameterised) dynamic transfer function. We validate the approach in simulation for single and overlapping point sources under modulated defocus, and on real event data from a tunable-focus telescope imaging a star field, demonstrating source localisation and separability. The proposed framework provides a practical bridge between event sensing and model-based computational imaging for dynamic optical systems.
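The Wiener step itself is standard; a minimal Python sketch for a single reconstructed frame and a known PSF is shown below (the signal-to-noise parameter and array handling are assumptions, not the paper's pipeline).

\begin{verbatim}
import numpy as np

def wiener_deconvolve(image, psf, snr=100.0):
    """Frequency-domain Wiener filter: X = conj(H) Y / (|H|^2 + 1/SNR)."""
    pad = np.zeros_like(image, dtype=float)
    pad[:psf.shape[0], :psf.shape[1]] = psf
    pad = np.roll(pad, (-(psf.shape[0] // 2), -(psf.shape[1] // 2)), axis=(0, 1))
    H = np.fft.fft2(pad)                   # transfer function of the PSF
    Y = np.fft.fft2(image)
    X = np.conj(H) * Y / (np.abs(H) ** 2 + 1.0 / snr)
    return np.real(np.fft.ifft2(X))
\end{verbatim}

For a time-varying transfer function, the same filter would be applied per time window with the PSF corresponding to that window's defocus state.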
This paper presents a method for carrying out fair comparisons of the accuracy of pose estimation using fiducial markers. These comparisons rely on large sets of high-fidelity synthetic images enabling deep exploration of the 6 degrees of freedom. A low-discrepancy sampling of the space makes it possible to check the correlations between each degree of freedom and the pose errors by plotting the 36 pairwise combinations. The images are rendered using a physically based ray-tracing code that has been specifically developed to use the standard calibration coefficients of any camera directly. The software reproduces image distortions, defocus and diffraction blur. Furthermore, sub-pixel sampling is applied to sharp edges to enhance the fidelity of the rendered image. After introducing the rendering algorithm and its experimental validation, the paper proposes a method for evaluating pose accuracy. This method is applied to well-known markers, revealing their strengths and weaknesses for pose estimation. The code is open source and available on GitHub.
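A low-discrepancy sweep of the 6-DoF space can be generated, for instance, with scipy's quasi-Monte Carlo module; the Python sketch below uses placeholder bounds rather than the paper's experimental ranges.

\begin{verbatim}
import numpy as np
from scipy.stats import qmc

lower = [-0.5, -0.5, 0.3, -60.0, -60.0, -180.0]  # x, y, z (m); roll, pitch, yaw (deg)
upper = [ 0.5,  0.5, 3.0,  60.0,  60.0,  180.0]

sampler = qmc.Sobol(d=6, scramble=True, seed=0)
poses = qmc.scale(sampler.random_base2(m=10), lower, upper)  # 2**10 = 1024 poses
# Each row defines one camera-to-marker pose to render; the resulting pose
# errors can then be plotted against each DoF (the 36 pairwise plots).
\end{verbatim}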
Over the past three decades, defocus has consistently provided valuable depth information in scene images. However, accurately estimating depth from 2D images remains a persistent and fundamental challenge in the field of 3D recovery. Heuristic approaches face an ill-posed problem when inferring the spatially variant defocus blur, since the desired blur cannot be distinguished from the inherent blur. Given prior knowledge of the defocus model, the problem becomes well-posed, with an analytic solution for the relative blur between two images taken from the same viewpoint with different focus settings. The Gaussian model stands out as an optimal choice for real-time applications due to its mathematical simplicity and computational efficiency. Theoretically, it is also the only model that can be applied simultaneously to both the absolute blur caused by depth in a single image and the relative blur resulting from depth differences between two images. This paper introduces the settings for conventional imaging devices that ensure the defocus operator adheres to the Gaussian model. The defocus analysis begins within the framework of geometric optics and is then carried through defocus aberration theory for diffraction-limited optics to quantify how accurately the actual model is fitted by its Gaussian approximation. The results for a typical set of focused depths between $1$ and $100$ meters, with a maximum depth variation of $10\%$ at the focused depth, confirm the Gaussian model's applicability for defocus operators in most imaging devices. The findings demonstrate a maximum mean absolute error (MAE) of less than $1\%$, underscoring the model's accuracy and reliability.
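For context, under geometric optics the blur-circle radius of a point at depth $u$, for a thin lens with aperture diameter $D$, focal length $f$ and sensor distance $v_0$ (notation chosen here for illustration, not taken from the paper), and the Gaussian relative-blur relation between two images of the same scene, are commonly written as
\[
r(u)=\frac{D\,v_0}{2}\left|\frac{1}{f}-\frac{1}{u}-\frac{1}{v_0}\right|,\qquad
\sigma(u)=k\,r(u),\qquad
I_2=I_1 * G_{\sigma_r},\quad \sigma_r=\sqrt{\sigma_2^{2}-\sigma_1^{2}},
\]
where $k$ is a camera-dependent constant and $G_{\sigma}$ denotes a 2-D Gaussian kernel. The last identity, which holds exactly only for the Gaussian model, is what makes the relative blur between two differently focused images analytically tractable.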
Bokeh and monocular depth estimation are tightly coupled through the same lens imaging geometry, yet current methods exploit this connection in incomplete ways. High-quality bokeh rendering pipelines typically depend on noisy depth maps, which amplify estimation errors into visible artifacts, while modern monocular metric depth models still struggle on weakly textured, distant and geometrically ambiguous regions where defocus cues are most informative. We introduce BokehDepth, a two-stage framework that decouples bokeh synthesis from depth prediction and treats defocus as an auxiliary supervision-free geometric cue. In Stage-1, a physically guided controllable bokeh generator, built on a powerful pretrained image editing backbone, produces depth-free bokeh stacks with calibrated bokeh strength from a single sharp input. In Stage-2, a lightweight defocus-aware aggregation module plugs into existing monocular depth encoders, fuses features along the defocus dimension, and exposes stable depth-sensitive variations while leaving the downstream decoder unchanged. Across challenging benchmarks, BokehDepth improves visual fidelity over depth-map-based bokeh baselines and consistently boosts the metric accuracy and robustness of strong monocular depth foundation models.




Three-dimensional reconstruction in scenes with extreme depth variations remains challenging due to inconsistent supervisory signals between near-field and far-field regions. Existing methods fail to simultaneously address inaccurate depth estimation in distant areas and structural degradation in close-range regions. This paper proposes a novel computational framework that integrates depth-of-field supervision and multi-view consistency supervision to advance 3D Gaussian Splatting. Our approach comprises two core components: (1) Depth-of-field Supervision employs a scale-recovered monocular depth estimator (e.g., Metric3D) to generate depth priors, leverages defocus convolution to synthesize physically accurate defocused images, and enforces geometric consistency through a novel depth-of-field loss, thereby enhancing depth fidelity in both far-field and near-field regions; (2) Multi-View Consistency Supervision employs LoFTR-based semi-dense feature matching to minimize cross-view geometric errors and enforces depth consistency via least-squares optimization of reliable matched points. By unifying defocus physics with multi-view geometric constraints, our method achieves superior depth fidelity, demonstrating a 0.8 dB PSNR improvement over the state-of-the-art method on the Waymo Open Dataset. This framework bridges physical imaging principles and learning-based depth regularization, offering a scalable solution for complex depth stratification in urban environments.
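As a loose illustration of the defocus-convolution step (not the authors' renderer), the Python sketch below synthesizes a defocused image from an all-in-focus grayscale image and a metric depth map by blurring discrete depth layers with depth-dependent Gaussian kernels and compositing them; the layer count, maximum blur, and circle-of-confusion proxy are assumptions.

\begin{verbatim}
import numpy as np
from scipy.ndimage import gaussian_filter

def synthesize_defocus(image, depth, focus_dist, max_sigma=6.0, n_layers=12):
    """image: 2-D array; depth: metric depth map; focus_dist: focused depth."""
    edges = np.linspace(depth.min(), depth.max(), n_layers + 1)
    idx = np.digitize(depth, edges[1:-1])          # depth-layer index per pixel
    out = np.zeros_like(image, dtype=float)
    weight = np.zeros(depth.shape, dtype=float)
    for i in range(n_layers):
        mask = (idx == i).astype(float)
        d = 0.5 * (edges[i] + edges[i + 1])        # representative layer depth
        sigma = min(max_sigma,
                    max_sigma * abs(1.0 / focus_dist - 1.0 / d) * focus_dist)
        out += gaussian_filter(image * mask, sigma)    # blur the layer
        weight += gaussian_filter(mask, sigma)         # blur its coverage
    return out / np.maximum(weight, 1e-6)
\end{verbatim}

A depth-of-field loss can then compare such a synthesized image against the rendered or captured defocused view.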
Recent monocular metric depth estimation (MMDE) methods have made notable progress towards zero-shot generalization. However, they still exhibit a significant performance drop on out-of-distribution datasets. We address this limitation by injecting defocus blur cues at inference time into Marigold, a \textit{pre-trained} diffusion model for zero-shot, scale-invariant monocular depth estimation (MDE). Our method effectively turns Marigold into a metric depth predictor in a training-free manner. To incorporate defocus cues, we capture two images with a small and a large aperture from the same viewpoint. To recover metric depth, we then optimize the metric depth scaling parameters and the noise latents of Marigold at inference time using gradients from a loss function based on the defocus-blur image formation model. We compare our method against existing state-of-the-art zero-shot MMDE methods on a self-collected real dataset, showing quantitative and qualitative improvements.
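A minimal Python sketch of such a test-time optimisation loop is given below, assuming a differentiable defocus renderer render_defocus (a hypothetical helper standing in for the defocus-blur image formation model) and omitting the optimisation of Marigold's noise latents.

\begin{verbatim}
import torch

def fit_metric_scale(rel_depth, sharp, wide_aperture, render_defocus, steps=200):
    """Optimise scale/shift so rendered defocus matches the large-aperture shot."""
    a = torch.ones(1, requires_grad=True)    # metric scale
    b = torch.zeros(1, requires_grad=True)   # metric shift
    opt = torch.optim.Adam([a, b], lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        pred = render_defocus(sharp, a * rel_depth + b)   # defocus image formation
        loss = torch.mean((pred - wide_aperture) ** 2)
        loss.backward()
        opt.step()
    return a.detach(), b.detach()
\end{verbatim}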
In this paper, we utilize the dark channel as a complementary cue for estimating scene depth from a single spatially variant defocus-blurred image, owing to its effectiveness in implicitly capturing the local statistics of blurred images and the scene structure. Existing depth-from-defocus (DFD) techniques typically rely on multiple images with varying apertures or focus settings to recover depth information. Very few attempts have addressed DFD from a single defocused image, due to the under-constrained nature of the problem. Our method capitalizes on the relationship between local defocus blur and contrast variations as key depth cues to enhance the overall performance in estimating the scene's structure. The entire pipeline is trained adversarially in a fully end-to-end fashion. Experiments conducted on real data with realistic depth-induced defocus blur demonstrate that incorporating the dark channel prior into single-image DFD yields meaningful depth estimation results, validating the effectiveness of our approach.
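The dark channel itself is simple to compute; a short Python sketch with an assumed patch size is given below.

\begin{verbatim}
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel(image, patch=15):
    """image: H x W x 3 array in [0, 1]; returns the H x W dark channel."""
    min_rgb = image.min(axis=2)                  # minimum over colour channels
    return minimum_filter(min_rgb, size=patch)   # minimum over a local patch
\end{verbatim}

In blurred regions this local minimum tends to rise, which is what makes the dark channel informative about spatially varying defocus.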
This work addresses mechanical defocus in Earth observation images from the IMAGIN-e mission aboard the ISS, proposing a blind deblurring approach adapted to space-based edge computing constraints. Leveraging Sentinel-2 data, our method estimates the defocus kernel and trains a restoration model within a GAN framework, effectively operating without reference images. On Sentinel-2 images with synthetic degradation, SSIM improved by 72.47% and PSNR by 25.00%, confirming the model's ability to recover lost details when the original clean image is known. On IMAGIN-e, where no reference images exist, perceptual quality metrics indicate a substantial enhancement, with NIQE improving by 60.66% and BRISQUE by 48.38%, validating real-world onboard restoration. The approach is currently deployed aboard the IMAGIN-e mission, demonstrating its practical application in an operational space environment. By efficiently handling high-resolution images under edge computing constraints, the method enables applications such as water body segmentation and contour detection while maintaining processing viability despite resource limitations.