IAC3 & Departament de Ciències Matemàtiques i Informàtica, Universitat de les Illes Balears
Abstract:Being one of the oldest and most basic problems in image processing, image denoising has seen a resurgence spurred by rapid advances in deep learning. Yet, most modern denoising architectures make limited use of the technical knowledge acquired researching the classical denoisers that came before the mainstream use of neural networks, instead relying on depth and large parameter counts. This poses a challenge not only for understanding the properties of such networks, but also for deploying them on real devices which may present resource constraints and diverse noise profiles. Tackling both issues, we propose an architecture dedicated to RAW-to-RAW denoising that incorporates the interpretable structure of classical self-similarity-based denoisers into a fully learnable neural network. Our design centers on a novel nonlocal block that parallels the established pipeline of neighbor matching, collaborative filtering and aggregation popularized by nonlocal patch-based methods, operating on learned multiscale feature representations. This built-in nonlocality efficiently expands the receptive field, sufficing a single block per scale with a moderate number of neighbors to obtain high-quality results. Training the network on a curated dataset with clean real RAW data and modeled synthetic noise while conditioning it on a noise level map yields a sensor-agnostic denoiser that generalizes effectively to unseen devices. Both quantitative and visual results on benchmarks and in-the-wild photographs position our method as a practical and interpretable solution for real-world RAW denoising, achieving results competitive with state-of-the-art convolutional and transformer-based denoisers while using significantly fewer parameters. The code is available at https://github.com/MIA-UIB/nonlocal-matchfilter .




Abstract:Denoising is one of the fundamental steps of the processing pipeline that converts data captured by a camera sensor into a display-ready image or video. It is generally performed early in the pipeline, usually before demosaicking, although studies swapping their order or even conducting them jointly have been proposed. With the advent of deep learning, the quality of denoising algorithms has steadily increased. Even so, modern neural networks still have a hard time adapting to new noise levels and scenes, which is indispensable for real-world applications. With those in mind, we propose a self-similarity-based denoising scheme that weights both a pre- and a post-demosaicking denoiser for Bayer-patterned CFA video data. We show that a balance between the two leads to better image quality, and we empirically find that higher noise levels benefit from a higher influence pre-demosaicking. We also integrate temporal trajectory prefiltering steps before each denoiser, which further improve texture reconstruction. The proposed method only requires an estimation of the noise model at the sensor, accurately adapts to any noise level, and is competitive with the state of the art, making it suitable for real-world videography.