Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

A. Murat Tekalp

Content-Adaptive Inference for State-of-the-art Learned Video Compression

Oct 08, 2025

Ahmet Bilican, M. Akın Yılmaz, A. Murat Tekalp

Abstract:While the BD-rate performance of recent learned video codec models in both low-delay and random-access modes exceed that of respective modes of traditional codecs on average over common benchmarks, the performance improvements for individual videos with complex/large motions is much smaller compared to scenes with simple motion. This is related to the inability of a learned encoder model to generalize to motion vector ranges that have not been seen in the training set, which causes loss of performance in both coding of flow fields as well as frame prediction and coding. As a remedy, we propose a generic (model-agnostic) framework to control the scale of motion vectors in a scene during inference (encoding) to approximately match the range of motion vectors in the test and training videos by adaptively downsampling frames. This results in down-scaled motion vectors enabling: i) better flow estimation; hence, frame prediction and ii) more efficient flow compression. We show that the proposed framework for content-adaptive inference improves the BD-rate performance of already state-of-the-art low-delay video codec DCVC-FM by up to 41\% on individual videos without any model fine tuning. We present ablation studies to show measures of motion and scene complexity can be used to predict the effectiveness of the proposed framework.

Via

Access Paper or Ask Questions

DiMoSR: Feature Modulation via Multi-Branch Dilated Convolutions for Efficient Image Super-Resolution

May 27, 2025

M. Akin Yilmaz, Ahmet Bilican, A. Murat Tekalp

Abstract:Balancing reconstruction quality versus model efficiency remains a critical challenge in lightweight single image super-resolution (SISR). Despite the prevalence of attention mechanisms in recent state-of-the-art SISR approaches that primarily emphasize or suppress feature maps, alternative architectural paradigms warrant further exploration. This paper introduces DiMoSR (Dilated Modulation Super-Resolution), a novel architecture that enhances feature representation through modulation to complement attention in lightweight SISR networks. The proposed approach leverages multi-branch dilated convolutions to capture rich contextual information over a wider receptive field while maintaining computational efficiency. Experimental results demonstrate that DiMoSR outperforms state-of-the-art lightweight methods across diverse benchmark datasets, achieving superior PSNR and SSIM metrics with comparable or reduced computational complexity. Through comprehensive ablation studies, this work not only validates the effectiveness of DiMoSR but also provides critical insights into the interplay between attention mechanisms and feature modulation to guide future research in efficient network design. The code and model weights to reproduce our results are available at: https://github.com/makinyilmaz/DiMoSR

Via

Access Paper or Ask Questions

Exploring Sparsity for Parameter Efficient Fine Tuning Using Wavelets

May 18, 2025

Ahmet Bilican, M. Akın Yılmaz, A. Murat Tekalp, R. Gökberk Cinbiş

Abstract:Efficiently adapting large foundation models is critical, especially with tight compute and memory budgets. Parameter-Efficient Fine-Tuning (PEFT) methods such as LoRA offer limited granularity and effectiveness in few-parameter regimes. We propose Wavelet Fine-Tuning (WaveFT), a novel PEFT method that learns highly sparse updates in the wavelet domain of residual matrices. WaveFT allows precise control of trainable parameters, offering fine-grained capacity adjustment and excelling with remarkably low parameter count, potentially far fewer than LoRA's minimum -- ideal for extreme parameter-efficient scenarios. In order to demonstrate the effect of the wavelet transform, we compare WaveFT with a special case, called SHiRA, that entails applying sparse updates directly in the weight domain. Evaluated on personalized text-to-image generation using Stable Diffusion XL as baseline, WaveFT significantly outperforms LoRA and other PEFT methods, especially at low parameter counts; achieving superior subject fidelity, prompt alignment, and image diversity.

Via

Access Paper or Ask Questions

FG-DFPN: Flow Guided Deformable Frame Prediction Network

Mar 14, 2025

M. Akın Yılmaz, Ahmet Bilican, A. Murat Tekalp

Abstract:Video frame prediction remains a fundamental challenge in computer vision with direct implications for autonomous systems, video compression, and media synthesis. We present FG-DFPN, a novel architecture that harnesses the synergy between optical flow estimation and deformable convolutions to model complex spatio-temporal dynamics. By guiding deformable sampling with motion cues, our approach addresses the limitations of fixed-kernel networks when handling diverse motion patterns. The multi-scale design enables FG-DFPN to simultaneously capture global scene transformations and local object movements with remarkable precision. Our experiments demonstrate that FG-DFPN achieves state-of-the-art performance on eight diverse MPEG test sequences, outperforming existing methods by 1dB PSNR while maintaining competitive inference speeds. The integration of motion cues with adaptive geometric transformations makes FG-DFPN a promising solution for next-generation video processing systems that require high-fidelity temporal predictions. The model and instructions to reproduce our results will be released at: https://github.com/KUIS-AI-Tekalp-Research Group/frame-prediction

* Submitted to 33th European Signal Processing Conference (EUSIPCO) 2025

Via

Access Paper or Ask Questions

On the Computation of BD-Rate over a Set of Videos for Fair Assessment of Performance of Learned Video Codecs

Sep 13, 2024

M. Akin Yilmaz, Onur Keleş, A. Murat Tekalp

Figure 1 for On the Computation of BD-Rate over a Set of Videos for Fair Assessment of Performance of Learned Video Codecs

Figure 2 for On the Computation of BD-Rate over a Set of Videos for Fair Assessment of Performance of Learned Video Codecs

Figure 3 for On the Computation of BD-Rate over a Set of Videos for Fair Assessment of Performance of Learned Video Codecs

Figure 4 for On the Computation of BD-Rate over a Set of Videos for Fair Assessment of Performance of Learned Video Codecs

Abstract:The Bj{\o}ntegaard Delta (BD) measure is widely employed to evaluate and quantify the variations in the rate-distortion(RD) performance across different codecs. Many researchers report the average BD value over multiple videos within a dataset for different codecs. We claim that the current practice in the learned video compression community of computing the average BD value over a dataset based on the average RD curve of multiple videos can lead to misleading conclusions. We show both by analysis of a simplistic case of linear RD curves and experimental results with two recent learned video codecs that averaging RD curves can lead to a single video to disproportionately influence the average BD value especially when the operating bitrate range of different codecs do not exactly match. Instead, we advocate for calculating the BD measure per-video basis, as commonly done by the traditional video compression community, followed by averaging the individual BD values over videos, to provide a fair comparison of learned video codecs. Our experimental results demonstrate that the comparison of two recent learned video codecs is affected by how we evaluate the average BD measure.

* Submitted to IEEE ICASSP 2025

Via

Access Paper or Ask Questions

A New Multi-Picture Architecture for Learned Video Deinterlacing and Demosaicing with Parallel Deformable Convolution and Self-Attention Blocks

Apr 19, 2024

Ronglei Ji, A. Murat Tekalp

Figure 1 for A New Multi-Picture Architecture for Learned Video Deinterlacing and Demosaicing with Parallel Deformable Convolution and Self-Attention Blocks

Figure 2 for A New Multi-Picture Architecture for Learned Video Deinterlacing and Demosaicing with Parallel Deformable Convolution and Self-Attention Blocks

Figure 3 for A New Multi-Picture Architecture for Learned Video Deinterlacing and Demosaicing with Parallel Deformable Convolution and Self-Attention Blocks

Figure 4 for A New Multi-Picture Architecture for Learned Video Deinterlacing and Demosaicing with Parallel Deformable Convolution and Self-Attention Blocks

Abstract:Despite the fact real-world video deinterlacing and demosaicing are well-suited to supervised learning from synthetically degraded data because the degradation models are known and fixed, learned video deinterlacing and demosaicing have received much less attention compared to denoising and super-resolution tasks. We propose a new multi-picture architecture for video deinterlacing or demosaicing by aligning multiple supporting pictures with missing data to a reference picture to be reconstructed, benefiting from both local and global spatio-temporal correlations in the feature space using modified deformable convolution blocks and a novel residual efficient top-$k$ self-attention (kSA) block, respectively. Separate reconstruction blocks are used to estimate different types of missing data. Our extensive experimental results, on synthetic or real-world datasets, demonstrate that the proposed novel architecture provides superior results that significantly exceed the state-of-the-art for both tasks in terms of PSNR, SSIM, and perceptual quality. Ablation studies are provided to justify and show the benefit of each novel modification made to the deformable convolution and residual efficient kSA blocks. Code is available: https://github.com/KUIS-AI-Tekalp-Research-Group/Video-Deinterlacing.

* 13 pages, 6 figures, accepted to IMAVIS

Via

Access Paper or Ask Questions

Training Transformer Models by Wavelet Losses Improves Quantitative and Visual Performance in Single Image Super-Resolution

Apr 17, 2024

Cansu Korkmaz, A. Murat Tekalp

Figure 1 for Training Transformer Models by Wavelet Losses Improves Quantitative and Visual Performance in Single Image Super-Resolution

Figure 2 for Training Transformer Models by Wavelet Losses Improves Quantitative and Visual Performance in Single Image Super-Resolution

Figure 3 for Training Transformer Models by Wavelet Losses Improves Quantitative and Visual Performance in Single Image Super-Resolution

Figure 4 for Training Transformer Models by Wavelet Losses Improves Quantitative and Visual Performance in Single Image Super-Resolution

Abstract:Transformer-based models have achieved remarkable results in low-level vision tasks including image super-resolution (SR). However, early Transformer-based approaches that rely on self-attention within non-overlapping windows encounter challenges in acquiring global information. To activate more input pixels globally, hybrid attention models have been proposed. Moreover, training by solely minimizing pixel-wise RGB losses, such as L1, have been found inadequate for capturing essential high-frequency details. This paper presents two contributions: i) We introduce convolutional non-local sparse attention (NLSA) blocks to extend the hybrid transformer architecture in order to further enhance its receptive field. ii) We employ wavelet losses to train Transformer models to improve quantitative and subjective performance. While wavelet losses have been explored previously, showing their power in training Transformer-based SR models is novel. Our experimental results demonstrate that the proposed model provides state-of-the-art PSNR results as well as superior visual performance across various benchmark datasets.

* total of 10 pages including references, 5 tables and 5 figures, accepted for NTIRE 2024 Single Image Super Resolution (x4) challenge

Via

Access Paper or Ask Questions

NTIRE 2024 Challenge on Image Super-Resolution ($\times$4): Methods and Results

Apr 15, 2024

Zheng Chen, Zongwei Wu, Eduard Zamfir, Kai Zhang, Yulun Zhang, Radu Timofte, Xiaokang Yang, Hongyuan Yu, Cheng Wan, Yuxin Hong(+78 more)

$Figure 1 for NTIRE 2024 Challenge on Image Super-Resolution ($\times$4): Methods and Results$

$Figure 2 for NTIRE 2024 Challenge on Image Super-Resolution ($\times$4): Methods and Results$

$Figure 3 for NTIRE 2024 Challenge on Image Super-Resolution ($\times$4): Methods and Results$

$Figure 4 for NTIRE 2024 Challenge on Image Super-Resolution ($\times$4): Methods and Results$

Abstract:This paper reviews the NTIRE 2024 challenge on image super-resolution ($\times$4), highlighting the solutions proposed and the outcomes obtained. The challenge involves generating corresponding high-resolution (HR) images, magnified by a factor of four, from low-resolution (LR) inputs using prior information. The LR images originate from bicubic downsampling degradation. The aim of the challenge is to obtain designs/solutions with the most advanced SR performance, with no constraints on computational resources (e.g., model size and FLOPs) or training data. The track of this challenge assesses performance with the PSNR metric on the DIV2K testing dataset. The competition attracted 199 registrants, with 20 teams submitting valid entries. This collective endeavour not only pushes the boundaries of performance in single-image SR but also offers a comprehensive overview of current trends in this field.

* NTIRE 2024 webpage: https://cvlai.net/ntire/2024. Code: https://github.com/zhengchen1999/NTIRE2024_ImageSR_x4

Via

Access Paper or Ask Questions

PAON: A New Neuron Model using Padé Approximants

Mar 18, 2024

Onur Keleş, A. Murat Tekalp

Figure 1 for PAON: A New Neuron Model using Padé Approximants

Figure 2 for PAON: A New Neuron Model using Padé Approximants

Figure 3 for PAON: A New Neuron Model using Padé Approximants

Figure 4 for PAON: A New Neuron Model using Padé Approximants

Abstract:Convolutional neural networks (CNN) are built upon the classical McCulloch-Pitts neuron model, which is essentially a linear model, where the nonlinearity is provided by a separate activation function. Several researchers have proposed enhanced neuron models, including quadratic neurons, generalized operational neurons, generative neurons, and super neurons, with stronger nonlinearity than that provided by the pointwise activation function. There has also been a proposal to use Pade approximation as a generalized activation function. In this paper, we introduce a brand new neuron model called Pade neurons (Paons), inspired by the Pade approximants, which is the best mathematical approximation of a transcendental function as a ratio of polynomials with different orders. We show that Paons are a super set of all other proposed neuron models. Hence, the basic neuron in any known CNN model can be replaced by Paons. In this paper, we extend the well-known ResNet to PadeNet (built by Paons) to demonstrate the concept. Our experiments on the single-image super-resolution task show that PadeNets can obtain better results than competing architectures.

* Submitted to IEEE ICIP 2024

Via

Access Paper or Ask Questions

Training Generative Image Super-Resolution Models by Wavelet-Domain Losses Enables Better Control of Artifacts

Feb 29, 2024

Cansu Korkmaz, A. Murat Tekalp, Zafer Dogan

Figure 1 for Training Generative Image Super-Resolution Models by Wavelet-Domain Losses Enables Better Control of Artifacts

Figure 2 for Training Generative Image Super-Resolution Models by Wavelet-Domain Losses Enables Better Control of Artifacts

Figure 3 for Training Generative Image Super-Resolution Models by Wavelet-Domain Losses Enables Better Control of Artifacts

Figure 4 for Training Generative Image Super-Resolution Models by Wavelet-Domain Losses Enables Better Control of Artifacts

Abstract:Super-resolution (SR) is an ill-posed inverse problem, where the size of the set of feasible solutions that are consistent with a given low-resolution image is very large. Many algorithms have been proposed to find a "good" solution among the feasible solutions that strike a balance between fidelity and perceptual quality. Unfortunately, all known methods generate artifacts and hallucinations while trying to reconstruct high-frequency (HF) image details. A fundamental question is: Can a model learn to distinguish genuine image details from artifacts? Although some recent works focused on the differentiation of details and artifacts, this is a very challenging problem and a satisfactory solution is yet to be found. This paper shows that the characterization of genuine HF details versus artifacts can be better learned by training GAN-based SR models using wavelet-domain loss functions compared to RGB-domain or Fourier-space losses. Although wavelet-domain losses have been used in the literature before, they have not been used in the context of the SR task. More specifically, we train the discriminator only on the HF wavelet sub-bands instead of on RGB images and the generator is trained by a fidelity loss over wavelet subbands to make it sensitive to the scale and orientation of structures. Extensive experimental results demonstrate that our model achieves better perception-distortion trade-off according to multiple objective measures and visual evaluations.

* Accepted for IEEE CVPR 2024, total of 11 pages, 3 pages for references, 7 figures and 2 tables

Via

Access Paper or Ask Questions