Andrey Norkin

Transform and Entropy Coding in AV2

Jan 06, 2026

Alican Nalci, Hilmi E. Egilmez, Madhu P. Krishnan, Keng-Shih Lu, Joe Young, Debargha Mukherjee, Lin Zheng, Jingning Han, Joel Sole, Xin Zhao(+12 more)

Abstract: AV2 is the successor to the AV1 royalty-free video coding standard developed by the Alliance for Open Media (AOMedia). Its primary objective is to deliver substantial compression gains and subjective quality improvements while maintaining low-complexity encoder and decoder operations. This paper describes the transform, quantization, and entropy coding design in AV2, including redesigned transform kernels and data-driven transforms, expanded transform partitioning, and mode- and coefficient-dependent transform signaling. AV2 introduces several new coding tools, including Intra/Inter Secondary Transforms (IST), Trellis Coded Quantization (TCQ), Adaptive Transform Coding (ATC), Probability Adaptation Rate Adjustment (PARA), Forward Skip Coding (FSC), Cross Chroma Component Transforms (CCTX), and Parity Hiding (PH), as well as improved lossless coding. These advances enable AV2 to deliver the highest quality video experience for video applications at a significantly reduced bitrate.

Efficient Per-Shot Convex Hull Prediction By Recurrent Learning

Jun 10, 2022

Somdyuti Paul, Andrey Norkin, Alan C. Bovik

Abstract: Adaptive video streaming relies on the construction of efficient bitrate ladders to deliver the best possible visual quality to viewers under bandwidth constraints. The traditional method of content dependent bitrate ladder selection requires a video shot to be pre-encoded with multiple encoding parameters to find the optimal operating points given by the convex hull of the resulting rate-quality curves. However, this pre-encoding step is equivalent to an exhaustive search process over the space of possible encoding parameters, which causes significant overhead in terms of both computation and time expenditure. To reduce this overhead, we propose a deep learning based method of content aware convex hull prediction. We employ a recurrent convolutional network (RCN) to implicitly analyze the spatiotemporal complexity of video shots in order to predict their convex hulls. A two-step transfer learning scheme is adopted to train our proposed RCN-Hull model, which ensures sufficient content diversity to analyze scene complexity, while also making it possible to capture the scene statistics of pristine source videos. Our experimental results reveal that our proposed model yields better approximations of the optimal convex hulls, and offers competitive time savings as compared to existing approaches. On average, our method reduced the pre-encoding time by 58.0%, while the average Bjontegaard delta bitrate (BD-rate) of the predicted convex hulls against ground truth was 0.08% and the mean absolute deviation of the BD-rate distribution was 0.44%.
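
The "traditional method" mentioned in the abstract, picking the operating points on the convex hull of pre-encoded rate-quality points, can be illustrated with a minimal sketch. This is not the paper's RCN-Hull model; it only shows the exhaustive baseline it aims to replace. The function name `rate_quality_hull`, the `(bitrate, quality)` tuple layout, and the sample numbers in the usage note are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (bitrate, quality); units are illustrative


def _cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); negative means a clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def rate_quality_hull(points: List[Point]) -> List[Point]:
    """Return the operating points on the upper convex hull of the
    rate-quality cloud, ordered by increasing bitrate."""
    # Keep only the best quality seen at each distinct bitrate.
    best: Dict[float, float] = {}
    for rate, quality in points:
        if rate not in best or quality > best[rate]:
            best[rate] = quality

    # Pareto filter: spending more bits must strictly improve quality.
    pareto: List[Point] = []
    for rate in sorted(best):
        if not pareto or best[rate] > pareto[-1][1]:
            pareto.append((rate, best[rate]))

    # Monotone-chain upper hull: drop points lying on or below the chord
    # between their neighbours, so the kept curve is concave.
    hull: List[Point] = []
    for p in pareto:
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull
```

For example, with hypothetical encodes `[(100, 30), (200, 36), (250, 33), (300, 38), (400, 41), (500, 42), (600, 41.5)]`, the function keeps `(100, 30)`, `(200, 36)`, `(400, 41)`, and `(500, 42)`. Every input point costs one full encode, which is the overhead the abstract describes.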

Self-Supervised Learning of Perceptually Optimized Block Motion Estimates for Video Compression

Oct 11, 2021

Somdyuti Paul, Andrey Norkin, Alan C. Bovik

Abstract: Block based motion estimation is integral to inter prediction processes performed in hybrid video codecs. Prevalent block matching based methods that are used to compute block motion vectors (MVs) rely on computationally intensive search procedures. They also suffer from the aperture problem, which can worsen as the block size is reduced. Moreover, the block matching criteria used in typical codecs do not account for the resulting levels of perceptual quality of the motion compensated pictures that are created upon decoding. Towards achieving the elusive goal of perceptually optimized motion estimation, we propose a search-free block motion estimation framework using a multi-stage convolutional neural network, which is able to conduct motion estimation on multiple block sizes simultaneously, using a triplet of frames as input. This composite block translation network (CBT-Net) is trained in a self-supervised manner on a large database that we created from publicly available uncompressed video content. We deploy the multi-scale structural similarity (MS-SSIM) loss function to optimize the perceptual quality of the motion compensated predicted frames. Our experimental results highlight the computational efficiency of our proposed model relative to conventional block matching based motion estimation algorithms, for comparable prediction errors. Further, when used to perform inter prediction in AV1, the MV predictions of the perceptually optimized model result in average Bjontegaard-delta rate (BD-rate) improvements of -1.70% and -1.52% with respect to the MS-SSIM and Video Multi-Method Assessment Fusion (VMAF) quality metrics, respectively, as compared to the block matching based motion estimation system employed in the SVT-AV1 encoder.
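
To make the cost of the "computationally intensive search procedures" concrete, here is a minimal sketch of conventional exhaustive block matching using the sum of absolute differences (SAD). It illustrates the baseline that the abstract contrasts with, not CBT-Net and not the SVT-AV1 search. The function names, the list-of-lists frame layout, and the choice of SAD as the matching criterion are assumptions for illustration.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # frame[y][x], luma samples


def sad(cur: Frame, ref: Frame, bx: int, by: int,
        dx: int, dy: int, size: int) -> int:
    """Sum of absolute differences between the current block at (bx, by)
    and the reference block displaced by (dx, dy)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total


def full_search(cur: Frame, ref: Frame, bx: int, by: int,
                size: int, search_range: int) -> Tuple[Tuple[int, int], Optional[int]]:
    """Test every integer displacement within +/- search_range and return
    ((dx, dy), cost) for the lowest SAD. Up to (2 * search_range + 1) ** 2
    candidates are evaluated per block."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates that fall outside the reference frame.
            if by + dy < 0 or by + dy + size > height:
                continue
            if bx + dx < 0 or bx + dx + size > width:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

With a search range of 3, each block already requires up to 49 SAD evaluations, and the count grows quadratically with the range. A search-free predictor avoids this loop entirely.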

Perceptually Optimizing Deep Image Compression

Jul 09, 2020

Li-Heng Chen, Christos G. Bampis, Zhi Li, Andrey Norkin, Alan C. Bovik

Abstract: Mean squared error (MSE) and $\ell_p$ norms have largely dominated the measurement of loss in neural networks due to their simplicity and analytical properties. However, when used to assess visual information loss, these simple norms are not highly consistent with human perception. Here, we propose a different proxy approach to optimize image analysis networks against quantitative perceptual models. Specifically, we construct a proxy network, which mimics the perceptual model while serving as a loss layer of the network. We experimentally demonstrate how this optimization framework can be applied to train an end-to-end optimized image compression network. By building on top of a modern deep image compression model, we are able to demonstrate an averaged bitrate reduction of $28.7\%$ over MSE optimization, given a specified perceptual quality (VMAF) level.

* 7 pages, 6 figures. arXiv admin note: substantial text overlap with arXiv:1910.08845

ProxIQA: A Proxy Approach to Perceptual Optimization of Learned Image Compression

Oct 19, 2019

Li-Heng Chen, Christos G. Bampis, Zhi Li, Andrey Norkin, Alan C. Bovik

Abstract: The use of $\ell_p$ $(p=1,2)$ norms has largely dominated the measurement of loss in neural networks due to their simplicity and analytical properties. However, when used to assess the loss of visual information, these simple norms are not very consistent with human perception. Here, we describe a different "proximal" approach to optimize image analysis networks against quantitative perceptual models. Specifically, we construct a proxy network, broadly termed ProxIQA, which mimics the perceptual model while serving as a loss layer of the network. We experimentally demonstrate how this optimization framework can be applied to train an end-to-end optimized image compression network. By building on top of an existing deep image compression model, we are able to demonstrate a bitrate reduction of as much as $31\%$ over MSE optimization, given a specified perceptual quality (VMAF) level.

* 12 pages, 12 figures, 5 tables

Speeding up VP9 Intra Encoder with Hierarchical Deep Learning Based Partition Prediction

Jun 15, 2019

Somdyuti Paul, Andrey Norkin, Alan C. Bovik

Abstract: In the VP9 video codec, the sizes of blocks are decided during encoding by recursively partitioning 64$\times$64 superblocks using rate-distortion optimization (RDO). This process is computationally intensive because of the combinatorial search space of possible partitions of a superblock. Here, we propose a deep learning based alternative framework to predict the intra-mode superblock partitions in the form of a four-level partition tree, using a hierarchical fully convolutional network (H-FCN). We created a large database of VP9 superblocks and the corresponding partitions to train an H-FCN model, which was subsequently integrated with the VP9 encoder to reduce the intra-mode encoding time. The experimental results establish that our approach speeds up intra-mode encoding by 69.7% on average, at the expense of a 1.71% increase in the Bjontegaard-Delta bitrate (BD-rate). While VP9 provides several built-in speed levels which are designed to provide faster encoding at the expense of decreased rate-distortion performance, we find that our model is able to outperform the fastest recommended speed level of the reference VP9 encoder for the good quality intra encoding configuration, in terms of both speedup and BD-rate.
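
The size of the "combinatorial search space" can be estimated with a short recursion. The sketch below assumes that each square block from 64$\times$64 down to 8$\times$8 chooses one of four partition types (none, horizontal, vertical, or a split into four square sub-blocks) and that 4$\times$4 blocks are not partitioned further, which gives the four decision levels of the partition tree. The function name and parameters are hypothetical, and the count ignores any early-termination heuristics a real encoder applies.

```python
def count_partitions(block_size: int, min_split_size: int = 8) -> int:
    """Count the distinct partitionings of a square block, assuming four
    partition types per level: none, horizontal, vertical, and split."""
    if block_size < min_split_size:
        # Assumed leaf size (4x4): no further partition choice.
        return 1
    non_split_choices = 3  # none, horizontal, vertical
    # A split produces four sub-blocks that are partitioned independently.
    sub = count_partitions(block_size // 2, min_split_size)
    return non_split_choices + sub ** 4


if __name__ == "__main__":
    for size in (8, 16, 32, 64):
        print(size, count_partitions(size))
```

Under these assumptions an 8$\times$8 block has 4 partitionings, a 16$\times$16 block has 259, and a 32$\times$32 block has 4,499,860,564. The 64$\times$64 count is the fourth power of that figure plus 3, which is why exhaustive RDO over the tree is expensive and why predicting the tree directly can save encoding time.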
