Modern video encoders have evolved into sophisticated pieces of software in which various coding tools interact with each other. In the past, singlepass encoding was not considered for Video-On-Demand (VOD) use cases. In this work, we evaluate production-ready encoders for H.264 (x264), H.265 (HEVC), AV1 (SVT-AV1) along with direct comparisons to the latest AV1 encoder inside NVIDIA GPUs (40 series), and AWS Mediaconvert's AV1 implementation. Our experimental results demonstrate single pass encoding inside modern encoder implementations can give us very good quality at a reasonable compute cost. The results are presented as three different scenarios targeting High, Medium, and Low complexity accounting quality/bitrate/compute load. Finally, a set of recommendations is presented for end-users to help decide which encoder/preset combination might be more suited to their use case.
At practical streaming bitrates, traditional video compression pipelines frequently lead to visible artifacts that degrade perceptual quality. This submission couples the effectiveness of a neural post-processor with a different dynamic optimsation strategy for achieving an improved bitrate/quality compromise. The neural post-processor is refined via adversarial training and employs perceptual loss functions. By optimising the post-processor and encoder directly our method demonstrates significant improvement in video fidelity. The neural post-processor achieves substantial VMAF score increases of +6.72 and +1.81 at bitrates of 50 kb/s and 500 kb/s respectively.
Since 2015 video dimensionality has expanded to higher spatial and temporal resolutions and a wider colour gamut. This High Dynamic Range (HDR) content has gained traction in the consumer space as it delivers an enhanced quality of experience. At the same time, the complexity of codecs is growing. This has driven the development of tools for content-adaptive optimisation that achieve optimal rate-distortion performance for HDR video at 4K resolution. While improvements of just a few percentage points in BD-Rate (1-5\%) are significant for the streaming media industry, the impact on subjective quality has been less studied especially for HDR/AV1. In this paper, we conduct a subjective quality assessment (42 subjects) of 4K HDR content with a per-clip optimisation strategy. We correlate these subjective scores with existing popular objective metrics used in standard development and show that some perceptual metrics correlate surprisingly well even though they are not tuned for HDR. We find that the DSQCS protocol is too insensitive to categorically compare the methods but the data allows us to make recommendations about the use of experts vs non-experts in HDR studies, and explain the subjective impact of film grain in HDR content under compression.
Over the past few years, there has been an increase in the demand and availability of High Dynamic Range (HDR) displays and content. To ensure the production of high-quality materials, human evaluation is required. However, ascertaining whether the full playback pipeline is indeed HDR-compliant can be challenging. In this paper, we present a set of recommendations for conformance testing to validate various aspects of the testing workflow, including playback, displays, brightness, colours, and viewing environment. We assessed the effectiveness of HDR conversion techniques used in current standards development (3GPP) for making source materials. Additionally, we evaluate HDR display technologies, including OLED and LCD, using both consumer television and a reference monitor.
Cloud-based deployment of content production and broadcast workflows has continued to disrupt the industry after the pandemic. The key tools required for unlocking cloud workflows, e.g., transcoding, metadata parsing, and streaming playback, are increasingly commoditized. However, as video traffic continues to increase there is a need to consider tools which offer opportunities for further bitrate/quality gains as well as those which facilitate cloud deployment. In this paper we consider preprocessing, rate/distortion optimisation and cloud cost prediction tools which are only just emerging from the research community. These tools are posed as part of the per-clip optimisation approach to transcoding which has been adopted by large streaming media processing entities but has yet to be made more widely available for the industry.
The complexity of modern codecs along with the increased need of delivering high-quality videos at low bitrates has reinforced the idea of a per-clip tailoring of parameters for optimised rate-distortion performance. While the objective quality metrics used for Standard Dynamic Range (SDR) videos have been well studied, the transitioning of consumer displays to support High Dynamic Range (HDR) videos, poses a new challenge to rate-distortion optimisation. In this paper, we review the popular HDR metrics DeltaE100 (DE100), PSNRL100, wPSNR, and HDR-VQM. We measure the impact of employing these metrics in per-clip direct search optimisation of the rate-distortion Lagrange multiplier in AV1. We report, on 35 HDR videos, average Bjontegaard Delta Rate (BD-Rate) gains of 4.675%, 2.226%, and 7.253% in terms of DE100, PSNRL100, and HDR-VQM. We also show that the inclusion of chroma in the quality metrics has a significant impact on optimisation, which can only be partially addressed by the use of chroma offsets.
Image based Deep Feature Quality Metrics (DFQMs) have been shown to better correlate with subjective perceptual scores over traditional metrics. The fundamental focus of these DFQMs is to exploit internal representations from a large scale classification network as the metric feature space. Previously, no attention has been given to the problem of identifying which layers are most perceptually relevant. In this paper we present a new method for selecting perceptually relevant layers from such a network, based on a neuroscience interpretation of layer behaviour. The selected layers are treated as a hyperparameter to the critic network in a W-GAN. The critic uses the output from these layers in the preliminary stages to extract perceptual information. A video enhancement network is trained adversarially with this critic. Our results show that the introduction of these selected features into the critic yields up to 10% (FID) and 15% (KID) performance increase against other critic networks that do not exploit the idea of optimised feature selection.
This study examines the relationship between H.264 video compression and the performance of an object detection network (YOLOv5). We curated a set of 50 surveillance videos and annotated targets of interest (people, bikes, and vehicles). Videos were encoded at 5 quality levels using Constant Rate Factor (CRF) values in the set {22,32,37,42,47}. YOLOv5 was applied to compressed videos and detection performance was analyzed at each CRF level. Test results indicate that the detection performance is generally robust to moderate levels of compression; using a CRF value of 37 instead of 22 leads to significantly reduced bitrates/file sizes without adversely affecting detection performance. However, detection performance degrades appreciably at higher compression levels, especially in complex scenes with poor lighting and fast-moving targets. Finally, retraining YOLOv5 on compressed imagery gives up to a 1% improvement in F1 score when applied to highly compressed footage.
Since the adoption of VP9 by Netflix in 2016, royalty-free coding standards continued to gain prominence through the activities of the AOMedia consortium. AV1, the latest open source standard, is now widely supported. In the early years after standardisation, HDR video tends to be under served in open source encoders for a variety of reasons including the relatively small amount of true HDR content being broadcast and the challenges in RD optimisation with that material. AV1 codec optimisation has been ongoing since 2020 including consideration of the computational load. In this paper, we explore the idea of direct optimisation of the Lagrangian $\lambda$ parameter used in the rate control of the encoders to estimate the optimal Rate-Distortion trade-off achievable for a High Dynamic Range signalled video clip. We show that by adjusting the Lagrange multiplier in the RD optimisation process on a frame-hierarchy basis, we are able to increase the Bjontegaard difference rate gains by more than 3.98$\times$ on average without visually affecting the quality.
Video transcoding is an increasingly important application in the streaming media industry. It has become important to investigate the optimisation of transcoder parameters for a single clip simply because of the immense number of playbacks for popular clips. In this paper, we explore the use of a canned optimiser to estimate the optimal RD tradeoff achievable for a particular clip. We show that by adjusting the Lagrange multiplier in RD optimisation on keyframes alone we can achieve more than 10$\times$ the previous BD-Rate gains possible without affecting quality for any operating point.