Adaptive video streaming is a key enabler for optimising the delivery of offline-encoded video content. The research focus to date has been on optimisation based solely on rate-quality curves. This paper adds an additional dimension, energy expenditure, and explores the construction of bitrate ladders based on decoding energy-quality curves rather than the conventional rate-quality curves. Pareto fronts are extracted from the rate-quality and energy-quality spaces to select optimal points. Bitrate ladders are constructed from these points using conventional rate-based rules together with a novel quality-based approach. Evaluation on a subset of YouTube-UGC videos encoded with x265 shows that the energy-quality ladders reduce energy requirements by 28-31% on average at the cost of slightly higher bitrates. The results indicate that optimising based on energy-quality curves rather than rate-quality curves, and using quality levels to create the rungs, could improve energy efficiency at a comparable quality of experience.
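The Pareto-front extraction and quality-based rung selection described above can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: the candidate encodes, quality targets, and function names are all invented for the example.

```python
# Sketch of energy-quality Pareto extraction and a quality-based bitrate
# ladder. Each candidate encode is an illustrative tuple of
# (bitrate_kbps, decode_energy_j, quality_score).

def pareto_front(encodes, cost_idx, quality_idx=2):
    """Keep points not dominated by any other (lower cost, higher quality)."""
    front = []
    for p in encodes:
        dominated = any(
            q[cost_idx] <= p[cost_idx] and q[quality_idx] >= p[quality_idx]
            and q != p
            for q in encodes
        )
        if not dominated:
            front.append(p)
    return sorted(front, key=lambda p: p[quality_idx])

def quality_ladder(front, targets=(40, 60, 80, 95), quality_idx=2):
    """For each target quality level, pick the least-energy point reaching it."""
    rungs = []
    for t in targets:
        candidates = [p for p in front if p[quality_idx] >= t]
        if candidates:
            rungs.append(min(candidates, key=lambda p: p[1]))  # least energy
    return rungs

# Toy candidate encodes: (bitrate_kbps, decode_energy_j, quality_score)
encodes = [(500, 12.0, 45), (1500, 20.0, 70), (3000, 45.0, 85),
           (3200, 30.0, 84), (6000, 60.0, 95)]
front = pareto_front(encodes, cost_idx=1)   # energy-quality space
ladder = quality_ladder(front)
print(ladder)
```

Note how the rung at the 80-quality target picks the 3200 kbps encode over the 3000 kbps one: it costs slightly more bitrate but noticeably less decoding energy, which is exactly the trade-off the abstract reports.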
The environmental impact of video streaming services has been discussed as part of strategies towards sustainable information and communication technologies. A first step in that direction is profiling and assessing the energy consumption of existing video technologies. This paper presents a comprehensive study of power measurement techniques in video compression, comparing the use of hardware and software power meters. An experimental methodology to ensure the reliability of measurements is introduced. Key findings demonstrate a high correlation between hardware- and software-based energy measurements for two video codecs across different spatial and temporal resolutions, with the software approach incurring lower computational overhead.
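The agreement between hardware and software meters reported above is typically quantified with a correlation coefficient. The sketch below computes a Pearson correlation over two invented series of energy readings; the values and series names are illustrative only, not data from the study.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Invented per-encode energy readings (joules) from a hardware meter and
# a software counter, for illustration only.
hw = [120.0, 250.0, 410.0, 600.0]
sw = [115.0, 240.0, 430.0, 590.0]
print(round(pearson(hw, sw), 3))
```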
The volume of User Generated Content (UGC) has increased in recent years. The challenge with this type of content is assessing its quality. So far, state-of-the-art metrics do not exhibit high correlation with perceptual quality. In this paper, we explore state-of-the-art metrics that extract and combine natural scene statistics and deep neural network features, and we experiment with introducing saliency maps to improve perceptibility. We train and test our models using public datasets, namely YouTube-UGC and KoNViD-1k. Preliminary results indicate that high correlations are achieved by using only deep features, while adding saliency does not always boost performance. Our results and code will be made publicly available to serve as a benchmark for the research community and can be found on our project page: https://github.com/xinyiW915/SPIE-2023-Supplementary.
Since 2015, video dimensionality has expanded to higher spatial and temporal resolutions and a wider colour gamut. This High Dynamic Range (HDR) content has gained traction in the consumer space as it delivers an enhanced quality of experience. At the same time, codec complexity is growing. This has driven the development of tools for content-adaptive optimisation that achieve optimal rate-distortion performance for HDR video at 4K resolution. While improvements of just a few percentage points in BD-Rate (1-5\%) are significant for the streaming media industry, the impact on subjective quality has been less studied, especially for HDR/AV1. In this paper, we conduct a subjective quality assessment (42 subjects) of 4K HDR content with a per-clip optimisation strategy. We correlate these subjective scores with existing popular objective metrics used in standards development and show that some perceptual metrics correlate surprisingly well even though they are not tuned for HDR. We find that the DSCQS protocol is too insensitive to categorically compare the methods, but the data allows us to make recommendations about the use of experts vs non-experts in HDR studies and to explain the subjective impact of film grain in HDR content under compression.
Over the past few years, there has been an increase in the demand for and availability of High Dynamic Range (HDR) displays and content. To ensure the production of high-quality materials, human evaluation is required. However, ascertaining whether the full playback pipeline is indeed HDR-compliant can be challenging. In this paper, we present a set of recommendations for conformance testing to validate various aspects of the testing workflow, including playback, displays, brightness, colours, and the viewing environment. We assess the effectiveness of the HDR conversion techniques used in current standards development (3GPP) for preparing source materials. Additionally, we evaluate HDR display technologies, including OLED and LCD, using both a consumer television and a reference monitor.
The complexity of modern codecs, along with the increased need to deliver high-quality video at low bitrates, has reinforced the idea of per-clip tailoring of parameters for optimised rate-distortion performance. While the objective quality metrics used for Standard Dynamic Range (SDR) video have been well studied, the transition of consumer displays to support High Dynamic Range (HDR) video poses a new challenge for rate-distortion optimisation. In this paper, we review the popular HDR metrics DeltaE100 (DE100), PSNRL100, wPSNR, and HDR-VQM. We measure the impact of employing these metrics in per-clip direct search optimisation of the rate-distortion Lagrange multiplier in AV1. On 35 HDR videos, we report average Bjontegaard Delta Rate (BD-Rate) gains of 4.675%, 2.226%, and 7.253% in terms of DE100, PSNRL100, and HDR-VQM respectively. We also show that including chroma in the quality metrics has a significant impact on optimisation, which can only be partially addressed by the use of chroma offsets.
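The role of the Lagrange multiplier in such rate-distortion optimisation can be illustrated with a minimal mode-decision sketch built around the standard cost J = D + λR. The two candidate modes and their rate/distortion values below are invented for illustration and are not taken from the paper's experiments.

```python
# Toy illustration of Lagrangian RD optimisation: the encoder picks the
# coding option minimising J = D + lambda * R, so varying lambda shifts
# the balance between distortion D and rate R.

def rd_cost(distortion, rate, lam):
    return distortion + lam * rate

def choose_mode(modes, lam):
    """Pick the coding mode minimising J = D + lambda * R."""
    return min(modes, key=lambda m: rd_cost(m["D"], m["R"], lam))

# Invented candidate modes: one low-distortion/high-rate, one the reverse.
modes = [{"name": "intra", "D": 4.0, "R": 10.0},
         {"name": "inter", "D": 6.0, "R": 2.0}]

for lam in (0.1, 1.0):
    print(lam, choose_mode(modes, lam)["name"])
```

A small λ favours the low-distortion, expensive mode; a large λ favours the cheap mode. A per-clip direct search over λ, as in the paper, explores this trade-off per clip rather than relying on the encoder's default.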
The adoption of video conferencing and video communication services, accelerated by COVID-19, has driven a rapid increase in video data traffic. The demand for higher resolutions and quality, the need for immersive video formats, and the newest, more complex video codecs increase the energy consumption in data centres and on display devices. In this paper, we explore and compare the energy consumption across optimised state-of-the-art video codecs: SVT-AV1, VVenC/VVdeC, VP9, and x265. Furthermore, we align the energy usage with various objective quality metrics and the compression performance for a set of video sequences across different resolutions. The results indicate that, among the tested codecs and configurations, SVT-AV1 provides the best trade-off between energy consumption and quality. The reported results aim to serve as a guide towards sustainable video streaming without compromising the quality of experience of the end user.
Since the adoption of VP9 by Netflix in 2016, royalty-free coding standards have continued to gain prominence through the activities of the AOMedia consortium. AV1, the latest open-source standard, is now widely supported. In the early years after standardisation, HDR video tended to be underserved in open-source encoders for a variety of reasons, including the relatively small amount of true HDR content being broadcast and the challenges of RD optimisation with that material. AV1 codec optimisation has been ongoing since 2020, including consideration of the computational load. In this paper, we explore direct optimisation of the Lagrangian $\lambda$ parameter used in the rate control of encoders to estimate the optimal rate-distortion trade-off achievable for an HDR-signalled video clip. We show that by adjusting the Lagrange multiplier in the RD optimisation process on a frame-hierarchy basis, we are able to increase the Bjontegaard delta rate gains by more than 3.98$\times$ on average without visually affecting the quality.
In recent years, resolution adaptation based on deep neural networks has enabled significant performance gains for conventional (2D) video codecs. This paper investigates the effectiveness of spatial resolution resampling in the context of immersive content. The proposed approach reduces the spatial resolution of input multi-view videos before encoding and reconstructs their original resolution after decoding. During the up-sampling process, an advanced CNN model is used to reduce potential re-sampling, compression, and synthesis artefacts. This work has been fully tested with the TMIV coding standard using a Versatile Video Coding (VVC) codec. The results demonstrate that the proposed method achieves a significant rate-quality performance improvement for the majority of the test sequences, with an average BD-VMAF improvement of 3.07 over all sequences.
VMAF is a machine learning-based video quality assessment method, originally designed for streaming applications, which combines multiple quality metrics and video features through SVM regression. It offers higher correlation with subjective opinions than many conventional quality assessment methods. In this paper, we propose enhancements to VMAF through the integration of new video features and alternative quality metrics (selected from a diverse pool), alongside multi-model combination. The proposed combination approach enables training on multiple databases with varying content and distortion characteristics. Our enhanced VMAF method has been evaluated on eight HD video databases and consistently outperforms the original VMAF model (0.6.1) and other benchmark quality metrics, exhibiting higher correlation with subjective ground truth data.
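The fusion step that VMAF performs can be illustrated schematically. VMAF itself fuses elementary features with SVM regression; the least-squares fit below is only a minimal stand-in for that regression, and the feature values and subjective scores are invented for the example.

```python
import numpy as np

# Toy feature matrix: rows = distorted clips, columns = elementary
# quality features (e.g. a fidelity term and a motion term); the
# values and the subjective scores (MOS) are illustrative only.
X = np.array([[0.9, 0.2],
              [0.6, 0.5],
              [0.3, 0.8],
              [0.1, 0.9]])
mos = np.array([85.0, 70.0, 50.0, 30.0])

# Fuse features into a single predicted score. VMAF proper uses SVM
# regression here; a linear least-squares fit stands in for it.
Xb = np.hstack([X, np.ones((len(X), 1))])     # add a bias column
w, *_ = np.linalg.lstsq(Xb, mos, rcond=None)  # fit fusion weights
pred = Xb @ w                                 # predicted quality scores
print(np.round(pred, 1))
```

Multi-model combination, as proposed in the paper, would train several such fusion models on different databases and combine their outputs; a single fitted model is shown here for brevity.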