Rate-control is essential to ensure efficient video delivery. Typical rate-control algorithms rely on bit allocation strategies, to appropriately distribute bits among frames. As reference frames are essential for exploiting temporal redundancies, intra frames are usually assigned a larger portion of the available bits. In this paper, an accurate method to estimate number of bits and quality of intra frames is proposed, which can be used for bit allocation in a rate-control scheme. The algorithm is based on deep learning, where networks are trained using the original frames as inputs, while distortions and sizes of compressed frames after encoding are used as ground truths. Two approaches are proposed where either local or global distortions are predicted.
In this work recent advances in conditional adversarial networks are investigated to develop an end-to-end architecture based on Convolutional Neural Networks (CNNs) to directly map realistic colours to an input greyscale image. Observing that existing colourisation methods sometimes exhibit a lack of colourfulness, this paper proposes a method to improve colourisation results. In particular, the method uses Generative Adversarial Neural Networks (GANs) and focuses on improvement of training stability to enable better generalisation in large multi-class image datasets. Additionally, the integration of instance and batch normalisation layers in both generator and discriminator is introduced to the popular U-Net architecture, boosting the network capabilities to generalise the style changes of the content. The method has been tested using the ILSVRC 2012 dataset, achieving improved automatic colourisation results compared to other methods based on GANs.
This paper proposes a method for complexity reduction in practical video encoders using multiple decision tree classifiers. The method is demonstrated for the fast implementation of the 'High Efficiency Video Coding' (HEVC) standard, chosen because of its high bit rate reduction capability but large complexity overhead. Optimal partitioning of each video frame into coding units (CUs) is the main source of complexity as a vast number of combinations are tested. The decision tree models were trained to identify when the CU testing process, a time-consuming Lagrangian optimisation, can be skipped i.e a high probability that the CU can remain whole. A novel approach to finding the simplest and most effective decision tree model called 'manual pruning' is described. Implementing the skip criteria reduced the average encoding time by 42.1% for a Bj{\o}ntegaard Delta rate detriment of 0.7%, for 17 standard test sequences in a range of resolutions and quantisation parameters.
Video Quality Assessment (VQA) is a very challenging task due to its highly subjective nature. Moreover, many factors influence VQA. Compression of video content, while necessary for minimising transmission and storage requirements, introduces distortions which can have detrimental effects on the perceived quality. Especially when dealing with modern video coding standards, it is extremely difficult to model the effects of compression due to the unpredictability of encoding on different content types. Moreover, transmission also introduces delays and other distortion types which affect the perceived quality. Therefore, it would be highly beneficial to accurately predict the perceived quality of video to be distributed over modern content distribution platforms, so that specific actions could be undertaken to maximise the Quality of Experience (QoE) of the users. Traditional VQA techniques based on feature extraction and modelling may not be sufficiently accurate. In this paper, a novel Deep Learning (DL) framework is introduced for effectively predicting VQA of video content delivery mechanisms based on end-to-end feature learning. The proposed framework is based on Convolutional Neural Networks, taking into account compression distortion as well as transmission delays. Training and evaluation of the proposed framework are performed on a user annotated VQA dataset specifically created to undertake this work. The experiments show that the proposed methods can lead to high accuracy of the quality estimation, showcasing the potential of using DL in complex VQA scenarios.