Existing quality enhancement methods for compressed images focus on aligning the enhancement domain with the raw domain to yield realistic images. However, these methods exhibit a pervasive enhancement bias towards the compression domain, inadvertently regarding it as more realistic than the raw domain. This bias makes enhanced images closely resemble their compressed counterparts, thus degrading their perceptual quality. In this paper, we propose a simple yet effective method to mitigate this bias and enhance the quality of compressed images. Our method employs a conditional discriminator with the compressed image as a key condition, and then incorporates a domain-divergence regularization to actively distance the enhancement domain from the compression domain. Through this dual strategy, our method enables discrimination against the compression domain and brings the enhancement domain closer to the raw domain. Comprehensive quality evaluations confirm the superiority of our method over other state-of-the-art methods, without incurring any inference overhead.
The success of convolutional neural networks (CNNs) has been revolutionising the way we approach and use intelligent machines in the Big Data era. Despite this success, CNNs have been consistently put under scrutiny owing to their \textit{black-box} nature, the \textit{ad hoc} manner of their construction, and the lack of theoretical support and physical meaning of their operation. This has been prohibitive to both the quantitative and qualitative understanding of CNNs, and to their application in more sensitive areas such as AI for health. We set out to address these issues, and in this way demystify the operation of CNNs, by employing the perspective of matched filtering. We first show that the convolution operation, the very core of CNNs, represents a matched filter which aims to identify the presence of features in input data. This then serves as a vehicle to interpret the convolution-activation-pooling chain in CNNs under the theoretical umbrella of matched filtering, a common operation in signal processing. We further provide extensive examples and experiments to illustrate this connection, whereby the learning in CNNs is shown to also perform matched filtering, which further sheds light on the physical meaning of learnt parameters and layers. It is our hope that this material will provide new insights into the understanding, construction and analysis of CNNs, as well as pave the way for developing new CNN methods and architectures.
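The matched-filtering view of the convolution-activation-pooling chain can be illustrated with a minimal, standard-library sketch. It is not the authors' code: the template, the signal, the disturbance and the detection threshold are all hypothetical values chosen for illustration. The "convolution" is implemented as a sliding inner product (cross-correlation), which is how convolutional layers are commonly implemented in practice.

```python
def cross_correlate(signal, template):
    """Inner product of the template with every same-length window of the signal."""
    m = len(template)
    return [
        sum(signal[i + j] * template[j] for j in range(m))
        for i in range(len(signal) - m + 1)
    ]


def relu(values, threshold=0.0):
    """Keep only responses above a (hypothetical) detection threshold."""
    return [max(0.0, v - threshold) for v in values]


# Hypothetical feature that the kernel is matched to.
template = [1.0, -1.0, 2.0, -1.0, 1.0]

# Embed the feature at a known offset in an otherwise flat signal,
# then add a small deterministic disturbance.
true_offset = 8
signal = [0.0] * 20
for j, t in enumerate(template):
    signal[true_offset + j] += t
signal = [s + 0.1 * (-1) ** i for i, s in enumerate(signal)]

response = cross_correlate(signal, template)  # "convolution": matched filter output
activated = relu(response, threshold=3.0)     # activation: suppress weak matches
pooled = max(activated)                       # global max pooling: strongest match
detected_offset = max(range(len(response)), key=lambda i: response[i])

print(detected_offset, pooled)
```

The filter response peaks where the window of the signal is most similar to the template, so the location of the maximum recovers the offset at which the feature was embedded.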
Blind visual quality assessment (BVQA) on 360{\textdegree} video plays a key role in optimizing immersive multimedia systems. When assessing the quality of 360{\textdegree} video, humans tend to perceive quality degradation progressively, from the viewport-based spatial distortion of each spherical frame, to motion artifacts across adjacent frames, ending with a video-level quality score, i.e., a progressive quality assessment paradigm. However, the existing BVQA approaches for 360{\textdegree} video neglect this paradigm. In this paper, we take into account the progressive paradigm of human perception towards spherical video quality, and thus propose a novel BVQA approach (namely ProVQA) for 360{\textdegree} video via progressively learning from pixels, frames and video. Corresponding to the progressive learning of pixels, frames and video, three sub-nets are designed in our ProVQA approach, i.e., the spherical perception aware quality prediction (SPAQ), motion perception aware quality prediction (MPAQ) and multi-frame temporal non-local (MFTN) sub-nets. The SPAQ sub-net first models the spatial quality degradation based on the spherical perception mechanism of humans. Then, by exploiting motion cues across adjacent frames, the MPAQ sub-net incorporates motion contextual information for quality assessment on 360{\textdegree} video. Finally, the MFTN sub-net aggregates multi-frame quality degradation to yield the final quality score, by exploring long-term quality correlation across multiple frames. The experiments validate that our approach significantly advances the state-of-the-art BVQA performance on 360{\textdegree} video over two datasets. The code is publicly available at \url{https://github.com/yanglixiaoshen/ProVQA}.
A large class of modern probabilistic learning systems assumes symmetric distributions; however, real-world data tend to obey skewed distributions and are thus not always adequately modelled through symmetric distributions. To address this issue, elliptical distributions are increasingly used to generalise symmetric distributions, and further improvements to skewed elliptical distributions have recently attracted much attention. However, existing approaches are either hard to estimate or have complicated and abstract representations. To this end, we propose to employ the von Mises-Fisher (vMF) distribution to obtain an explicit and simple probability representation of the skewed elliptical distribution. This is shown not only to allow us to deal with non-symmetric learning systems, but also to provide a physically meaningful way of generalising skewed distributions. For rigour, our extension is proved to share important and desirable properties with its symmetric counterpart. We also demonstrate that the proposed vMF distribution is both easy to generate and stable to estimate, both theoretically and through examples.
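For reference, the standard von Mises-Fisher density on the unit sphere in $\mathbb{R}^p$ is recalled below. This is the textbook form only; how it enters the skewed elliptical construction is not specified in the abstract, and the symbols $\mathbf{x}$, $\boldsymbol{\mu}$, $\kappa$ and $p$ are notation assumed here.

```latex
\[
f_p(\mathbf{x};\boldsymbol{\mu},\kappa)
  = C_p(\kappa)\,\exp\!\left(\kappa\,\boldsymbol{\mu}^{\mathsf{T}}\mathbf{x}\right),
\qquad
C_p(\kappa) = \frac{\kappa^{p/2-1}}{(2\pi)^{p/2}\, I_{p/2-1}(\kappa)},
\qquad
\|\mathbf{x}\| = \|\boldsymbol{\mu}\| = 1,\ \kappa > 0,
\]
```

Here $\boldsymbol{\mu}$ is the mean direction, $\kappa$ is the concentration parameter, and $I_{\nu}$ denotes the modified Bessel function of the first kind of order $\nu$.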
In this paper, we aim to address the problem of solving a non-convex optimization problem over an intersection of multiple variable sets. This kind of problem is typically solved using an alternating minimization (AM) strategy, which splits the overall problem into a set of sub-problems corresponding to each variable, and then iteratively performs minimization over each sub-problem using a fixed updating rule. However, due to the intrinsic non-convexity of the overall problem, the optimization can usually become trapped in a bad local minimum, even when each sub-problem can be globally optimized at each iteration. To tackle this problem, we propose a meta-learning based Global Scope Optimization (GSO) method. It adaptively generates optimizers for sub-problems via meta-learners and constantly updates these meta-learners with respect to the global loss information of the overall problem. Therefore, the sub-problems are optimized with the specific objective of minimizing the global loss. We evaluate the proposed model on a number of simulations, including bi-linear inverse problems (matrix completion) and non-linear problems (Gaussian mixture models). The experimental results show that our proposed approach outperforms AM-based methods in standard settings, and is able to achieve effective optimization in some challenging cases where other methods typically fail.
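The alternating minimization baseline described above can be sketched on a toy rank-1 matrix completion problem. This is only the fixed-rule AM strategy, not the proposed GSO method; the matrix, the observation mask, the initialisation and the iteration count are hypothetical choices made for illustration.

```python
def observed_loss(M, mask, u, v):
    """Squared error of the rank-1 model u v^T on the observed entries only."""
    return sum(
        (M[i][j] - u[i] * v[j]) ** 2
        for i in range(len(u))
        for j in range(len(v))
        if mask[i][j]
    )


def alternating_minimization(M, mask, iters=30):
    rows, cols = len(M), len(M[0])
    u, v = [1.0] * rows, [1.0] * cols
    history = [observed_loss(M, mask, u, v)]
    for _ in range(iters):
        # Sub-problem 1: exact least-squares update of u with v held fixed.
        for i in range(rows):
            den = sum(v[j] ** 2 for j in range(cols) if mask[i][j])
            if den > 0:
                u[i] = sum(M[i][j] * v[j] for j in range(cols) if mask[i][j]) / den
        # Sub-problem 2: exact least-squares update of v with u held fixed.
        for j in range(cols):
            den = sum(u[i] ** 2 for i in range(rows) if mask[i][j])
            if den > 0:
                v[j] = sum(M[i][j] * u[i] for i in range(rows) if mask[i][j]) / den
        history.append(observed_loss(M, mask, u, v))
    return u, v, history


# Hypothetical ground truth and observation pattern (1 = observed, 0 = missing).
true_u = [1.0, 2.0, 3.0]
true_v = [2.0, -1.0, 0.5, 4.0]
M = [[a * b for b in true_v] for a in true_u]
mask = [
    [1, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
]

u, v, history = alternating_minimization(M, mask)
print(history[0], history[-1])
```

Because each sub-problem is solved exactly, the observed loss never increases from one sweep to the next. That guarantee says nothing about reaching the global minimum of the joint non-convex problem, which is the limitation the abstract sets out to address.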
Generative adversarial nets (GANs) have become a preferred tool for accommodating complicated distributions. To stabilise the training and reduce the mode collapse of GANs, one of their main variants employs the integral probability metric (IPM) as the loss function. Although theoretically supported, many IPM-GANs essentially compare moments in an embedded domain of the \textit{critic}. We generalise this by comparing the distributions rather than the moments via a powerful tool, i.e., the characteristic function (CF), which uniquely and universally contains all the information about a distribution. For rigour, we first establish the physical meaning of the phase and amplitude in CFs. This provides a feasible way of manipulating the generation. We then develop an efficient sampling strategy to calculate the CFs. Within this framework, we further prove an equivalence between the embedded and data domains when a reciprocal exists, which allows us to develop the GAN in an auto-encoder fashion, using only two modules to bi-directionally generate clear images. We refer to this efficient structure as the reciprocal CF GAN (RCF-GAN). Experimental results show the superior performance of the proposed RCF-GAN in terms of both generation and reconstruction.
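Comparing distributions through their characteristic functions can be illustrated with a small scalar sketch. It is not the RCF-GAN loss: the frequencies, the sample sizes and the simple averaged squared difference are assumptions made for illustration, and the samples are plain scalars rather than critic embeddings.

```python
import cmath
import random


def empirical_cf(samples, t):
    """Empirical characteristic function: the sample mean of exp(i * t * x)."""
    return sum(cmath.exp(1j * t * x) for x in samples) / len(samples)


def cf_discrepancy(samples_a, samples_b, freqs):
    """Mean squared modulus of the CF difference over a set of frequencies."""
    return sum(
        abs(empirical_cf(samples_a, t) - empirical_cf(samples_b, t)) ** 2
        for t in freqs
    ) / len(freqs)


random.seed(0)
freqs = [0.5, 1.0, 2.0]  # hypothetical evaluation frequencies
real = [random.gauss(0.0, 1.0) for _ in range(5000)]
shifted = [x + 1.0 for x in real]  # same shape, different location

phi = empirical_cf(real, 1.0)
amplitude, phase = abs(phi), cmath.phase(phi)

# A pure shift by c multiplies the CF by exp(i * t * c):
# the amplitude is unchanged and only the phase moves.
phi_shifted = empirical_cf(shifted, 1.0)

same = cf_discrepancy(real, real, freqs)
different = cf_discrepancy(real, shifted, freqs)
print(amplitude, phase, same, different)
```

The discrepancy is zero for identical sample sets and positive for the shifted set, even though both sets have the same spread.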
Many modern data analytics applications on graphs operate on domains where graph topology is not known a priori, and hence its determination becomes part of the problem definition, rather than serving as prior knowledge which aids the problem solution. Part III of this monograph starts by addressing ways to learn graph topology, from the case where the physics of the problem already suggests a possible topology, through to the most general cases where the graph topology is learned from the data. A particular emphasis is on graph topology definition based on the correlation and precision matrices of the observed data, combined with additional prior knowledge and structural conditions, such as the smoothness or sparsity of graph connections. For learning sparse graphs (with a small number of edges), the least absolute shrinkage and selection operator, known as LASSO, is employed, along with its graph-specific variant, the graphical LASSO. For completeness, both variants of LASSO are derived in an intuitive way, and explained. An in-depth elaboration of the graph topology learning paradigm is provided through several examples on physically well-defined graphs, such as electric circuits, linear heat transfer, social and computer networks, and spring-mass systems. As many graph neural networks (GNN) and graph convolutional networks (GCN) are emerging, we also review the main trends in GNNs and GCNs, from the perspective of graph signal filtering. Tensor representation of lattice-structured graphs is considered next, and it is shown that tensors (multidimensional data arrays) are a special class of graph signals, whereby the graph vertices reside on a high-dimensional regular lattice structure. This part of the monograph concludes with two emerging applications, in financial data processing and in underground transportation network modelling.
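The sparsity-promoting effect of LASSO can be sketched with a plain coordinate-descent solver for the objective $\tfrac{1}{2}\|\mathbf{y}-\mathbf{X}\mathbf{w}\|_2^2 + \lambda\|\mathbf{w}\|_1$. This is a generic illustration, not the monograph's derivation; the data and the value of `lam` are hypothetical, and the graphical LASSO variant (which places the $\ell_1$ penalty on the precision matrix) is not shown.

```python
def soft_threshold(value, lam):
    """Shrink towards zero by lam; values within [-lam, lam] become exactly zero."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, sweeps=100):
    """Minimise 0.5 * ||y - X w||^2 + lam * ||w||_1 one coordinate at a time."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            rho = 0.0  # correlation of column j with the residual that excludes w[j]
            z = 0.0    # squared norm of column j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return w


# With an identity design matrix the solution is the soft-thresholded target.
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
y = [3.0, 0.5, -2.0]
w = lasso_coordinate_descent(X, y, lam=1.0)
print(w)  # [2.0, 0.0, -1.0]
```

The weak coefficient (0.5) is set exactly to zero while the stronger ones are shrunk. In a graph learning setting, this is the mechanism by which weak candidate connections are pruned to obtain a sparse set of edges.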
This paper studies the problem of estimation for general finite mixture models, with a particular focus on the elliptical mixture models (EMMs). Instead of using the widely adopted Kullback-Leibler divergence, we provide a stable solution to the EMMs that is robust to initialisations and attains a superior local optimum by adaptively optimising along a manifold of an approximate Wasserstein distance. More specifically, we first summarise computable and identifiable EMMs, in order to identify the optimisation problem. Due to a probability constraint, solving this problem is cumbersome and unstable, especially under the Wasserstein distance. We thus resort to an efficient optimisation on a statistical manifold defined under an approximate Wasserstein distance, which allows for explicit metrics and operations. This is shown to significantly stabilise and improve the EMM estimations. We also propose an adaptive method to further accelerate the convergence. Experimental results demonstrate excellent performance of the proposed solver.
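To give a feel for how the two discrepancy measures differ, the closed forms for univariate Gaussians (the simplest elliptical family) can be compared directly. This is only an illustration under that simplifying assumption; it is neither the approximate Wasserstein distance nor the manifold optimisation used in the paper, and the function names are hypothetical.

```python
import math


def w2_gaussian(m1, s1, m2, s2):
    """2-Wasserstein distance between N(m1, s1^2) and N(m2, s2^2)."""
    return math.sqrt((m1 - m2) ** 2 + (s1 - s2) ** 2)


def kl_gaussian(m1, s1, m2, s2):
    """Kullback-Leibler divergence KL(N(m1, s1^2) || N(m2, s2^2))."""
    return math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5


# Both measures vanish for identical distributions.
print(w2_gaussian(0.0, 1.0, 0.0, 1.0), kl_gaussian(0.0, 1.0, 0.0, 1.0))

# Against a very narrow second Gaussian, the KL divergence becomes very large,
# while the Wasserstein distance stays on the scale of the parameters.
print(w2_gaussian(0.0, 1.0, 0.0, 0.01), kl_gaussian(0.0, 1.0, 0.0, 0.01))
```

In this univariate case the Wasserstein distance is simply the Euclidean distance between the (mean, standard deviation) pairs, which is one way to see why it can yield explicit metrics and operations.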
We propose a new structure for the complex-valued autoencoder by introducing additional degrees of freedom into its design through a widely linear (WL) transform. The corresponding widely linear backpropagation algorithm is also developed using the $\mathbb{CR}$ calculus, to unify the gradient calculation of the cost function and the underlying WL model. More specifically, all the existing complex-valued autoencoders employ the strictly linear transform, which is optimal only when the complex-valued outputs of each network layer are independent of the conjugate of the inputs. In addition, the widely linear model which underpins our work allows us to consider all the second-order statistics of the inputs. This provides more freedom in the design and enhanced optimization opportunities, compared to the state of the art. Furthermore, we show that the most widely adopted cost function, i.e., the mean squared error, is not best suited for the complex domain, as it is a real quantity with a single degree of freedom, while both the phase and the amplitude information need to be optimized. To resolve this issue, we design a new cost function, which is capable of controlling the balance between the phase and the amplitude contributions to the solution. The experimental results verify the superior performance of the proposed autoencoder together with the new cost function, especially in imaging scenarios where the phase preserves extensive information on edges and shapes.
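The difference between a strictly linear and a widely linear transform can be shown with a minimal sketch of the generic widely linear form $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{x}^{*}$. This illustrates only the transform, not the proposed autoencoder, its backpropagation or its cost function; the matrices and inputs are hypothetical.

```python
def widely_linear(A, B, x):
    """Widely linear map y = A x + B conj(x); B = 0 gives the strictly linear case."""
    return [
        sum(A[r][c] * x[c] + B[r][c] * x[c].conjugate() for c in range(len(x)))
        for r in range(len(A))
    ]


x = [1 + 2j, -3 + 0.5j]

# Hypothetical weights: the first output uses both x and its conjugate,
# the second output is strictly linear (its row of B is zero).
A = [[0.5, 0.0], [0.0, 1.0]]
B = [[0.5, 0.0], [0.0, 0.0]]

y = widely_linear(A, B, x)
print(y)
```

With these weights the first output equals the real part of `x[0]`, since $0.5\,x + 0.5\,x^{*} = \mathrm{Re}(x)$. No strictly linear map $y = a\,x$ with a fixed complex scalar $a$ can produce this for all inputs, which is the kind of extra freedom that the conjugate term provides.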