A light-weight super-resolution (LSR) method from a single image targeting mobile applications is proposed in this work. LSR predicts the residual image between the interpolated low-resolution (ILR) and high-resolution (HR) images using a self-supervised framework. To lower the computational complexity, LSR does not adopt the end-to-end optimization deep networks. It consists of three modules: 1) generation of a pool of rich and diversified representations in the neighborhood of a target pixel via unsupervised learning, 2) selecting a subset from the representation pool that is most relevant to the underlying super-resolution task automatically via supervised learning, 3) predicting the residual of the target pixel via regression. LSR has low computational complexity and reasonable model size so that it can be implemented on mobile/edge platforms conveniently. Besides, it offers better visual quality than classical exemplar-based methods in terms of PSNR/SSIM measures.
Many point cloud classification methods are developed under the assumption that all point clouds in the dataset are well aligned with the canonical axes so that the 3D Cartesian point coordinates can be employed to learn features. When input point clouds are not aligned, the classification performance drops significantly. In this work, we focus on a mathematically transparent point cloud classification method called PointHop, analyze its reason for failure due to pose variations, and solve the problem by replacing its pose dependent modules with rotation invariant counterparts. The proposed method is named SO(3)-Invariant PointHop (or S3I-PointHop in short). We also significantly simplify the PointHop pipeline using only one single hop along with multiple spatial aggregation techniques. The idea of exploiting more spatial information is novel. Experiments on the ModelNet40 dataset demonstrate the superiority of S3I-PointHop over traditional PointHop-like methods.
A low-complexity point cloud compression method called the Green Point Cloud Geometry Codec (GPCGC), is proposed to encode the 3D spatial coordinates of static point clouds efficiently. GPCGC consists of two modules. In the first module, point coordinates of input point clouds are hierarchically organized into an octree structure. Points at each leaf node are projected along one of three axes to yield image maps. In the second module, the occupancy map is clustered into 9 modes while the depth map is coded by a low-complexity high-efficiency image codec, called the green image codec (GIC). GIC is a multi-resolution codec based on vector quantization (VQ). Its complexity is significantly lower than HEVC-Intra. Furthermore, the rate-distortion optimization (RDO) technique is used to select the optimal coding parameters. GPCGC is a progressive codec, and it offers a coding performance competitive with MPEG's V-PCC and G-PCC standards at significantly lower complexity.
Cardiac cine magnetic resonance imaging (MRI) has been used to characterize cardiovascular diseases (CVD), often providing a noninvasive phenotyping tool.~While recently flourished deep learning based approaches using cine MRI yield accurate characterization results, the performance is often degraded by small training samples. In addition, many deep learning models are deemed a ``black box," for which models remain largely elusive in how models yield a prediction and how reliable they are. To alleviate this, this work proposes a lightweight successive subspace learning (SSL) framework for CVD classification, based on an interpretable feedforward design, in conjunction with a cardiac atlas. Specifically, our hierarchical SSL model is based on (i) neighborhood voxel expansion, (ii) unsupervised subspace approximation, (iii) supervised regression, and (iv) multi-level feature integration. In addition, using two-phase 3D deformation fields, including end-diastolic and end-systolic phases, derived between the atlas and individual subjects as input offers objective means of assessing CVD, even with small training samples. We evaluate our framework on the ACDC2017 database, comprising one healthy group and four disease groups. Compared with 3D CNN-based approaches, our framework achieves superior classification performance with 140$\times$ fewer parameters, which supports its potential value in clinical use.
A self-supervised adaptive low-light video enhancement (SALVE) method is proposed in this work. SALVE first conducts an effective Retinex-based low-light image enhancement on a few key frames of an input low-light video. Next, it learns mappings from the low- to enhanced-light frames via Ridge regression. Finally, it uses these mappings to enhance the remaining frames in the input video. SALVE is a hybrid method that combines components from a traditional Retinex-based image enhancement method and a learning-based method. The former component leads to a robust solution which is easily adaptive to new real-world environments. The latter component offers a fast, computationally inexpensive and temporally consistent solution. We conduct extensive experiments to show the superior performance of SALVE. Our user study shows that 87% of participants prefer SALVE over prior work.
Despite prolific work on evaluating generative models, little research has been done on the quality evaluation of an individual generated sample. To address this problem, a lightweight generated sample quality evaluation (LGSQE) method is proposed in this work. In the training stage of LGSQE, a binary classifier is trained on real and synthetic samples, where real and synthetic data are labeled by 0 and 1, respectively. In the inference stage, the classifier assigns soft labels (ranging from 0 to 1) to each generated sample. The value of soft label indicates the quality level; namely, the quality is better if its soft label is closer to 0. LGSQE can serve as a post-processing module for quality control. Furthermore, LGSQE can be used to evaluate the performance of generative models, such as accuracy, AUC, precision and recall, by aggregating sample-level quality. Experiments are conducted on CIFAR-10 and MNIST to demonstrate that LGSQE can preserve the same performance rank order as that predicted by the Frechet Inception Distance (FID) but with significantly lower complexity.
Recovering unknown, missing, damaged, distorted or lost information in DCT coefficients is a common task in multiple applications of digital image processing, including image compression, selective image encryption, and image communications. This paper investigates recovery of a special type of information in DCT coefficients of digital images: sign bits. This problem can be modelled as a mixed integer linear programming (MILP) problem, which is NP-hard in general. To efficiently solve the problem, we propose two approximation methods: 1) a relaxation-based method that convert the MILP problem to a linear programming (LP) problem; 2) a divide-and-conquer method which splits the target image into sufficiently small regions, each of which can be more efficiently solved as an MILP problem, and then conducts a global optimization phase as a smaller MILP problem or an LP problem to maximize smoothness across different regions. To the best of our knowledge, we are the first who considered how to use global optimization to recover sign bits of DCT coefficients. We considered how the proposed methods can be applied to JPEG-encoded images and conducted extensive experiments to validate the performances of our proposed methods. The experimental results showed that the proposed methods worked well, especially when the number of unknown sign bits per DCT block is not too large. Compared with other existing methods, which are all based on simple error-concealment strategies, our proposed methods outperformed them with a substantial margin, both according to objective quality metrics (PSNR and SSIM) and also our subjective evaluation. Our work has a number of profound implications, e.g., more sign bits can be discarded to develop more efficient image compression methods, and image encryption methods based on sign bit encryption can be less secure than we previously understood.
Being different from deep-learning-based (DL-based) image generation methods, a new image generative model built upon successive subspace learning principle is proposed and named GenHop (an acronym of Generative PixelHop) in this work. GenHop consists of three modules: 1) high-to-low dimension reduction, 2) seed image generation, and 3) low-to-high dimension expansion. In the first module, it builds a sequence of high-to-low dimensional subspaces through a sequence of whitening processes, each of which contains samples of joint-spatial-spectral representation. In the second module, it generates samples in the lowest dimensional subspace. In the third module, it finds a proper high-dimensional sample for a seed image by adding details back via locally linear embedding (LLE) and a sequence of coloring processes. Experiments show that GenHop can generate visually pleasant images whose FID scores are comparable or even better than those of DL-based generative models for MNIST, Fashion-MNIST and CelebA datasets.
Rapid advances in artificial intelligence (AI) in the last decade have largely been built upon the wide applications of deep learning (DL). However, the high carbon footprint yielded by larger and larger DL networks becomes a concern for sustainability. Furthermore, DL decision mechanism is somewhat obsecure and can only be verified by test data. Green learning (GL) has been proposed as an alternative paradigm to address these concerns. GL is characterized by low carbon footprints, small model sizes, low computational complexity, and logical transparency. It offers energy-effective solutions in cloud centers as well as mobile/edge devices. GL also provides a clear and logical decision-making process to gain people's trust. Several statistical tools have been developed to achieve this goal in recent years. They include subspace approximation, unsupervised and supervised representation learning, supervised discriminant feature selection, and feature space partitioning. We have seen a few successful GL examples with performance comparable with state-of-the-art DL solutions. This paper offers an introduction to GL, its demonstrated applications, and future outlook.
A multi-grid multi-block-size vector quantization (MGBVQ) method is proposed for image coding in this work. The fundamental idea of image coding is to remove correlations among pixels before quantization and entropy coding, e.g., the discrete cosine transform (DCT) and intra predictions, adopted by modern image coding standards. We present a new method to remove pixel correlations. First, by decomposing correlations into long- and short-range correlations, we represent long-range correlations in coarser grids due to their smoothness, thus leading to a multi-grid (MG) coding architecture. Second, we show that short-range correlations can be effectively coded by a suite of vector quantizers (VQs). Along this line, we argue the effectiveness of VQs of very large block sizes and present a convenient way to implement them. It is shown by experimental results that MGBVQ offers excellent rate-distortion (RD) performance, which is comparable with existing image coders, at much lower complexity. Besides, it provides a progressive coded bitstream.