Kede Ma

Steerable Pyramid Transform Enables Robust Left Ventricle Quantification

Jan 20, 2022

Xiangyang Zhu, Kede Ma, Wufeng Xue

Abstract: Although multifarious variants of convolutional neural networks (CNNs) have proved successful in cardiac index quantification, they seem vulnerable to mild input perturbations, e.g., spatial transformations, image distortions, and adversarial attacks. Such brittleness erodes our trust in CNN-based automated diagnosis of various cardiovascular diseases. In this work, we describe a simple and effective method to learn robust CNNs for left ventricle (LV) quantification, including cavity and myocardium areas, directional dimensions, and regional wall thicknesses. The key to the success of our approach is the use of the biologically-inspired steerable pyramid transform (SPT) as fixed front-end processing, which brings three computational advantages to LV quantification. First, the basis functions of SPT match the anatomical structure of the LV as well as the geometric characteristics of the estimated indices. Second, SPT enables sharing a CNN across different orientations as a form of parameter regularization, and explicitly captures the scale variations of the LV in a natural way. Third, the residual highpass subband can be conveniently discarded to further encourage robust feature learning. A concise and effective metric, named Robustness Ratio, is proposed to evaluate the robustness under various input perturbations. Extensive experiments on 145 cardiac sequences show that our SPT-augmented method performs favorably against state-of-the-art algorithms in terms of prediction accuracy, but is significantly more robust under input perturbations.

* 10 pages, 13 figures, journal paper

Pseudocylindrical Convolutions for Learned Omnidirectional Image Compression

Dec 25, 2021

Mu Li, Kede Ma, Jinxing Li, David Zhang

Abstract: Although equirectangular projection (ERP) is a convenient form to store omnidirectional images (also known as 360-degree images), it is neither equal-area nor conformal, thus not friendly to subsequent visual communication. In the context of image compression, ERP will over-sample and deform things and stuff near the poles, making it difficult for perceptually optimal bit allocation. In conventional 360-degree image compression, techniques such as region-wise packing and tiled representation are introduced to alleviate the over-sampling problem, achieving limited success. In this paper, we make one of the first attempts to learn deep neural networks for omnidirectional image compression. We first describe parametric pseudocylindrical representation as a generalization of common pseudocylindrical map projections. A computationally tractable greedy method is presented to determine the (sub)-optimal configuration of the pseudocylindrical representation in terms of a novel proxy objective for rate-distortion performance. We then propose pseudocylindrical convolutions for 360-degree image compression. Under reasonable constraints on the parametric representation, the pseudocylindrical convolution can be efficiently implemented by standard convolution with the so-called pseudocylindrical padding. To demonstrate the feasibility of our idea, we implement an end-to-end 360-degree image compression system, consisting of the learned pseudocylindrical representation, an analysis transform, a non-uniform quantizer, a synthesis transform, and an entropy model. Experimental results on 19,790 omnidirectional images show that our method achieves consistently better rate-distortion performance than the competing methods. Moreover, the visual quality by our method is significantly improved for all images at all bitrates.
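
As a rough illustration of the over-sampling problem described above, the sketch below shrinks each ERP row by the cosine of its latitude. This is the sinusoidal projection, one common pseudocylindrical map, and not the paper's learned parametric representation. The function name `sinusoidal_row_widths` and the toy 8x16 grid are assumptions chosen for illustration.

```python
import math


def sinusoidal_row_widths(height: int, width: int) -> list[int]:
    """Samples kept per row when each ERP row is scaled by cos(latitude)."""
    widths = []
    for row in range(height):
        # Latitude at the centre of this row, from +pi/2 (top) to -pi/2 (bottom).
        lat = math.pi / 2 - (row + 0.5) * math.pi / height
        # Keep at least one sample so no row disappears entirely.
        widths.append(max(1, round(width * math.cos(lat))))
    return widths


# Toy example: an 8 x 16 ERP grid (hypothetical size).
widths = sinusoidal_row_widths(height=8, width=16)
erp_samples = 8 * 16
kept = sum(widths)
print(widths, kept, erp_samples)
```

On this toy grid the row widths come out as 3, 9, 13, 16, 16, 13, 9, 3, so 82 of the 128 ERP samples are kept, with the savings concentrated near the poles.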

Image Quality Assessment in the Modern Age

Oct 19, 2021

Kede Ma, Yuming Fang

Abstract: This tutorial provides the audience with the basic theories, methodologies, and current progress of image quality assessment (IQA). From an actionable perspective, we will first revisit several subjective quality assessment methodologies, with emphasis on how to properly select visual stimuli. We will then present in detail the design principles of objective quality assessment models, supplemented by an in-depth analysis of their advantages and disadvantages. Both hand-engineered and (deep) learning-based methods will be covered. Moreover, the limitations of the conventional model comparison methodology for objective quality models will be pointed out, and novel comparison methodologies such as those based on the theory of "analysis by synthesis" will be introduced. Finally, we will discuss the real-world multimedia applications of IQA, and give a list of open challenging problems, in the hope of encouraging more talented researchers and engineers to devote themselves to this exciting and rewarding research field.

* ACM Multimedia 2021 Tutorial

Locally Adaptive Structure and Texture Similarity for Image Quality Assessment

Oct 16, 2021

Keyan Ding, Yi Liu, Xueyi Zou, Shiqi Wang, Kede Ma

Abstract: The latest advances in full-reference image quality assessment (IQA) involve unifying structure and texture similarity based on deep representations. The resulting Deep Image Structure and Texture Similarity (DISTS) metric, however, makes rather global quality measurements, ignoring the fact that natural photographic images are locally structured and textured across space and scale. In this paper, we describe a locally adaptive structure and texture similarity index for full-reference IQA, which we term A-DISTS. Specifically, we rely on a single statistical feature, namely the dispersion index, to localize texture regions at different scales. The estimated probability (of one patch being texture) is in turn used to adaptively pool local structure and texture measurements. The resulting A-DISTS is adapted to local image content, and is free of expensive human perceptual scores for supervised training. We demonstrate the advantages of A-DISTS in terms of correlation with human data on ten IQA databases and optimization of single image super-resolution methods.
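
The dispersion index mentioned above is the variance-to-mean ratio. The sketch below computes it per non-overlapping patch on raw, non-negative pixel values. This is a simplification: the abstract does not say which features the index is computed on or how it is mapped to a texture probability, so the helper names, the patch size, and the toy image are all assumptions.

```python
def dispersion_index(values, eps=1e-12):
    """Variance-to-mean ratio of a list of non-negative numbers."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    # eps guards against division by zero for an all-zero patch.
    return var / (mean + eps)


def patch_dispersion(image, patch):
    """Dispersion index of each non-overlapping patch x patch block."""
    rows, cols = len(image), len(image[0])
    result = []
    for r in range(0, rows - patch + 1, patch):
        result_row = []
        for c in range(0, cols - patch + 1, patch):
            block = [image[i][j]
                     for i in range(r, r + patch)
                     for j in range(c, c + patch)]
            result_row.append(dispersion_index(block))
        result.append(result_row)
    return result


# Toy image: the left half is flat, the right half is a checkerboard.
image = [
    [5, 5, 1, 9],
    [5, 5, 9, 1],
    [5, 5, 1, 9],
    [5, 5, 9, 1],
]
dispersion_map = patch_dispersion(image, patch=2)
print(dispersion_map)
```

Flat patches score zero while the checkerboard patches score about 3.2, which is the kind of contrast a texture detector can threshold or convert into a soft weight.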

* Proceedings of the 29th ACM International Conference on Multimedia, 2021

Perceptually Optimized Deep High-Dynamic-Range Image Tone Mapping

Sep 02, 2021

Chenyang Le, Jiebin Yan, Yuming Fang, Kede Ma

Abstract: We describe a deep high-dynamic-range (HDR) image tone mapping operator that is computationally efficient and perceptually optimized. We first decompose an HDR image into a normalized Laplacian pyramid, and use two deep neural networks (DNNs) to estimate the Laplacian pyramid of the desired tone-mapped image from the normalized representation. We then end-to-end optimize the entire method over a database of HDR images by minimizing the normalized Laplacian pyramid distance (NLPD), a recently proposed perceptual metric. Qualitative and quantitative experiments demonstrate that our method produces images with better visual quality, and runs the fastest among existing local tone mapping algorithms.
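
To make the pyramid decomposition concrete, here is a minimal sketch of a plain (un-normalized) Laplacian pyramid on a 1-D signal, assuming a two-tap box filter and a signal length divisible by 2 to the power of `levels`. The normalization step and the DNNs of the paper are not reproduced; the function names are hypothetical.

```python
def build_laplacian_pyramid(signal, levels):
    """Split a 1-D signal into band-pass residuals plus a low-pass remainder."""
    pyramid = []
    current = list(signal)
    for _ in range(levels):
        # Downsample by averaging neighbouring pairs (box filter).
        coarse = [(current[i] + current[i + 1]) / 2
                  for i in range(0, len(current), 2)]
        # Upsample by repeating each coarse sample twice.
        upsampled = [v for v in coarse for _ in range(2)]
        # The band stores what the coarse level cannot represent.
        pyramid.append([a - b for a, b in zip(current, upsampled)])
        current = coarse
    pyramid.append(current)  # final low-pass remainder
    return pyramid


def reconstruct(pyramid):
    """Invert build_laplacian_pyramid by upsampling and adding bands back."""
    current = pyramid[-1]
    for band in reversed(pyramid[:-1]):
        upsampled = [v for v in current for _ in range(2)]
        current = [a + b for a, b in zip(upsampled, band)]
    return current


signal = [1.0, 3.0, 2.0, 6.0, 10.0, 10.0, 4.0, 0.0]
pyramid = build_laplacian_pyramid(signal, levels=2)
restored = reconstruct(pyramid)
print(pyramid)
print(restored)
```

Because each band stores exactly the difference between a level and its upsampled coarse version, reconstruction recovers the input. A tone mapping operator built this way can modify the bands and the low-pass remainder separately before recombining them.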

* 6 pages, 6 figures, 2 tables

Task-Specific Normalization for Continual Learning of Blind Image Quality Models

Jul 28, 2021

Weixia Zhang, Kede Ma, Guangtao Zhai, Xiaokang Yang

Abstract: The computational vision community has recently paid attention to continual learning for blind image quality assessment (BIQA). The primary challenge is to combat catastrophic forgetting of previously-seen IQA datasets (i.e., tasks). In this paper, we present a simple yet effective continual learning method for BIQA with improved quality prediction accuracy, plasticity-stability trade-off, and task-order/length robustness. The key step in our approach is to freeze all convolution filters of a pre-trained deep neural network (DNN) for an explicit promise of stability, and learn task-specific normalization parameters for plasticity. We assign each new task a prediction head, and load the corresponding normalization parameters to produce a quality score. The final quality estimate is computed by feature fusion and adaptive weighting using hierarchical representations, without leveraging the test-time oracle. Extensive experiments on six IQA datasets demonstrate the advantages of the proposed method in comparison to previous training techniques for BIQA.
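
The freeze-and-renormalize idea can be sketched in a few lines. The toy below is an assumption-laden stand-in, not the paper's network: a single frozen 1-D filter plays the role of the pre-trained convolutions, and each task owns only a scale, a shift, and a linear head. All names (`FROZEN_FILTER`, `task_params`, `dataset_a`, `dataset_b`) and all parameter values are hypothetical.

```python
import math

# Shared filter: fixed after pre-training, never updated per task.
FROZEN_FILTER = (0.25, 0.5, 0.25)

# Per-task state: normalization parameters and a linear prediction head.
task_params = {
    "dataset_a": {"gamma": 1.0, "beta": 0.0, "head": [1.0, 0.0, 0.0]},
    "dataset_b": {"gamma": 2.0, "beta": 1.0, "head": [0.0, 0.0, 1.0]},
}


def conv1d(signal, kernel):
    """Valid 1-D correlation of the signal with the kernel."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def normalize(features, gamma, beta, eps=1e-5):
    """Standardize the features, then apply the task's scale and shift."""
    mean = sum(features) / len(features)
    var = sum((f - mean) ** 2 for f in features) / len(features)
    return [gamma * (f - mean) / math.sqrt(var + eps) + beta for f in features]


def predict(signal, task):
    """Load one task's normalization parameters and head, then score."""
    params = task_params[task]
    features = conv1d(signal, FROZEN_FILTER)
    features = normalize(features, params["gamma"], params["beta"])
    return sum(w * f for w, f in zip(params["head"], features))


signal = [1.0, 2.0, 4.0, 8.0, 16.0]
score_a = predict(signal, "dataset_a")
score_b = predict(signal, "dataset_b")
print(score_a, score_b)
```

Adding a new task only adds an entry to `task_params`; the shared filter is untouched, which is where the stability guarantee in the abstract comes from. The fusion and adaptive weighting across heads are not shown here.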

* 12 pages, 6 figures

Semi-Supervised Deep Ensembles for Blind Image Quality Assessment

Jun 29, 2021

Zhihua Wang, Dingquan Li, Kede Ma

Abstract: Ensemble methods are generally regarded to be better than a single model if the base learners are deemed to be "accurate" and "diverse." Here we investigate a semi-supervised ensemble learning strategy to produce generalizable blind image quality assessment models. We train a multi-head convolutional network for quality prediction by maximizing the accuracy of the ensemble (as well as the base learners) on labeled data, and the disagreement (i.e., diversity) among them on unlabeled data, both implemented by the fidelity loss. We conduct extensive experiments to demonstrate the advantages of employing unlabeled data for BIQA, especially in model generalization and failure identification.
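
For readers unfamiliar with the fidelity loss, the sketch below implements its standard form from the learning-to-rank literature for two Bernoulli probabilities. How the paper forms these probabilities and combines the accuracy and diversity terms is not stated in the abstract, so the example values are purely illustrative.

```python
import math


def fidelity_loss(p, p_hat):
    """Fidelity loss between two Bernoulli probabilities in [0, 1].

    It is zero when the two probabilities agree and grows toward one
    as they move to opposite extremes.
    """
    return 1.0 - math.sqrt(p * p_hat) - math.sqrt((1.0 - p) * (1.0 - p_hat))


# Two heads that agree on a preference probability, and two that disagree.
agree = fidelity_loss(0.9, 0.9)
disagree = fidelity_loss(0.9, 0.1)
opposite = fidelity_loss(1.0, 0.0)
print(agree, disagree, opposite)
```

Minimizing this quantity against labels rewards accuracy, while maximizing it between heads on unlabeled data rewards disagreement, which matches the two roles the abstract assigns to it.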

* 6 pages, 1 figure, 5 tables

Troubleshooting Blind Image Quality Models in the Wild

May 14, 2021

Zhihua Wang, Haotao Wang, Tianlong Chen, Zhangyang Wang, Kede Ma

Abstract: Recently, the group maximum differentiation competition (gMAD) has been used to improve blind image quality assessment (BIQA) models, with the help of full-reference metrics. When applying this type of approach to troubleshoot "best-performing" BIQA models in the wild, we are faced with a practical challenge: it is highly nontrivial to obtain stronger competing models for efficient failure-spotting. Inspired by recent findings that difficult samples of deep models may be exposed through network pruning, we construct a set of "self-competitors," as random ensembles of pruned versions of the target model to be improved. Diverse failures can then be efficiently identified via self-gMAD competition. Next, we fine-tune both the target and its pruned variants on the human-rated gMAD set. This allows all models to learn from their respective failures, preparing themselves for the next round of self-gMAD competition. Experimental results demonstrate that our method efficiently troubleshoots BIQA models in the wild with improved generalizability.

* 7 pages, 3 tables

Exposing Semantic Segmentation Failures via Maximum Discrepancy Competition

Mar 03, 2021

Jiebin Yan, Yu Zhong, Yuming Fang, Zhangyang Wang, Kede Ma

Abstract: Semantic segmentation is an extensively studied task in computer vision, with numerous methods proposed every year. Thanks to the advent of deep learning in semantic segmentation, the performance on existing benchmarks is close to saturation. A natural question then arises: Does the superior performance on the closed (and frequently re-used) test sets transfer to the open visual world with unconstrained variations? In this paper, we take steps toward answering the question by exposing failures of existing semantic segmentation methods in the open visual world under the constraint of very limited human labeling effort. Inspired by previous research on model falsification, we start from an arbitrarily large image set, and automatically sample a small image set by MAximizing the Discrepancy (MAD) between two segmentation methods. The selected images have the greatest potential in falsifying either (or both) of the two methods. We also explicitly enforce several conditions to diversify the exposed failures, corresponding to different underlying root causes. A segmentation method whose failures are more difficult to expose in the MAD competition is considered better. We conduct a thorough MAD diagnosis of ten PASCAL VOC semantic segmentation algorithms. With detailed analysis of experimental results, we point out strengths and weaknesses of the competing algorithms, as well as potential research directions for further advancement in semantic segmentation. The code is publicly available at https://github.com/QTJiebin/MAD_Segmentation.
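
The sampling step can be sketched as ranking an unlabeled pool by how much two methods disagree and keeping the top few images for human inspection. The discrepancy measure below (fraction of differing pixels) is a stand-in chosen for simplicity, and the diversity conditions are omitted; the abstract does not specify either, so treat the function names and the toy predictions as assumptions.

```python
def pixel_disagreement(seg_a, seg_b):
    """Fraction of pixels on which two label maps differ."""
    flat_a = [label for row in seg_a for label in row]
    flat_b = [label for row in seg_b for label in row]
    return sum(a != b for a, b in zip(flat_a, flat_b)) / len(flat_a)


def mad_select(preds_a, preds_b, k):
    """Return the k image names with the largest disagreement."""
    scores = {name: pixel_disagreement(preds_a[name], preds_b[name])
              for name in preds_a}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy 2 x 2 label maps predicted by two hypothetical methods.
preds_a = {
    "img1": [[0, 0], [1, 1]],
    "img2": [[0, 1], [1, 0]],
    "img3": [[2, 2], [2, 2]],
}
preds_b = {
    "img1": [[0, 0], [1, 1]],
    "img2": [[1, 0], [0, 1]],
    "img3": [[2, 2], [2, 0]],
}
selected = mad_select(preds_a, preds_b, k=2)
print(selected)
```

Only the selected images need human labels, which is how this style of comparison keeps the annotation budget small even when the starting pool is very large.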

* 19 pages, 12 figures, 5 tables, accepted by IJCV

Continual Learning for Blind Image Quality Assessment

Feb 19, 2021

Weixia Zhang, Dingquan Li, Chao Ma, Guangtao Zhai, Xiaokang Yang, Kede Ma

Abstract: The explosive growth of image data facilitates the fast development of image processing and computer vision methods for emerging visual applications, meanwhile introducing novel distortions to the processed images. This poses a grand challenge to existing blind image quality assessment (BIQA) models, which fail to continually adapt to such subpopulation shift. Recent work suggests training BIQA methods on the combination of all available human-rated IQA datasets. However, this type of approach is not scalable to a large number of datasets, and makes it cumbersome to incorporate a newly created dataset. In this paper, we formulate continual learning for BIQA, where a model learns continually from a stream of IQA datasets, building on what was learned from previously seen data. We first identify five desiderata in the new setting with a measure to quantify the plasticity-stability trade-off. We then propose a simple yet effective method for learning BIQA models continually. Specifically, based on a shared backbone network, we add a prediction head for a new dataset, and enforce a regularizer to allow all prediction heads to evolve with new data while being resistant to catastrophic forgetting of old data. We compute the quality score by an adaptive weighted summation of estimates from all prediction heads. Extensive experiments demonstrate the promise of the proposed continual learning method in comparison to standard training techniques for BIQA.

* 14 pages, 6 figures
