Topic: No Reference Image Quality Assessment
What is No Reference Image Quality Assessment? No-reference image quality assessment is the process of evaluating the quality of an image without a reference image for comparison.
Papers and Code
Nov 15, 2024
Abstract: Existing Image Quality Assessment (IQA) methods achieve remarkable success in analyzing the quality of the overall image, but few works explore quality analysis for Regions of Interest (ROIs). The quality analysis of ROIs can provide fine-grained guidance for image quality improvement and is crucial for scenarios focusing on region-level quality. This paper proposes a novel network, SEAGULL, which can SEe and Assess ROIs quality with GUidance from a Large vision-Language model. SEAGULL incorporates a vision-language model (VLM), masks generated by the Segment Anything Model (SAM) to specify ROIs, and a meticulously designed Mask-based Feature Extractor (MFE) to extract global and local tokens for the specified ROIs, enabling accurate fine-grained IQA for ROIs. Moreover, this paper constructs two ROI-based IQA datasets, SEAGULL-100w and SEAGULL-3k, for training and evaluating ROI-based IQA. SEAGULL-100w comprises about one million (100w) synthetic distortion images with 33 million ROIs for pre-training to improve the model's regional quality perception, and SEAGULL-3k contains about 3k authentic distortion ROIs to enhance the model's ability to perceive real-world distortions. After pre-training on SEAGULL-100w and fine-tuning on SEAGULL-3k, SEAGULL shows remarkable performance on fine-grained ROI quality assessment. Code and datasets are publicly available at https://github.com/chencn2020/Seagull.
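To make the masked feature pooling concrete, here is a minimal sketch of extracting a global token and an ROI-local token from a dense feature map with a binary mask. This is an illustrative NumPy simplification, not the paper's Mask-based Feature Extractor, and the names (roi_tokens, feature_map) are hypothetical.

```python
import numpy as np

def roi_tokens(feature_map: np.ndarray, roi_mask: np.ndarray):
    """Pool one global token (whole image) and one local token (ROI only).

    feature_map: (H, W, C) dense features from a vision backbone.
    roi_mask:    (H, W) binary mask for the ROI (e.g. produced by SAM).
    """
    global_token = feature_map.mean(axis=(0, 1))              # (C,)
    weights = roi_mask.astype(np.float32)[..., None]          # (H, W, 1)
    local_token = (feature_map * weights).sum(axis=(0, 1)) / max(weights.sum(), 1e-6)
    return global_token, local_token

# Toy example: a 16x16 feature map with 64 channels, ROI in the top-left quadrant.
feats = np.random.rand(16, 16, 64).astype(np.float32)
mask = np.zeros((16, 16), dtype=bool)
mask[:8, :8] = True
g, loc = roi_tokens(feats, mask)
print(g.shape, loc.shape)  # (64,) (64,)
```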

Jan 13, 2025
Abstract: Gaussian Splatting (GS) and Neural Radiance Fields (NeRF) are two groundbreaking technologies that have revolutionized the field of Novel View Synthesis (NVS), enabling immersive photorealistic rendering and user experiences by synthesizing multiple viewpoints from a sparse set of input views. The potential applications of NVS, such as high-quality virtual and augmented reality, detailed 3D modeling, and realistic medical organ imaging, underscore the importance of assessing the quality of NVS methods from the perspective of human perception. Although some previous studies have explored subjective quality assessment for NVS technology, they still face several challenges, especially in the selection of NVS methods, scenario coverage, and evaluation methodology. To address these challenges, we conducted two subjective experiments for the quality assessment of NVS technologies, covering both GS-based and NeRF-based methods and focusing on dynamic and real-world scenes. This study covers 360°, front-facing, and single-viewpoint videos while providing a richer and greater number of real scenes. It is also the first to explore the impact of NVS methods in dynamic scenes with moving objects. The two types of subjective experiments help to fully comprehend the influence of different viewing paths from a human perception perspective and pave the way for the future development of full-reference and no-reference quality metrics. In addition, we established a comprehensive benchmark of various state-of-the-art objective metrics on the proposed database, highlighting that existing methods still struggle to accurately capture subjective quality. The results provide insights into the limitations of existing NVS methods and may promote the development of new ones.
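Benchmarks of objective metrics against subjective scores of this kind are typically reported with rank and linear correlation. As a reference point (not code from the study), the standard computation looks like this; the MOS and metric values below are made-up numbers.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical data: subjective mean opinion scores and one objective metric's predictions.
mos = np.array([4.1, 3.2, 2.8, 4.5, 1.9, 3.7])
metric = np.array([0.82, 0.61, 0.55, 0.90, 0.30, 0.70])

plcc, _ = pearsonr(metric, mos)    # linear correlation (prediction accuracy)
srocc, _ = spearmanr(metric, mos)  # rank correlation (prediction monotonicity)
print(f"PLCC={plcc:.3f}  SROCC={srocc:.3f}")
```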

Nov 15, 2024
Abstract: Text-to-image generation and text-guided image manipulation have received considerable attention in the field of image generation tasks. However, the mainstream evaluation methods for these tasks have difficulty evaluating whether all the information from the input text is accurately reflected in the generated images, as they mainly focus on the overall alignment between the input text and the generated images. This paper proposes new evaluation metrics that assess the alignment between the input text and the generated images for every individual object. First, ChatGPT is used to produce questions about the generated images according to the input text. We then use Visual Question Answering (VQA) to measure the relevance of the generated images to the input text, which allows for a more detailed evaluation of alignment than existing methods. In addition, we use No-Reference Image Quality Assessment (NR-IQA) to evaluate not only the text-image alignment but also the quality of the generated images. Experimental results show that our proposed evaluation approach is a superior metric that can simultaneously assess finer text-image alignment and image quality while allowing the ratio between the two to be adjusted.
* Accepted to ISCAS2024
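The abstract describes blending per-object VQA alignment with an NR-IQA quality score at an adjustable ratio. A minimal sketch of such a blend is shown below; the weighting scheme and the names (combined_score, alpha) are assumptions, not the paper's exact formulation.

```python
def combined_score(vqa_scores, nriqa_score, alpha=0.5):
    """Blend per-object text-image alignment with no-reference image quality.

    vqa_scores:  per-object VQA correctness scores in [0, 1]
    nriqa_score: NR-IQA quality score normalized to [0, 1]
    alpha:       weight of alignment vs. quality (the adjustable ratio)
    """
    alignment = sum(vqa_scores) / len(vqa_scores)
    return alpha * alignment + (1.0 - alpha) * nriqa_score

# e.g. three objects mentioned in the prompt, two of them rendered correctly
print(combined_score([1.0, 1.0, 0.0], nriqa_score=0.75, alpha=0.6))
```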

Nov 12, 2024
Abstract: Objective: Ensuring precision in motion tracking for MRI-guided Radiotherapy (MRIgRT) is crucial for the delivery of effective treatments. This study refined motion tracking accuracy in MRIgRT through an automatic real-time tracking method, leveraging an enhanced Tracking-Learning-Detection (ETLD) framework coupled with automatic segmentation. Methods: We developed a novel MRIgRT motion tracking method by integrating two primary components, the ETLD framework and an improved Chan-Vese model (ICV), named ETLD+ICV. The TLD framework was upgraded to suit real-time cine MRI, including advanced image preprocessing, no-reference image quality assessment, an enhanced median-flow tracker, and a refined detector with dynamic search region adjustments. Additionally, the ICV was used for precise coverage of the target volume, refining the segmented region frame by frame using the tracking results, with key parameters optimized. Tested on 3.5D MRI scans from 10 patients with liver metastases, our method ensures the precise tracking and accurate segmentation vital for MRIgRT. Results: An evaluation of 106,000 frames across 77 treatment fractions revealed sub-millimeter tracking errors of less than 0.8 mm, with over 99% precision and 98% recall for all subjects, underscoring the robustness and efficacy of the ETLD. Moreover, ETLD+ICV yielded a global Dice score of more than 82% for all subjects, demonstrating the proposed method's extensibility and precise target volume coverage. Conclusions: This study successfully developed an automatic real-time motion tracking method for MRIgRT that markedly surpasses current methods. The novel method not only delivers exceptional precision in tracking and segmentation but also demonstrates enhanced adaptability to clinical demands, positioning it as an indispensable asset in the quest to augment the efficacy of radiotherapy treatments.
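The per-frame control flow described in the Methods section, gating each cine frame with a no-reference quality check, tracking, then refining the segmentation from the tracked region, can be sketched roughly as below. The tracker, quality gate, and Chan-Vese step are stand-in stubs, not the ETLD or ICV implementations.

```python
import numpy as np

def frame_quality_ok(frame: np.ndarray, threshold: float = 0.2) -> bool:
    """Very rough no-reference quality gate: reject near-flat, low-contrast frames."""
    return float(frame.std()) > threshold

def track(prev_frame, frame, box):
    """Placeholder tracker: returns the previous box (stands in for median-flow)."""
    return box

def refine_segmentation(frame, box):
    """Placeholder for the Chan-Vese refinement: a mask filled inside the box."""
    mask = np.zeros(frame.shape[:2], dtype=bool)
    x, y, w, h = box
    mask[y:y + h, x:x + w] = True
    return mask

frames = [np.random.rand(64, 64).astype(np.float32) for _ in range(5)]
box = (16, 16, 20, 20)                      # initial target box (x, y, w, h)
prev = frames[0]
mask = refine_segmentation(prev, box)
for frame in frames[1:]:
    if not frame_quality_ok(frame):
        continue                            # skip unreliable cine frames
    box = track(prev, frame, box)           # tracking step
    mask = refine_segmentation(frame, box)  # frame-by-frame segmentation refinement
    prev = frame
print(mask.sum(), "pixels in the final target mask")
```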

Nov 26, 2024
Abstract: Modern reconstruction techniques can effectively model complex 3D scenes from sparse 2D views. However, automatically assessing the quality of novel views and identifying artifacts is challenging due to the lack of ground-truth images and the limitations of no-reference image metrics in predicting detailed artifact maps. The absence of such quality metrics hinders accurate prediction of the quality of generated views and limits the adoption of post-processing techniques, such as inpainting, to enhance reconstruction quality. In this work, we propose a new no-reference metric, Puzzle Similarity, designed to localize artifacts in novel views. Our approach uses image patch statistics from the input views to establish a scene-specific distribution that is later used to identify poorly reconstructed regions in the novel views. We test and evaluate our method in the context of 3D reconstruction; to this end, we collected a novel dataset of human quality assessments in unseen reconstructed views. Through this dataset, we demonstrate that our method can not only successfully localize artifacts in novel views, correlating with human assessment, but do so without direct references. Surprisingly, our metric outperforms both no-reference metrics and popular full-reference image metrics. We can leverage our new metric to enhance applications like automatic image restoration, guided acquisition, or 3D reconstruction from sparse inputs.
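The core idea, building a scene-specific distribution of patch appearance from the input views and flagging novel-view patches that fall outside it, can be illustrated with a crude nearest-neighbour version in raw pixel-patch space. This is only a sketch of the principle, not the Puzzle Similarity metric itself.

```python
import numpy as np

def extract_patches(img: np.ndarray, size: int = 8) -> np.ndarray:
    """Flatten non-overlapping size x size patches of a grayscale image."""
    h, w = img.shape
    patches = [
        img[i:i + size, j:j + size].ravel()
        for i in range(0, h - size + 1, size)
        for j in range(0, w - size + 1, size)
    ]
    return np.stack(patches)

def artifact_map(novel_view, reference_views, size=8):
    """Score each novel-view patch by its distance to the closest reference patch."""
    ref = np.concatenate([extract_patches(v, size) for v in reference_views])
    nov = extract_patches(novel_view, size)
    # Nearest-neighbour distance in patch space: large = unlike anything in the scene.
    d = np.linalg.norm(nov[:, None, :] - ref[None, :, :], axis=-1).min(axis=1)
    side = novel_view.shape[0] // size
    return d.reshape(side, -1)

refs = [np.random.rand(64, 64) for _ in range(3)]
novel = np.random.rand(64, 64)
print(artifact_map(novel, refs).shape)  # (8, 8) patch-level artifact scores
```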

Dec 17, 2024
Abstract: Although facial landmark detection (FLD) has made significant progress, existing FLD methods still suffer from performance drops on partially non-visible faces, such as faces with occlusions or under extreme lighting conditions or poses. To address this issue, we introduce ORFormer, a novel transformer-based method that can detect non-visible regions and recover their missing features from visible parts. Specifically, ORFormer associates each image patch token with an additional learnable token called the messenger token. The messenger token aggregates features from all patches except its own. This way, the consensus between a patch and the other patches can be assessed by comparing its regular and messenger embeddings, enabling non-visible region identification. Our method then recovers occluded patches with features aggregated by the messenger tokens. Leveraging the recovered features, ORFormer compiles high-quality heatmaps for the downstream FLD task. Extensive experiments show that our method generates heatmaps resilient to partial occlusions. By integrating the resultant heatmaps into existing FLD methods, our method performs favorably against the state of the art on challenging datasets such as WFLW and COFW.
* WACV 2025
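A stripped-down view of the messenger-token idea: each patch has a regular embedding and a messenger embedding aggregated from the other patches; low agreement between the two flags a non-visible patch, whose features are then replaced by the messenger features. The snippet below is a schematic with random embeddings, not the ORFormer attention mechanism.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def consensus_scores(patch_emb: np.ndarray, messenger_emb: np.ndarray) -> np.ndarray:
    """Low similarity between a patch embedding and its messenger embedding
    (aggregated from all *other* patches) suggests the patch is non-visible."""
    return np.array([cosine(p, m) for p, m in zip(patch_emb, messenger_emb)])

def recover(patch_emb, messenger_emb, scores, threshold=0.3):
    """Replace low-consensus patch features with their messenger features."""
    out = patch_emb.copy()
    out[scores < threshold] = messenger_emb[scores < threshold]
    return out

patches = np.random.rand(16, 32)     # 16 patch tokens, dimension 32
messengers = np.random.rand(16, 32)  # one messenger token per patch
s = consensus_scores(patches, messengers)
print(recover(patches, messengers, s).shape)  # (16, 32)
```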

Dec 08, 2024
Abstract: Augmented Reality (AR) is a major immersive media technology that enriches our perception of reality by overlaying digital content (the foreground) onto physical environments (the background). It has far-reaching applications, from entertainment and gaming to education, healthcare, and industrial training. Nevertheless, challenges such as visual confusion and classical distortions can result in user discomfort when using the technology. Evaluating the AR quality of experience is therefore essential to measure user satisfaction and engagement, facilitating the refinement necessary for creating immersive and robust experiences. However, the scarcity of data and the distinctive characteristics of AR technology make the development of effective quality assessment metrics challenging. This paper presents a deep learning-based objective metric designed specifically for assessing image quality in AR scenarios. The approach entails four key steps: (1) fine-tuning a self-supervised pre-trained vision transformer to extract prominent features from reference images and distilling this knowledge to improve representations of distorted images, (2) quantifying distortions by computing shift representations, (3) employing cross-attention-based decoders to capture perceptual quality features, and (4) integrating regularization techniques and label smoothing to address overfitting. To validate the proposed approach, we conduct extensive experiments on the ARIQA dataset. The results showcase the superior performance of our approach across all model variants, namely TransformAR, TransformAR-KD, and TransformAR-KD+, in comparison to existing state-of-the-art methods.
* Submitted to the IEEE Transactions on Circuits and Systems for Video Technology
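Two of the four steps lend themselves to a compact illustration: the shift representation of step (2) is essentially the displacement between reference and distorted token features, and the label smoothing of step (4) is the standard trick. The sketch below is a plain NumPy rendering of those two ideas, not the TransformAR code.

```python
import numpy as np

def shift_representation(ref_feats: np.ndarray, dist_feats: np.ndarray) -> np.ndarray:
    """Quantify distortion as the displacement between reference and distorted tokens."""
    return dist_feats - ref_feats

def smooth_labels(onehot: np.ndarray, eps: float = 0.1) -> np.ndarray:
    """Standard label smoothing, used to curb overfitting on small IQA datasets."""
    k = onehot.shape[-1]
    return onehot * (1.0 - eps) + eps / k

ref = np.random.rand(196, 768)   # ViT tokens of the reference image
dist = np.random.rand(196, 768)  # ViT tokens of the distorted image
print(shift_representation(ref, dist).shape)          # (196, 768)
print(smooth_labels(np.array([0., 0., 1., 0., 0.])))  # [0.02 0.02 0.92 0.02 0.02]
```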

Nov 26, 2024
Abstract: The development of multimodal large language models (MLLMs) enables the evaluation of image quality through natural language descriptions. This advancement allows for more detailed assessments. However, these MLLM-based IQA methods rely primarily on general contextual descriptions, which sometimes limits fine-grained quality assessment. To address this limitation, we introduce a new image quality assessment (IQA) task paradigm, grounding-IQA. This paradigm integrates multimodal referring and grounding with IQA to realize more fine-grained quality perception. Specifically, grounding-IQA comprises two subtasks: grounding-IQA-description (GIQA-DES) and visual question answering (GIQA-VQA). GIQA-DES involves detailed descriptions with precise locations (e.g., bounding boxes), while GIQA-VQA focuses on quality QA for local regions. To realize grounding-IQA, we construct a corresponding dataset, GIQA-160K, through our proposed automated annotation pipeline. Furthermore, we develop a well-designed benchmark, GIQA-Bench. The benchmark comprehensively evaluates model grounding-IQA performance from three perspectives: description quality, VQA accuracy, and grounding precision. Experiments demonstrate that our proposed task paradigm, dataset, and benchmark facilitate more fine-grained IQA applications. Code: https://github.com/zhengchen1999/Grounding-IQA.
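Of the three benchmark perspectives, grounding precision is the most mechanical: predicted boxes are matched to reference boxes by intersection-over-union. A minimal version is sketched below; the IoU threshold and exact matching rule used by GIQA-Bench are not specified here, so treat the numbers as illustrative.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-8)

def grounding_precision(pred_boxes, gt_boxes, thr=0.5):
    """Fraction of predicted boxes matching some ground-truth box at IoU >= thr."""
    hits = sum(any(iou(p, g) >= thr for g in gt_boxes) for p in pred_boxes)
    return hits / max(len(pred_boxes), 1)

preds = [(10, 10, 60, 60), (100, 100, 150, 160)]
gts = [(12, 8, 58, 62)]
print(grounding_precision(preds, gts))  # 0.5
```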

Dec 11, 2024
Abstract: Deep learning-based techniques have the potential to optimize the scan and post-processing times required for MRI-based fat quantification, but they are constrained by the lack of large training datasets. Generative models are a promising tool for performing data augmentation by synthesizing realistic datasets. However, no previous methods have been specifically designed to generate datasets for quantitative MRI (q-MRI) tasks, where reference quantitative maps and large variability in scanning protocols are usually required. We propose a Physics-Informed Latent Diffusion Model (PI-LDM) to synthesize quantitative parameter maps jointly with customizable MR images by incorporating the signal generation model. We assessed the quality of PI-LDM's synthesized data using metrics such as the Fréchet Inception Distance (FID), obtaining scores comparable to state-of-the-art generative methods (FID: 0.0459). We also trained a U-Net for the MRI-based fat quantification task using synthetic datasets. When we used a few real samples (10 subjects, ~200 slices) and numerous synthetic samples (>3000), the fat fraction at specific liver ROIs showed a low bias on data obtained using the same protocol as the training data (0.10% at ROI1, 0.12% at ROI2) and on data acquired with an alternative protocol (0.14% at ROI1, 0.62% at ROI2). Future work will extend PI-LDM to other q-MRI applications.
* 10 pages, 7 figures, submitted to IEEE Transactions on Medical Imaging
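The reported ROI bias figures correspond to a simple quantity: the mean signed error of the predicted fat-fraction map against the reference map inside an ROI. A toy computation of that quantity is shown below; the maps and ROI are synthetic placeholders, not data from the study.

```python
import numpy as np

def roi_bias(pred_ff: np.ndarray, ref_ff: np.ndarray, roi_mask: np.ndarray) -> float:
    """Mean signed error (bias, in percentage points) of a predicted fat-fraction
    map against the reference map, restricted to one ROI."""
    return float((pred_ff[roi_mask] - ref_ff[roi_mask]).mean())

ref = np.full((64, 64), 8.0)                        # reference fat fraction [%]
pred = ref + np.random.normal(0.0, 0.5, ref.shape)  # a noisy prediction
roi = np.zeros(ref.shape, dtype=bool)
roi[20:30, 20:30] = True                            # one liver ROI
print(f"bias at ROI: {roi_bias(pred, ref, roi):.2f} percentage points")
```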

Oct 14, 2024
Abstract: This study introduces a novel no-reference image quality metric aimed at assessing image sharpness. Designed to be robust against variations in noise, exposure, contrast, and image content, it measures the normalized decay rate of gradients along pronounced edges, offering an objective method for sharpness evaluation without reference images. Primarily developed for satellite imagery to align with human visual perception of sharpness, the metric supports monitoring and quality characterization of satellite fleets. It demonstrates significant utility and superior performance, consistent with human perception across various image types and operational conditions. Unlike conventional metrics, this heuristic approach provides a way to score images from lower to higher sharpness, making it a reliable and versatile tool for enhancing quality assessment processes without the need for pristine or ground-truth comparisons. Additionally, the metric is computationally efficient compared to deep learning-based analysis, ensuring faster and more resource-effective sharpness evaluations.
* 10 pages, 6 figures
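As a rough illustration of the gradient-decay idea (not the authors' metric), the toy function below finds the strongest-gradient pixels and checks how much of the peak gradient survives one pixel away: a hard edge loses nearly all of it, while a blurred edge keeps much of it.

```python
import numpy as np

def sharpness_score(img: np.ndarray, edge_quantile: float = 0.99) -> float:
    """Crude gradient-decay sharpness proxy: at the strongest edges, compare the
    peak gradient magnitude to the weakest neighbouring magnitude; sharp edges
    decay almost completely within one pixel, blurred edges do not."""
    gy, gx = np.gradient(img.astype(np.float64))
    mag = np.hypot(gx, gy)
    thr = np.quantile(mag, edge_quantile)
    ys, xs = np.where(mag >= thr)
    decays = []
    for y, x in zip(ys, xs):
        if 1 <= y < img.shape[0] - 1 and 1 <= x < img.shape[1] - 1:
            off_edge = min(mag[y, x - 1], mag[y, x + 1], mag[y - 1, x], mag[y + 1, x])
            decays.append(1.0 - off_edge / (mag[y, x] + 1e-8))
    return float(np.mean(decays)) if decays else 0.0

# A hard vertical step edge versus the same edge spread over a ramp.
x = np.linspace(0.0, 1.0, 64)
sharp = np.tile((x >= 0.5).astype(float), (64, 1))
blurry = np.tile(np.clip((x - 0.4) / 0.2, 0.0, 1.0), (64, 1))
print(sharpness_score(sharp) > sharpness_score(blurry))  # expect True
```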
