Abstract:The quality assessment of AI-generated content (AIGC) faces multi-dimensional challenges, that span from low-level visual perception to high-level semantic understanding. Existing methods generally rely on single-level visual features, limiting their ability to capture complex distortions in AIGC images. To address this limitation, a multi-level visual representation paradigm is proposed with three stages, namely multi-level feature extraction, hierarchical fusion, and joint aggregation. Based on this paradigm, two networks are developed. Specifically, the Multi-Level Global-Local Fusion Network (MGLF-Net) is designed for the perceptual quality assessment, extracting complementary local and global features via dual CNN and Transformer visual backbones. The Multi-Level Prompt-Embedded Fusion Network (MPEF-Net) targets Text-to-Image correspondence by embedding prompt semantics into the visual feature fusion process at each feature level. The fused multi-level features are then aggregated for final evaluation. Experiments on benchmarks demonstrate outstanding performance on both tasks, validating the effectiveness of the proposed multi-level visual assessment paradigm.
Abstract:No-reference bitstream-layer point cloud quality assessment (PCQA) can be deployed without full decoding at any network node to achieve real-time quality monitoring. In this work, we develop the first PCQA model dedicated to Trisoup-Lifting encoded 3D point clouds by analyzing bitstreams without full decoding. Specifically, we investigate the relationship among texture bitrate per point (TBPP), texture complexity (TC) and texture quantization parameter (TQP) while geometry encoding is lossless. Subsequently, we estimate TC by utilizing TQP and TBPP. Then, we establish a texture distortion evaluation model based on TC, TBPP and TQP. Ultimately, by integrating this texture distortion model with a geometry attenuation factor, a function of trisoupNodeSizeLog2 (tNSL), we acquire a comprehensive NR bitstream-layer PCQA model named streamPCQ-TL. In addition, this work establishes a database named WPC6.0, the first and largest PCQA database dedicated to Trisoup-Lifting encoding mode, encompassing 400 distorted point clouds with both 4 geometric multiplied by 5 texture distortion levels. Experiment results on M-PCCD, ICIP2020 and the proposed WPC6.0 database suggest that the proposed streamPCQ-TL model exhibits robust and notable performance in contrast to existing advanced PCQA metrics, particularly in terms of computational cost. The dataset and source code will be publicly released at \href{https://github.com/qdushl/Waterloo-Point-Cloud-Database-6.0}{\textit{https://github.com/qdushl/Waterloo-Point-Cloud-Database-6.0}}