Shan Liu

Corner-to-Center Long-range Context Model for Efficient Learned Image Compression

Nov 29, 2023

Yang Sui, Ding Ding, Xiang Pan, Xiaozhong Xu, Shan Liu, Bo Yuan, Zhenzhong Chen

Abstract: In the framework of learned image compression, the context model plays a pivotal role in capturing the dependencies among latent representations. To reduce the decoding time resulting from the serial autoregressive context model, the parallel context model has been proposed as an alternative that requires only two passes during the decoding phase, thus facilitating efficient image compression in real-world scenarios. However, performance degradation occurs due to its incomplete causal context. To tackle this issue, we conduct an in-depth analysis of the performance degradation observed in existing parallel context models, focusing on two aspects: the Quantity and Quality of information utilized for context prediction and decoding. Based on this analysis, we propose the Corner-to-Center transformer-based Context Model (C$^3$M), designed to enhance context and latent predictions and improve rate-distortion performance. Specifically, we leverage a logarithmic-based prediction order to predict more context features from corner to center progressively. In addition, to enlarge the receptive field in the analysis and synthesis transformation, we use the Long-range Crossing Attention Module (LCAM) in the encoder/decoder to capture long-range semantic information by assigning different window shapes to different channels. Extensive experimental evaluations show that the proposed method is effective and outperforms the state-of-the-art parallel methods. Finally, according to the subjective analysis, we suggest that improving the detailed representation in transformer-based image compression is a promising direction to be explored.
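The two-pass decoding idea mentioned in the abstract can be illustrated with a small sketch. The abstract does not say how existing parallel context models partition the latent grid, so the checkerboard split below is an assumption (one commonly used two-pass scheme), and it is not the corner-to-center order of C$^3$M. The function names `checkerboard_masks` and `two_pass_order` are hypothetical.

```python
def checkerboard_masks(height, width):
    """Split an H x W latent grid into anchor and non-anchor positions.

    Assumption: a checkerboard partition, used here only to illustrate
    a two-pass parallel context model.
    """
    anchor = [[(r + c) % 2 == 0 for c in range(width)] for r in range(height)]
    non_anchor = [[not a for a in row] for row in anchor]
    return anchor, non_anchor


def two_pass_order(height, width):
    """Return the decoding pass (0 or 1) assigned to every latent position.

    Pass 0 decodes all anchors in parallel without spatial context;
    pass 1 decodes the remaining positions, conditioned on the anchors.
    """
    anchor, _ = checkerboard_masks(height, width)
    return [[0 if anchor[r][c] else 1 for c in range(width)] for r in range(height)]


if __name__ == "__main__":
    for row in two_pass_order(4, 6):
        print(row)
```

In this sketch every position in pass 1 has its four direct neighbours decoded in pass 0, while positions in pass 0 see no decoded neighbours at all. That asymmetry is one way to picture the "incomplete causal context" the abstract refers to.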

SJTU-TMQA: A quality assessment database for static mesh with texture map

Sep 27, 2023

Bingyang Cui, Qi Yang, Kaifa Yang, Yiling Xu, Xiaozhong Xu, Shan Liu

Abstract: In recent years, static meshes with texture maps have become one of the most prevalent digital representations of 3D shapes in various applications, such as animation, gaming, medical imaging, and cultural heritage applications. However, little research has been done on the quality assessment of textured meshes, which hinders the development of quality-oriented applications, such as mesh compression and enhancement. In this paper, we create a large-scale textured mesh quality assessment database, namely SJTU-TMQA, which includes 21 reference meshes and 945 distorted samples. The meshes are rendered into processed video sequences, and subjective experiments are then conducted to obtain mean opinion scores (MOS). The diversity of content and the accuracy of MOS have been shown to validate the database's heterogeneity and reliability. The impact of various types of distortion on human perception is demonstrated. 13 state-of-the-art objective metrics are evaluated on SJTU-TMQA. The highest reported correlation is around 0.6, indicating the need for more effective objective metrics. The SJTU-TMQA is available at https://ccccby.github.io
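As a rough illustration of how mean opinion scores are obtained from a subjective experiment, the sketch below averages the ratings each distorted sample received. The rating scale, the sample identifiers, and the function name `mean_opinion_scores` are assumptions for illustration. The abstract does not describe the actual protocol, and real studies typically also screen out unreliable viewers before averaging.

```python
from statistics import mean


def mean_opinion_scores(ratings):
    """Map each sample id to the mean of its viewer ratings.

    `ratings` is a dict: sample id -> list of numeric scores.
    No outlier or viewer screening is performed in this sketch.
    """
    return {sample: mean(scores) for sample, scores in ratings.items()}


if __name__ == "__main__":
    # Hypothetical ratings, not taken from SJTU-TMQA.
    example = {"mesh_a_dist1": [4, 5, 3, 4], "mesh_a_dist2": [2, 1, 2, 3]}
    print(mean_opinion_scores(example))
```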

GeodesicPSIM: Predicting the Quality of Static Mesh with Texture Map via Geodesic Patch Similarity

Aug 24, 2023

Qi Yang, Joel Jung, Xiaozhong Xu, Shan Liu

Abstract: Static meshes with texture maps have attracted considerable attention in both industrial manufacturing and academic research, leading to an urgent requirement for effective and robust objective quality evaluation. However, current model-based static mesh quality metrics have obvious limitations: most of them only consider geometry information, while color information is ignored, and they have strict constraints for the meshes' geometrical topology. Other metrics, such as image-based and point-based metrics, are easily influenced by the preprocessing algorithms, e.g., projection and sampling, hampering their ability to perform at their best. In this paper, we propose Geodesic Patch Similarity (GeodesicPSIM), a novel model-based metric to accurately predict human perception quality for static meshes. After selecting a group of keypoints, 1-hop geodesic patches are constructed based on both the reference and distorted meshes cleaned by an effective mesh cleaning algorithm. A two-step patch cropping algorithm and a patch texture mapping module refine the size of 1-hop geodesic patches and build the relationship between the mesh geometry and color information, resulting in the generation of 1-hop textured geodesic patches. Three types of features are extracted to quantify the distortion: patch color smoothness, patch discrete mean curvature, and patch pixel color average and variance. To the best of our knowledge, GeodesicPSIM is the first model-based metric especially designed for static meshes with texture maps. GeodesicPSIM provides state-of-the-art performance in comparison with image-based, point-based, and video-based metrics on a newly created and challenging database. We also prove the robustness of GeodesicPSIM by introducing different settings of hyperparameters. Ablation studies also exhibit the effectiveness of the three proposed features and the patch cropping algorithm.

Dynamic Kernel-Based Adaptive Spatial Aggregation for Learned Image Compression

Aug 17, 2023

Huairui Wang, Nianxiang Fu, Zhenzhong Chen, Shan Liu

Abstract: Learned image compression methods have shown superior rate-distortion performance and remarkable potential compared to traditional compression methods. Most existing learned approaches use stacked convolution or window-based self-attention for transform coding, which aggregate spatial information in a fixed range. In this paper, we focus on extending spatial aggregation capability and propose a dynamic kernel-based transform coding. The proposed adaptive aggregation generates kernel offsets to capture valid information in the content-conditioned range to help the transform. With the adaptive aggregation strategy and the weight-sharing mechanism, our method can achieve promising transform capability with acceptable model complexity. Besides, following recent progress in entropy models, we define a generalized coarse-to-fine entropy model, considering the coarse global context, the channel-wise context, and the spatial context. Based on it, we introduce a dynamic kernel in the hyper-prior to generate a more expressive global context. Furthermore, we propose an asymmetric spatial-channel entropy model according to the investigation of the spatial characteristics of the grouped latents. The asymmetric entropy model aims to reduce statistical redundancy while maintaining coding efficiency. Experimental results demonstrate that our method achieves superior rate-distortion performance on three benchmarks compared to the state-of-the-art learning-based methods.

TDMD: A Database for Dynamic Color Mesh Subjective and Objective Quality Explorations

Aug 03, 2023

Qi Yang, Joel Jung, Timon Deschamps, Xiaozhong Xu, Shan Liu

Abstract: Dynamic colored meshes (DCM) are widely used in various applications; however, these meshes may undergo different processes, such as compression or transmission, which can distort them and degrade their quality. To facilitate the development of objective metrics for DCMs and study the influence of typical distortions on their perception, we create the Tencent - dynamic colored mesh database (TDMD) containing eight reference DCM objects with six typical distortions. Using processed video sequences (PVS) derived from the DCM, we have conducted a large-scale subjective experiment that resulted in 303 distorted DCM samples with mean opinion scores, making the TDMD the largest available DCM database to our knowledge. This database enabled us to study the impact of different types of distortion on human perception and offer recommendations for DCM compression and related tasks. Additionally, we have evaluated three types of state-of-the-art objective metrics on the TDMD, including image-based, point-based, and video-based metrics. Our experimental results highlight the strengths and weaknesses of each metric, and we provide suggestions about the selection of metrics in practical DCM applications. The TDMD will be made publicly available at the following location: https://multimedia.tencent.com/resources/tdmd.

TSMD: A Database for Static Color Mesh Quality Assessment Study

Aug 03, 2023

Qi Yang, Joel Jung, Haiqiang Wang, Xiaozhong Xu, Shan Liu

Abstract: Static meshes with texture maps are widely used in modern industrial and manufacturing sectors, attracting considerable attention in the mesh compression community due to their huge amount of data. To facilitate the study of static mesh compression algorithms and objective quality metrics, we create the Tencent - Static Mesh Dataset (TSMD) containing 42 reference meshes with rich visual characteristics. 210 distorted samples are generated by the lossy compression scheme developed for the Call for Proposals on polygonal static mesh coding, released on June 23 by the Alliance for Open Media Volumetric Visual Media group. Using processed video sequences, a large-scale, crowdsourcing-based, subjective experiment was conducted to collect subjective scores from 74 viewers. The dataset undergoes analysis to validate its sample diversity and Mean Opinion Scores (MOS) accuracy, establishing its heterogeneous nature and reliability. State-of-the-art objective metrics are evaluated on the new dataset. Pearson and Spearman correlations around 0.75 are reported, deviating from results typically observed on less heterogeneous datasets, demonstrating the need for further development of more robust metrics. The TSMD, including meshes, PVSs, bitstreams, and MOS, is made publicly available at the following location: https://multimedia.tencent.com/resources/tsmd.
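The Pearson and Spearman correlations mentioned above measure how well an objective metric's scores track the MOS. A minimal standard-library sketch follows. The function names and the example numbers are hypothetical and are not data from TSMD. Quality assessment studies often fit a nonlinear mapping to the metric scores before computing the Pearson correlation, and that step is omitted here.

```python
from statistics import mean


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    mos = [1.2, 2.5, 3.1, 4.0, 4.8]            # hypothetical subjective scores
    metric = [0.10, 0.30, 0.35, 0.70, 0.95]    # hypothetical objective scores
    print(pearson(metric, mos), spearman(metric, mos))
```

Spearman depends only on the ordering of the scores, so a metric that is monotonic in MOS but nonlinear can reach a Spearman value of 1 while its Pearson value stays below 1.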

Layer-wise Representation Fusion for Compositional Generalization

Jul 20, 2023

Yafang Zheng, Lei Lin, Zhaohong Lai, Binling Wang, Shan Liu, Biao Fu, Wenhao Rao, Peigen Ye, Yidong Chen, Xiaodong Shi

Abstract: Despite successes across a broad range of applications, the way sequence-to-sequence models construct solutions is argued to be less compositional than human-like generalization. There is mounting evidence that one of the reasons hindering compositional generalization is that the representations of the uppermost encoder and decoder layers are entangled. In other words, the syntactic and semantic representations of sequences are twisted inappropriately. However, most previous studies mainly concentrate on enhancing token-level semantic information to alleviate the representation entanglement problem, rather than composing and using the syntactic and semantic representations of sequences appropriately as humans do. In addition, we explain why the entanglement problem exists from the perspective of recent studies about training deeper Transformers, mainly owing to the "shallow" residual connections and their simple, one-step operations, which fail to fuse previous layers' information effectively. Starting from this finding and inspired by humans' strategies, we propose FuSion (Fusing Syntactic and Semantic Representations), an extension to sequence-to-sequence models to learn to fuse previous layers' information back into the encoding and decoding process appropriately through introducing a fuse-attention module at each encoder and decoder layer. FuSion achieves competitive and even state-of-the-art results on two realistic benchmarks, which empirically demonstrates the effectiveness of our proposal.

* work in progress. arXiv admin note: substantial text overlap with arXiv:2305.12169

Once-Training-All-Fine: No-Reference Point Cloud Quality Assessment via Domain-relevance Degradation Description

Jul 04, 2023

Yipeng Liu, Qi Yang, Yujie Zhang, Yiling Xu, Le Yang, Xiaozhong Xu, Shan Liu

Abstract: Full-reference (FR) point cloud quality assessment (PCQA) has achieved impressive progress in recent years. However, as reference point clouds are not available in many cases, no-reference (NR) metrics have become a research hotspot. Existing NR methods suffer from poor generalization performance. To address this shortcoming, we propose a novel NR-PCQA method, Point Cloud Quality Assessment via Domain-relevance Degradation Description (D$^3$-PCQA). First, we demonstrate our model's interpretability by deriving the function of each module using a kernelized ridge regression model. Specifically, quality assessment can be characterized as a leap from the scattered perceptual domain (reflecting subjective perception) to the ordered quality domain (reflecting mean opinion score). Second, to reduce the significant domain discrepancy, we establish an intermediate domain, the description domain, based on insights from subjective experiments, by considering the domain relevance among samples located in the perception domain and learning a structured latent space. The anchor features derived from the learned latent space are generated as cross-domain auxiliary information to promote domain transformation. Furthermore, the newly established description domain decomposes the NR-PCQA problem into two relevant stages. These stages include a classification stage that gives the degradation descriptions to point clouds and a regression stage to determine the confidence degrees of descriptions, providing a semantic explanation for the predicted quality scores. Experimental results demonstrate that D$^3$-PCQA exhibits robust performance and outstanding generalization ability on several publicly available datasets. The code in this work will be publicly available at https://smt.sjtu.edu.cn.

Reconstruction Distortion of Learned Image Compression with Imperceptible Perturbations

Jun 01, 2023

Yang Sui, Zhuohang Li, Ding Ding, Xiang Pan, Xiaozhong Xu, Shan Liu, Zhenzhong Chen

Abstract: Learned Image Compression (LIC) has recently become the trending technique for image transmission due to its notable performance. Despite its popularity, the robustness of LIC with respect to the quality of image reconstruction remains under-explored. In this paper, we introduce an imperceptible attack approach designed to effectively degrade the reconstruction quality of LIC, resulting in the reconstructed image being severely disrupted by noise, to the point where recognizing any object in the reconstructed images is virtually impossible. More specifically, we generate adversarial examples by introducing a Frobenius norm-based loss function to maximize the discrepancy between original images and reconstructed adversarial examples. Further, leveraging the insensitivity of human vision to high-frequency components, we introduce an Imperceptibility Constraint (IC) to ensure that the perturbations remain inconspicuous. Experiments conducted on the Kodak dataset using various LIC models demonstrate the effectiveness of the approach. In addition, we provide several findings and suggestions for designing future defenses.
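The Frobenius norm-based objective can be sketched in a few lines. The abstract does not give the exact loss or the form of the Imperceptibility Constraint, so the sign convention and the function names below are assumptions. Images are represented as plain nested lists of pixel values, and the constraint is not modelled.

```python
import math


def frobenius_norm(matrix):
    """Square root of the sum of squared entries of a 2-D list."""
    return math.sqrt(sum(v * v for row in matrix for v in row))


def difference(a, b):
    """Element-wise difference of two equally shaped 2-D lists."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def attack_loss(original, reconstructed_adv):
    """Negative Frobenius distance between the original image and the
    reconstruction of the adversarial example.

    Assumption: the attacker minimizes this value, which is equivalent
    to maximizing the reconstruction discrepancy.
    """
    return -frobenius_norm(difference(original, reconstructed_adv))


if __name__ == "__main__":
    original = [[0.0, 0.0], [0.0, 0.0]]
    reconstructed = [[3.0, 0.0], [0.0, 4.0]]  # hypothetical decoder output
    print(attack_loss(original, reconstructed))
```

In a real attack, `reconstructed_adv` would be the output of the compression model for the perturbed input, and the loss would be differentiated with respect to the perturbation.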

* 7 pages

Learn to Compose Syntactic and Semantic Representations Appropriately for Compositional Generalization

May 20, 2023

Lei Lin, Shuangtao Li, Biao Fu, Yafang Zheng, Shan Liu, Yidong Chen, Xiaodong Shi

Abstract: Recent studies have shown that sequence-to-sequence (Seq2Seq) models are limited in solving compositional generalization (CG) tasks, failing to systematically generalize to unseen compositions of seen components. There is mounting evidence that one of the reasons hindering CG is that the representation of the encoder's uppermost layer is entangled. In other words, the syntactic and semantic representations of sequences are twisted inappropriately. However, most previous studies mainly concentrate on enhancing semantic information at the token level, rather than composing the syntactic and semantic representations of sequences appropriately as humans do. In addition, we consider the representation entanglement problem they identified to be incomplete, and further hypothesize that the source key and value representations passed into different decoder layers are also entangled. Starting from this intuition and inspired by humans' strategies for CG, we propose COMPSITION (Compose Syntactic and Semantic Representations), an extension to Seq2Seq models to learn to compose representations of different encoder layers appropriately for generating different keys and values passed into different decoder layers through introducing a composed layer between the encoder and decoder. COMPSITION achieves competitive and even state-of-the-art results on two realistic benchmarks, which empirically demonstrates the effectiveness of our proposal.

* Work in progress
