Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Zhenzhong Chen

UnifiedSSR: A Unified Framework of Sequential Search and Recommendation

Oct 21, 2023

Jiayi Xie, Shang Liu, Gao Cong, Zhenzhong Chen

Figure 1 for UnifiedSSR: A Unified Framework of Sequential Search and Recommendation

Figure 2 for UnifiedSSR: A Unified Framework of Sequential Search and Recommendation

Figure 3 for UnifiedSSR: A Unified Framework of Sequential Search and Recommendation

Figure 4 for UnifiedSSR: A Unified Framework of Sequential Search and Recommendation

Abstract:In this work, we propose a Unified framework of Sequential Search and Recommendation (UnifiedSSR) for joint learning of user behavior history in both search and recommendation scenarios. Specifically, we consider user-interacted products in the recommendation scenario, user-interacted products and user-issued queries in the search scenario as three distinct types of user behaviors. We propose a dual-branch network to encode the pair of interacted product history and issued query history in the search scenario in parallel. This allows for cross-scenario modeling by deactivating the query branch for the recommendation scenario. Through the parameter sharing between dual branches, as well as between product branches in two scenarios, we incorporate cross-view and cross-scenario associations of user behaviors, providing a comprehensive understanding of user behavior patterns. To further enhance user behavior modeling by capturing the underlying dynamic intent, an Intent-oriented Session Modeling module is designed for inferring intent-oriented semantic sessions from the contextual information in behavior sequences. In particular, we consider self-supervised learning signals from two perspectives for intent-oriented semantic session locating, which encourage session discrimination within each behavior sequence and session alignment between dual behavior sequences. Extensive experiments on three public datasets demonstrate that UnifiedSSR consistently outperforms state-of-the-art methods for both search and recommendation.

Via

Access Paper or Ask Questions

Learning Many-to-Many Mapping for Unpaired Real-World Image Super-resolution and Downscaling

Oct 08, 2023

Wanjie Sun, Zhenzhong Chen

Figure 1 for Learning Many-to-Many Mapping for Unpaired Real-World Image Super-resolution and Downscaling

Figure 2 for Learning Many-to-Many Mapping for Unpaired Real-World Image Super-resolution and Downscaling

Figure 3 for Learning Many-to-Many Mapping for Unpaired Real-World Image Super-resolution and Downscaling

Figure 4 for Learning Many-to-Many Mapping for Unpaired Real-World Image Super-resolution and Downscaling

Abstract:Learning based single image super-resolution (SISR) for real-world images has been an active research topic yet a challenging task, due to the lack of paired low-resolution (LR) and high-resolution (HR) training images. Most of the existing unsupervised real-world SISR methods adopt a two-stage training strategy by synthesizing realistic LR images from their HR counterparts first, then training the super-resolution (SR) models in a supervised manner. However, the training of image degradation and SR models in this strategy are separate, ignoring the inherent mutual dependency between downscaling and its inverse upscaling process. Additionally, the ill-posed nature of image degradation is not fully considered. In this paper, we propose an image downscaling and SR model dubbed as SDFlow, which simultaneously learns a bidirectional many-to-many mapping between real-world LR and HR images unsupervisedly. The main idea of SDFlow is to decouple image content and degradation information in the latent space, where content information distribution of LR and HR images is matched in a common latent space. Degradation information of the LR images and the high-frequency information of the HR images are fitted to an easy-to-sample conditional distribution. Experimental results on real-world image SR datasets indicate that SDFlow can generate diverse realistic LR and SR images both quantitatively and qualitatively.

Via

Access Paper or Ask Questions

Dynamic Kernel-Based Adaptive Spatial Aggregation for Learned Image Compression

Aug 17, 2023

Huairui Wang, Nianxiang Fu, Zhenzhong Chen, Shan Liu

Figure 1 for Dynamic Kernel-Based Adaptive Spatial Aggregation for Learned Image Compression

Figure 2 for Dynamic Kernel-Based Adaptive Spatial Aggregation for Learned Image Compression

Figure 3 for Dynamic Kernel-Based Adaptive Spatial Aggregation for Learned Image Compression

Figure 4 for Dynamic Kernel-Based Adaptive Spatial Aggregation for Learned Image Compression

Abstract:Learned image compression methods have shown superior rate-distortion performance and remarkable potential compared to traditional compression methods. Most existing learned approaches use stacked convolution or window-based self-attention for transform coding, which aggregate spatial information in a fixed range. In this paper, we focus on extending spatial aggregation capability and propose a dynamic kernel-based transform coding. The proposed adaptive aggregation generates kernel offsets to capture valid information in the content-conditioned range to help transform. With the adaptive aggregation strategy and the sharing weights mechanism, our method can achieve promising transform capability with acceptable model complexity. Besides, according to the recent progress of entropy model, we define a generalized coarse-to-fine entropy model, considering the coarse global context, the channel-wise, and the spatial context. Based on it, we introduce dynamic kernel in hyper-prior to generate more expressive global context. Furthermore, we propose an asymmetric spatial-channel entropy model according to the investigation of the spatial characteristics of the grouped latents. The asymmetric entropy model aims to reduce statistical redundancy while maintaining coding efficiency. Experimental results demonstrate that our method achieves superior rate-distortion performance on three benchmarks compared to the state-of-the-art learning-based methods.

Via

Access Paper or Ask Questions

JPEG Quantized Coefficient Recovery via DCT Domain Spatial-Frequential Transformer

Aug 17, 2023

Mingyu Ouyang, Zhenzhong Chen

Abstract:JPEG compression adopts the quantization of Discrete Cosine Transform (DCT) coefficients for effective bit-rate reduction, whilst the quantization could lead to a significant loss of important image details. Recovering compressed JPEG images in the frequency domain has attracted more and more attention recently, in addition to numerous restoration approaches developed in the pixel domain. However, the current DCT domain methods typically suffer from limited effectiveness in handling a wide range of compression quality factors, or fall short in recovering sparse quantized coefficients and the components across different colorspace. To address these challenges, we propose a DCT domain spatial-frequential Transformer, named as DCTransformer. Specifically, a dual-branch architecture is designed to capture both spatial and frequential correlations within the collocated DCT coefficients. Moreover, we incorporate the operation of quantization matrix embedding, which effectively allows our single model to handle a wide range of quality factors, and a luminance-chrominance alignment head that produces a unified feature map to align different-sized luminance and chrominance components. Our proposed DCTransformer outperforms the current state-of-the-art JPEG artifact removal techniques, as demonstrated by our extensive experiments.

* 13 pages, 8 figures

Via

Access Paper or Ask Questions

Improving Generalization of Image Captioning with Unsupervised Prompt Learning

Aug 05, 2023

Hongchen Wei, Zhenzhong Chen

Abstract:Pretrained visual-language models have demonstrated impressive zero-shot abilities in image captioning, when accompanied by hand-crafted prompts. Meanwhile, hand-crafted prompts utilize human prior knowledge to guide the model. However, due to the diversity between different domains, such hand-crafted prompt that provide invariant prior knowledge may result in mode collapse for some domains. Some researches attempted to incorporate expert knowledge and instruction datasets, but the results were costly and led to hallucinations. In this paper, we propose an unsupervised prompt learning method to improve Generalization of Image Captioning (GeneIC), which learns a domain-specific prompt vector for the target domain without requiring annotated data. GeneIC aligns visual and language modalities with a pre-trained Contrastive Language-Image Pre-Training (CLIP) model, thus optimizing the domain-specific prompt vector from two aspects: attribute and semantic consistency. Specifically, GeneIC first generates attribute-transferred images with differing attributes, while retaining semantic similarity with original images. Then, GeneIC uses CLIP to measure the similarity between the images and the generated sentences. By exploring the variable and invariant features in the original images and attribute-transferred images, attribute consistency constrains the attribute change direction of both images and sentences to learn domain-specific knowledge. The semantic consistency directly measures the similarity between the generated sentences and images to ensure the accuracy and comprehensiveness of the generated sentences. Consequently, GeneIC only optimizes the prompt vectors, which effectively retains the knowledge in the large model and introduces domain-specific knowledge.

Via

Access Paper or Ask Questions

Reconstruction Distortion of Learned Image Compression with Imperceptible Perturbations

Jun 01, 2023

Yang Sui, Zhuohang Li, Ding Ding, Xiang Pan, Xiaozhong Xu, Shan Liu, Zhenzhong Chen

Figure 1 for Reconstruction Distortion of Learned Image Compression with Imperceptible Perturbations

Figure 2 for Reconstruction Distortion of Learned Image Compression with Imperceptible Perturbations

Figure 3 for Reconstruction Distortion of Learned Image Compression with Imperceptible Perturbations

Figure 4 for Reconstruction Distortion of Learned Image Compression with Imperceptible Perturbations

Abstract:Learned Image Compression (LIC) has recently become the trending technique for image transmission due to its notable performance. Despite its popularity, the robustness of LIC with respect to the quality of image reconstruction remains under-explored. In this paper, we introduce an imperceptible attack approach designed to effectively degrade the reconstruction quality of LIC, resulting in the reconstructed image being severely disrupted by noise where any object in the reconstructed images is virtually impossible. More specifically, we generate adversarial examples by introducing a Frobenius norm-based loss function to maximize the discrepancy between original images and reconstructed adversarial examples. Further, leveraging the insensitivity of high-frequency components to human vision, we introduce Imperceptibility Constraint (IC) to ensure that the perturbations remain inconspicuous. Experiments conducted on the Kodak dataset using various LIC models demonstrate effectiveness. In addition, we provide several findings and suggestions for designing future defenses.

* 7 pages

Via

Access Paper or Ask Questions

Learning a Single Convolutional Layer Model for Low Light Image Enhancement

May 23, 2023

Yuantong Zhang, Baoxin Teng, Daiqin Yang, Zhenzhong Chen, Haichuan Ma, Gang Li, Wenpeng Ding

Figure 1 for Learning a Single Convolutional Layer Model for Low Light Image Enhancement

Figure 2 for Learning a Single Convolutional Layer Model for Low Light Image Enhancement

Figure 3 for Learning a Single Convolutional Layer Model for Low Light Image Enhancement

Figure 4 for Learning a Single Convolutional Layer Model for Low Light Image Enhancement

Abstract:Low-light image enhancement (LLIE) aims to improve the illuminance of images due to insufficient light exposure. Recently, various lightweight learning-based LLIE methods have been proposed to handle the challenges of unfavorable prevailing low contrast, low brightness, etc. In this paper, we have streamlined the architecture of the network to the utmost degree. By utilizing the effective structural re-parameterization technique, a single convolutional layer model (SCLM) is proposed that provides global low-light enhancement as the coarsely enhanced results. In addition, we introduce a local adaptation module that learns a set of shared parameters to accomplish local illumination correction to address the issue of varied exposure levels in different image regions. Experimental results demonstrate that the proposed method performs favorably against the state-of-the-art LLIE methods in both objective metrics and subjective visual effects. Additionally, our method has fewer parameters and lower inference complexity compared to other learning-based schemes.

Via

Access Paper or Ask Questions

LaMD: Latent Motion Diffusion for Video Generation

Apr 23, 2023

Yaosi Hu, Zhenzhong Chen, Chong Luo

Abstract:Generating coherent and natural movement is the key challenge in video generation. This research proposes to condense video generation into a problem of motion generation, to improve the expressiveness of motion and make video generation more manageable. This can be achieved by breaking down the video generation process into latent motion generation and video reconstruction. We present a latent motion diffusion (LaMD) framework, which consists of a motion-decomposed video autoencoder and a diffusion-based motion generator, to implement this idea. Through careful design, the motion-decomposed video autoencoder can compress patterns in movement into a concise latent motion representation. Meanwhile, the diffusion-based motion generator is able to efficiently generate realistic motion on a continuous latent space under multi-modal conditions, at a cost that is similar to that of image diffusion models. Results show that LaMD generates high-quality videos with a wide range of motions, from stochastic dynamics to highly controllable movements. It achieves new state-of-the-art performance on benchmark datasets, including BAIR, Landscape and CATER-GENs, for Image-to-Video (I2V) and Text-Image-to-Video (TI2V) generation. The source code of LaMD will be made available soon.

Via

Access Paper or Ask Questions

Continuous Space-Time Video Super-Resolution Utilizing Long-Range Temporal Information

Feb 26, 2023

Yuantong Zhang, Daiqin Yang, Zhenzhong Chen, Wenpeng Ding

Abstract:In this paper, we consider the task of space-time video super-resolution (ST-VSR), namely, expanding a given source video to a higher frame rate and resolution simultaneously. However, most existing schemes either consider a fixed intermediate time and scale in the training stage or only accept a preset number of input frames (e.g., two adjacent frames) that fails to exploit long-range temporal information. To address these problems, we propose a continuous ST-VSR (C-STVSR) method that can convert the given video to any frame rate and spatial resolution. To achieve time-arbitrary interpolation, we propose a forward warping guided frame synthesis module and an optical-flow-guided context consistency loss to better approximate extreme motion and preserve similar structures among input and prediction frames. In addition, we design a memory-friendly cascading depth-to-space module to realize continuous spatial upsampling. Meanwhile, with the sophisticated reorganization of optical flow, the proposed method is memory friendly, making it possible to propagate information from long-range neighboring frames and achieve better reconstruction quality. Extensive experiments show that the proposed algorithm has good flexibility and achieves better performance on various datasets compared with the state-of-the-art methods in both objective evaluations and subjective visual effects.

Via

Access Paper or Ask Questions

Mutually-Regularized Dual Collaborative Variational Auto-encoder for Recommendation Systems

Nov 21, 2022

Yaochen Zhu, Zhenzhong Chen

Figure 1 for Mutually-Regularized Dual Collaborative Variational Auto-encoder for Recommendation Systems

Figure 2 for Mutually-Regularized Dual Collaborative Variational Auto-encoder for Recommendation Systems

Figure 3 for Mutually-Regularized Dual Collaborative Variational Auto-encoder for Recommendation Systems

Figure 4 for Mutually-Regularized Dual Collaborative Variational Auto-encoder for Recommendation Systems

Abstract:Recently, user-oriented auto-encoders (UAEs) have been widely used in recommender systems to learn semantic representations of users based on their historical ratings. However, since latent item variables are not modeled in UAE, it is difficult to utilize the widely available item content information when ratings are sparse. In addition, whenever new items arrive, we need to wait for collecting rating data for these items and retrain the UAE from scratch, which is inefficient in practice. Aiming to address the above two problems simultaneously, we propose a mutually-regularized dual collaborative variational auto-encoder (MD-CVAE) for recommendation. First, by replacing randomly initialized last layer weights of the vanilla UAE with stacked latent item embeddings, MD-CVAE integrates two heterogeneous information sources, i.e., item content and user ratings, into the same principled variational framework where the weights of UAE are regularized by item content such that convergence to a non-optima due to data sparsity can be avoided. In addition, the regularization is mutual in that user ratings can also help the dual item content module learn more recommendation-oriented item content embeddings. Finally, we propose a symmetric inference strategy for MD-CVAE where the first layer weights of the UAE encoder are tied to the latent item embeddings of the UAE decoder. Through this strategy, no retraining is required to recommend newly introduced items. Empirical studies show the effectiveness of MD-CVAE in both normal and cold-start scenarios. Codes are available at https://github.com/yaochenzhu/MD-CVAE.

Via

Access Paper or Ask Questions