
Xin Yuan

Equal contributions

Enhancing User-Centric Privacy Protection: An Interactive Framework through Diffusion Models and Machine Unlearning

Sep 05, 2024

Huaxi Huang, Xin Yuan, Qiyu Liao, Dadong Wang, Tongliang Liu

Abstract:In the realm of multimedia data analysis, the extensive use of image datasets has escalated concerns over privacy protection within such data. Current research predominantly focuses on privacy protection either in data sharing or upon the release of trained machine learning models. Our study pioneers a comprehensive privacy protection framework that safeguards image data privacy concurrently during data sharing and model publication. We propose an interactive image privacy protection framework that utilizes generative machine learning models to modify image information at the attribute level and employs machine unlearning algorithms for the privacy preservation of model parameters. This user-interactive framework allows for adjustments in privacy protection intensity based on user feedback on generated images, striking a balance between maximal privacy safeguarding and maintaining model performance. Within this framework, we instantiate two modules: a differential privacy diffusion model for protecting attribute information in images and a feature unlearning algorithm for efficient updates of the trained model on the revised image dataset. Our approach demonstrated superiority over existing methods on facial datasets across various attribute classifications.

Towards Real-time Video Compressive Sensing on Mobile Devices

Aug 14, 2024

Miao Cao, Lishun Wang, Huan Wang, Guoqing Wang, Xin Yuan

Abstract:Video Snapshot Compressive Imaging (SCI) uses a low-speed 2D camera to capture high-speed scenes as snapshot compressed measurements, followed by a reconstruction algorithm to retrieve the high-speed video frames. Fast-evolving mobile devices and existing high-performance video SCI reconstruction algorithms motivate us to develop mobile reconstruction methods for real-world applications. Yet it remains challenging to deploy previous reconstruction algorithms on mobile devices because of their complex inference process, let alone to achieve real-time mobile reconstruction. To the best of our knowledge, no video SCI reconstruction model has been designed to run on mobile devices. To this end, we present an effective approach for video SCI reconstruction, dubbed MobileSCI, which for the first time runs at real-time speed on mobile devices. Specifically, we first build a U-shaped 2D convolution-based architecture, which is much more efficient and mobile-friendly than previous state-of-the-art reconstruction methods. In addition, an efficient feature mixing block, based on channel splitting and shuffling mechanisms, is introduced as a novel bottleneck block of MobileSCI to alleviate the computational burden. Finally, a customized knowledge distillation strategy is used to further improve the reconstruction quality. Extensive results on both simulated and real data show that MobileSCI achieves superior reconstruction quality with high efficiency on mobile devices. In particular, we can reconstruct a 256 × 256 × 8 snapshot compressed measurement with real-time performance (about 35 FPS) on an iPhone 15. Code is available at https://github.com/mcao92/MobileSCI.
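
To make the "snapshot compressed measurement" in the abstract above concrete, the sketch below simulates the commonly used video SCI sensing model, in which each high-speed frame is modulated by its own mask and the modulated frames are summed into one 2D image. The function name `sci_measure`, the random binary masks, and the noise-free setting are assumptions for illustration; this is not the MobileSCI code.

```python
import numpy as np

def sci_measure(frames, masks):
    """Collapse T frames into one 2D snapshot: Y = sum_t masks[t] * frames[t]."""
    assert frames.shape == masks.shape
    return (frames * masks).sum(axis=0)

rng = np.random.default_rng(0)
T, H, W = 8, 256, 256  # the 256 x 256 x 8 setting quoted in the abstract
frames = rng.random((T, H, W))                              # stand-in video
masks = (rng.random((T, H, W)) > 0.5).astype(frames.dtype)  # random binary masks (assumption)
snapshot = sci_measure(frames, masks)
print(snapshot.shape)
```

A reconstruction network then has to recover all `T` frames from `snapshot` and the known `masks`, which is why the problem is ill-posed.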

* 9 pages, Accepted by ACM MM 2024

A Simple Low-bit Quantization Framework for Video Snapshot Compressive Imaging

Jul 31, 2024

Miao Cao, Lishun Wang, Huan Wang, Xin Yuan

Abstract:Video Snapshot Compressive Imaging (SCI) aims to use a low-speed 2D camera to capture high-speed scenes as snapshot compressed measurements, followed by a reconstruction algorithm to reconstruct the high-speed video frames. State-of-the-art (SOTA) deep learning-based algorithms have achieved impressive performance, but at the cost of a heavy computational workload. Network quantization is a promising way to reduce computational cost. However, direct low-bit quantization causes a large performance drop. To address this challenge, we propose a simple low-bit quantization framework (dubbed Q-SCI) for end-to-end deep learning-based video SCI reconstruction methods, which usually consist of a feature extraction module, a feature enhancement module, and a video reconstruction module. Specifically, we first design a high-quality feature extraction module and a precise video reconstruction module to extract and propagate high-quality features in the low-bit quantized model. In addition, to alleviate the information distortion of the Transformer branch in the quantized feature enhancement module, we introduce a shift operation on the query and key distributions to further bridge the performance gap. Comprehensive experimental results show that our Q-SCI framework achieves superior performance; e.g., the 4-bit quantized EfficientSCI-S derived by Q-SCI can theoretically accelerate the real-valued EfficientSCI-S by 7.8× with only a 2.3% performance gap on the simulation testing datasets. Code is available at https://github.com/mcao92/QuantizedSCI.
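
The performance drop from direct low-bit quantization mentioned above can be illustrated with a generic uniform symmetric quantizer. This is a minimal sketch under stated assumptions (per-tensor max-abs scaling, the hypothetical helper name `fake_quantize`); it is not the Q-SCI framework and does not include its shift operation on queries and keys.

```python
import numpy as np

def fake_quantize(x, bits=4):
    """Uniform symmetric quantization followed by dequantization."""
    qmax = 2 ** (bits - 1) - 1           # e.g. 7 for signed 4-bit
    scale = np.max(np.abs(x)) / qmax     # per-tensor scale (assumption)
    if scale == 0:
        return x.copy()
    q = np.clip(np.round(x / scale), -qmax, qmax)  # integer codes
    return q * scale                                # back to real values

rng = np.random.default_rng(0)
w = rng.standard_normal(1000)            # stand-in weights
err4 = np.mean((w - fake_quantize(w, 4)) ** 2)
err8 = np.mean((w - fake_quantize(w, 8)) ** 2)
print(f"MSE at 4 bits: {err4:.5f}, at 8 bits: {err8:.7f}")
```

The gap between the two errors is the kind of distortion a low-bit framework has to compensate for.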

* 18 pages, Accepted by ECCV 2024

Hierarchical Separable Video Transformer for Snapshot Compressive Imaging

Jul 16, 2024

Ping Wang, Yulun Zhang, Lishun Wang, Xin Yuan

Abstract:Transformers have achieved state-of-the-art performance on the inverse problem of Snapshot Compressive Imaging (SCI) for video, whose ill-posedness is rooted in the mixed degradation of spatial masking and temporal aliasing. However, previous Transformers lack insight into this degradation and thus have limited performance and efficiency. In this work, we tailor an efficient reconstruction architecture without temporal aggregation in early layers, with the Hierarchical Separable Video Transformer (HiSViT) as its building block. HiSViT is built from multiple groups of Cross-Scale Separable Multi-head Self-Attention (CSS-MSA) and Gated Self-Modulated Feed-Forward Network (GSM-FFN) with dense connections, each of which operates on a separate channel portion at a different scale, for multi-scale interactions and long-range modeling. By separating spatial operations from temporal ones, CSS-MSA introduces an inductive bias of paying more attention within frames than between frames while reducing computational overhead. GSM-FFN is designed to enhance locality via a gated mechanism and factorized spatial-temporal convolutions. Extensive experiments demonstrate that our method outperforms previous methods by more than 0.5 dB with comparable or lower complexity and fewer parameters. The source code and pretrained models are released at https://github.com/pwangcs/HiSViT.

* Accepted by ECCV 2024

Learning Autonomous Race Driving with Action Mapping Reinforcement Learning

Jun 21, 2024

Yuanda Wang, Xin Yuan, Changyin Sun

Abstract:Autonomous race driving poses a complex control challenge as vehicles must be operated at the edge of their handling limits to reduce lap times while respecting physical and safety constraints. This paper presents a novel reinforcement learning (RL)-based approach, incorporating the action mapping (AM) mechanism to manage state-dependent input constraints arising from limited tire-road friction. A numerical approximation method is proposed to implement AM, addressing the complex dynamics associated with the friction constraints. The AM mechanism also allows the learned driving policy to be generalized to different friction conditions. Experimental results in our developed race simulator demonstrate that the proposed AM-RL approach achieves superior lap times and better success rates compared to the conventional RL-based approaches. The generalization capability of driving policy with AM is also validated in the experiments.
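
As a rough illustration of mapping a policy output onto a state-dependent feasible set, the sketch below squeezes a normalized action in [-1, 1]^2 into a friction circle of radius mu * g. The closed-form mapping, the function name `map_action`, and the friction-circle constraint itself are illustrative assumptions; the paper describes a numerical approximation method that is not reproduced here.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2

def map_action(u_lat, u_lon, mu):
    """Map a normalized action to accelerations inside the friction circle.

    Hypothetical mapping: the lateral command takes its share of the
    friction budget first, and the longitudinal command is scaled by
    whatever budget remains.
    """
    a_max = mu * G
    u_lat = max(-1.0, min(1.0, u_lat))   # clamp raw policy outputs
    u_lon = max(-1.0, min(1.0, u_lon))
    a_lat = u_lat * a_max
    a_lon = u_lon * math.sqrt(max(a_max ** 2 - a_lat ** 2, 0.0))
    return a_lat, a_lon

# The same normalized action yields smaller accelerations on a slippery road.
print(map_action(0.5, 1.0, mu=1.0))
print(map_action(0.5, 1.0, mu=0.4))
```

Because the friction coefficient enters only through the mapping, the same normalized policy output remains feasible under different friction conditions, which is the intuition behind the generalization claim in the abstract.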

Coarse-Fine Spectral-Aware Deformable Convolution For Hyperspectral Image Reconstruction

Jun 18, 2024

Jincheng Yang, Lishun Wang, Miao Cao, Huan Wang, Yinping Zhao, Xin Yuan

Abstract:We study the inverse problem of Coded Aperture Snapshot Spectral Imaging (CASSI), which captures a spatial-spectral data cube using snapshot 2D measurements and uses algorithms to reconstruct 3D hyperspectral images (HSI). However, current methods based on Convolutional Neural Networks (CNNs) struggle to capture long-range dependencies and non-local similarities. The recently popular Transformer-based methods are difficult to deploy on downstream tasks because of the high computational cost of self-attention. In this paper, we propose the Coarse-Fine Spectral-Aware Deformable Convolution Network (CFSDCN), applying deformable convolutional networks (DCN) to this task for the first time. Considering the sparsity of HSI, we design a deformable convolution module that exploits its deformability to capture long-range dependencies and non-local similarities. In addition, we propose a new spectral information interaction module that considers both coarse-grained and fine-grained spectral similarities. Extensive experiments demonstrate that our CFSDCN significantly outperforms previous state-of-the-art (SOTA) methods on both simulated and real HSI datasets.
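
For readers unfamiliar with CASSI, the sketch below simulates a simplified single-disperser measurement: every spectral band is modulated by the same coded aperture, shifted along the width by a band-dependent offset, and summed on the sensor. The function name `cassi_measure`, the 2-pixel dispersion step, and the cube size are assumptions for illustration, not values taken from the paper.

```python
import numpy as np

def cassi_measure(cube, mask, step=2):
    """Simplified CASSI forward model for a (bands, H, W) cube."""
    n_bands, height, width = cube.shape
    meas = np.zeros((height, width + step * (n_bands - 1)))
    for k in range(n_bands):
        # Band k is masked, then lands k * step pixels to the right.
        meas[:, k * step : k * step + width] += cube[k] * mask
    return meas

rng = np.random.default_rng(0)
cube = rng.random((28, 64, 64))                    # stand-in hyperspectral cube
mask = (rng.random((64, 64)) > 0.5).astype(float)  # random coded aperture (assumption)
meas = cassi_measure(cube, mask)
print(meas.shape)
```

Reconstruction then means recovering all bands of `cube` from the single 2D array `meas` and the known `mask`.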

* 7 pages, 5 figures, Accepted by ICIP 2024

Generative Lifting of Multiview to 3D from Unknown Pose: Wrapping NeRF inside Diffusion

Jun 11, 2024

Xin Yuan, Rana Hanocka, Michael Maire

Abstract:We cast multiview reconstruction from unknown pose as a generative modeling problem. From a collection of unannotated 2D images of a scene, our approach simultaneously learns a network that predicts camera pose from a 2D image input and the parameters of a Neural Radiance Field (NeRF) for the 3D scene. To drive learning, we wrap both the pose prediction network and the NeRF inside a Denoising Diffusion Probabilistic Model (DDPM) and train the system via the standard denoising objective. Our framework requires the system to accomplish the task of denoising an input 2D image by predicting its pose and rendering the NeRF from that pose. Learning to denoise thus forces the system to concurrently learn the underlying 3D NeRF representation and a mapping from images to camera extrinsic parameters. To facilitate the latter, we design a custom network architecture to represent pose as a distribution, granting implicit capacity for discovering view correspondences when trained end-to-end for denoising alone. This technique allows our system to successfully build NeRFs, without pose knowledge, for challenging scenes where competing methods fail. At the conclusion of training, our learned NeRF can be extracted and used as a 3D scene model; our full system can be used to sample novel camera poses and generate novel-view images.
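
The "standard denoising objective" referred to above can be sketched with the usual DDPM forward process, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. The linear beta schedule, the noise-prediction form of the loss, and the helper names are assumptions for illustration; in the paper the denoiser is a pose network plus a NeRF renderer, which is not modeled here.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t from the forward process and return it with the noise used."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

def denoising_loss(eps_true, eps_pred):
    """Mean squared error between the true and the predicted noise."""
    return float(np.mean((eps_true - eps_pred) ** 2))

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
x0 = rng.random((8, 8))                  # stand-in image
xt, eps = q_sample(x0, 500, alpha_bar, rng)
# A denoiser that predicted zero noise would incur this loss:
print(denoising_loss(eps, np.zeros_like(eps)))
```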

2DQuant: Low-bit Post-Training Quantization for Image Super-Resolution

Jun 10, 2024

Kai Liu, Haotong Qin, Yong Guo, Xin Yuan, Linghe Kong, Guihai Chen, Yulun Zhang

Abstract:Low-bit quantization has become widespread for compressing image super-resolution (SR) models for edge deployment, allowing advanced SR models to benefit from compact low-bit parameters for storage compression and from efficient integer/bitwise operations for inference acceleration. However, it is well known that low-bit quantization degrades the accuracy of SR models compared to their full-precision (FP) counterparts. Despite several efforts to alleviate the degradation, transformer-based SR models still suffer severe degradation due to their distinctive activation distribution. In this work, we present a dual-stage low-bit post-training quantization (PTQ) method for image super-resolution, namely 2DQuant, which achieves efficient and accurate SR under low-bit quantization. The proposed method first investigates the weights and activations and finds that their distributions are characterized by coexisting symmetry and asymmetry as well as long tails. Specifically, we propose Distribution-Oriented Bound Initialization (DOBI), which uses different search strategies to find a coarse bound for the quantizers. To obtain refined quantizer parameters, we further propose Distillation Quantization Calibration (DQC), which employs a distillation approach to make the quantized model learn from its FP counterpart. In extensive experiments with different bit widths and scaling factors, DOBI alone reaches state-of-the-art (SOTA) performance, and after stage two our method surpasses existing PTQ methods in both metrics and visual quality. 2DQuant gains an increase in PSNR of up to 4.52 dB on Set5 (×2) compared with SOTA when quantized to 2-bit, and achieves a 3.60× compression ratio and a 5.08× speedup ratio. The code and models will be available at https://github.com/Kai-Liu001/2DQuant.
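
The idea of searching for a coarse clipping bound before quantizing long-tailed data can be sketched as follows. This is a generic grid search over symmetric quantile clips that minimizes reconstruction error; the helper names, the quantile grid, and the MSE criterion are assumptions, and the sketch should not be read as the DOBI procedure itself.

```python
import numpy as np

def quantize(x, lo, hi, bits):
    """Clip x to [lo, hi], then quantize uniformly with 2**bits levels."""
    step = (hi - lo) / (2 ** bits - 1)
    q = np.round((np.clip(x, lo, hi) - lo) / step)
    return q * step + lo

def search_bounds(x, bits=2, n_grid=50):
    """Return (error, lo, hi) for the best quantile clip on a coarse grid."""
    best = (np.inf, float(x.min()), float(x.max()))
    for frac in np.linspace(0.0, 0.49, n_grid):
        lo, hi = np.quantile(x, [frac, 1.0 - frac])
        if hi <= lo:
            continue
        err = float(np.mean((x - quantize(x, lo, hi, bits)) ** 2))
        if err < best[0]:
            best = (err, float(lo), float(hi))
    return best

rng = np.random.default_rng(0)
x = rng.standard_t(df=3, size=5000)      # long-tailed stand-in activations
err, lo, hi = search_bounds(x, bits=2)
minmax_err = float(np.mean((x - quantize(x, x.min(), x.max(), 2)) ** 2))
print(f"min-max bounds: {minmax_err:.4f}, searched bounds: {err:.4f}")
```

Since the grid includes the plain min-max range as its first candidate, the searched bound is never worse than min-max under this criterion.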

* 9 pages, 6 figures. The code and models will be available at https://github.com/Kai-Liu001/2DQuant

Binarized Diffusion Model for Image Super-Resolution

Jun 09, 2024

Zheng Chen, Haotong Qin, Yong Guo, Xiongfei Su, Xin Yuan, Linghe Kong, Yulun Zhang

Abstract:Advanced diffusion models (DMs) perform impressively in image super-resolution (SR), but their high memory and computational costs hinder deployment. Binarization, an ultra-compression algorithm, offers the potential to effectively accelerate DMs. Nonetheless, due to the model structure and the multi-step iterative nature of DMs, existing binarization methods result in significant performance degradation. In this paper, we introduce a novel binarized diffusion model, BI-DiffSR, for image SR. First, for the model structure, we design a UNet architecture optimized for binarization. We propose consistent-pixel-downsample (CP-Down) and consistent-pixel-upsample (CP-Up) to maintain dimensional consistency and facilitate full-precision information transfer. Meanwhile, we design channel-shuffle-fusion (CS-Fusion) to enhance feature fusion in skip connections. Second, for the activation difference across timesteps, we design the timestep-aware redistribution (TaR) and activation function (TaA). TaR and TaA dynamically adjust the distribution of activations based on different timesteps, improving the flexibility and representation ability of the binarized module. Comprehensive experiments demonstrate that our BI-DiffSR outperforms existing binarization methods. Code is available at https://github.com/zhengchen1999/BI-DiffSR.
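
As background for the binarization discussed above, the sketch below shows the basic sign-plus-scale approximation w ≈ alpha * sign(w), where alpha = mean(|w|) is the scale that minimizes the squared approximation error for a fixed sign pattern. It is a generic illustration with a hypothetical helper name `binarize`; none of the BI-DiffSR modules (CP-Down, CP-Up, CS-Fusion, TaR, TaA) are modeled.

```python
import numpy as np

def binarize(w):
    """Return (alpha, b) with b in {-1, +1} such that w is roughly alpha * b."""
    alpha = float(np.mean(np.abs(w)))    # least-squares optimal scale
    b = np.where(w >= 0, 1.0, -1.0)      # 1-bit sign pattern
    return alpha, b

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))        # stand-in full-precision weights
alpha, b = binarize(w)
err = float(np.mean((w - alpha * b) ** 2))
print(f"scale {alpha:.3f}, mean squared error {err:.3f}")
```

Each weight is stored in one bit plus a single shared scale, which is where the compression comes from; the residual error is what binarization-aware architectures try to reduce.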

* Code is available at https://github.com/zhengchen1999/BI-DiffSR

Untrained Neural Nets for Snapshot Compressive Imaging: Theory and Algorithms

Jun 06, 2024

Mengyu Zhao, Xi Chen, Xin Yuan, Shirin Jalali

Abstract:Snapshot compressive imaging (SCI) recovers high-dimensional (3D) data cubes from a single 2D measurement, enabling diverse applications like video and hyperspectral imaging to go beyond standard techniques in terms of acquisition speed and efficiency. In this paper, we focus on SCI recovery algorithms that employ untrained neural networks (UNNs), such as deep image prior (DIP), to model source structure. Such UNN-based methods are appealing as they have the potential to avoid the computationally intensive retraining required for different source models and different measurement scenarios. We first develop a theoretical framework for characterizing the performance of such UNN-based methods. The theoretical framework, on the one hand, enables us to optimize the parameters of data-modulating masks and, on the other hand, provides a fundamental connection between the number of data frames that can be recovered from a single measurement and the parameters of the untrained NN. We also employ the recently proposed bagged-deep-image-prior (bagged-DIP) idea to develop SCI Bagged Deep Video Prior (SCI-BDVP) algorithms that address the common challenges faced by standard UNN solutions. Our experimental results show that in video SCI our proposed solution achieves state-of-the-art performance among UNN methods, and in the case of noisy measurements, it even outperforms supervised solutions.
