Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Nenghai Yu

Transformer based Pluralistic Image Completion with Reduced Information Loss

Apr 15, 2024

Qiankun Liu, Yuqi Jiang, Zhentao Tan, Dongdong Chen, Ying Fu, Qi Chu, Gang Hua, Nenghai Yu

Figure 1 for Transformer based Pluralistic Image Completion with Reduced Information Loss

Figure 2 for Transformer based Pluralistic Image Completion with Reduced Information Loss

Figure 3 for Transformer based Pluralistic Image Completion with Reduced Information Loss

Figure 4 for Transformer based Pluralistic Image Completion with Reduced Information Loss

Abstract:Transformer based methods have achieved great success in image inpainting recently. However, we find that these solutions regard each pixel as a token, thus suffering from an information loss issue from two aspects: 1) They downsample the input image into much lower resolutions for efficiency consideration. 2) They quantize $256^3$ RGB values to a small number (such as 512) of quantized color values. The indices of quantized pixels are used as tokens for the inputs and prediction targets of the transformer. To mitigate these issues, we propose a new transformer based framework called "PUT". Specifically, to avoid input downsampling while maintaining computation efficiency, we design a patch-based auto-encoder P-VQVAE. The encoder converts the masked image into non-overlapped patch tokens and the decoder recovers the masked regions from the inpainted tokens while keeping the unmasked regions unchanged. To eliminate the information loss caused by input quantization, an Un-quantized Transformer is applied. It directly takes features from the P-VQVAE encoder as input without any quantization and only regards the quantized tokens as prediction targets. Furthermore, to make the inpainting process more controllable, we introduce semantic and structural conditions as extra guidance. Extensive experiments show that our method greatly outperforms existing transformer based methods on image fidelity and achieves much higher diversity and better fidelity than state-of-the-art pluralistic inpainting methods on complex large-scale datasets (e.g., ImageNet). Codes are available at https://github.com/liuqk3/PUT.

* Accepted by TPAMI (2024). arXiv admin note: text overlap with arXiv:2205.05076

Via

Access Paper or Ask Questions

Gaussian Shading: Provable Performance-Lossless Image Watermarking for Diffusion Models

Apr 07, 2024

Zijin Yang, Kai Zeng, Kejiang Chen, Han Fang, Weiming Zhang, Nenghai Yu

Abstract:Ethical concerns surrounding copyright protection and inappropriate content generation pose challenges for the practical implementation of diffusion models. One effective solution involves watermarking the generated images. However, existing methods often compromise the model performance or require additional training, which is undesirable for operators and users. To address this issue, we propose Gaussian Shading, a diffusion model watermarking technique that is both performance-lossless and training-free, while serving the dual purpose of copyright protection and tracing of offending content. Our watermark embedding is free of model parameter modifications and thus is plug-and-play. We map the watermark to latent representations following a standard Gaussian distribution, which is indistinguishable from latent representations obtained from the non-watermarked diffusion model. Therefore we can achieve watermark embedding with lossless performance, for which we also provide theoretical proof. Furthermore, since the watermark is intricately linked with image semantics, it exhibits resilience to lossy processing and erasure attempts. The watermark can be extracted by Denoising Diffusion Implicit Models (DDIM) inversion and inverse sampling. We evaluate Gaussian Shading on multiple versions of Stable Diffusion, and the results demonstrate that Gaussian Shading not only is performance-lossless but also outperforms existing methods in terms of robustness.

* 17 pages, 11 figures, accepted by CVPR 2024

Via

Access Paper or Ask Questions

Provably Secure Disambiguating Neural Linguistic Steganography

Mar 26, 2024

Yuang Qi, Kejiang Chen, Kai Zeng, Weiming Zhang, Nenghai Yu

Figure 1 for Provably Secure Disambiguating Neural Linguistic Steganography

Figure 2 for Provably Secure Disambiguating Neural Linguistic Steganography

Figure 3 for Provably Secure Disambiguating Neural Linguistic Steganography

Figure 4 for Provably Secure Disambiguating Neural Linguistic Steganography

Abstract:Recent research in provably secure neural linguistic steganography has overlooked a crucial aspect: the sender must detokenize stegotexts to avoid raising suspicion from the eavesdropper. The segmentation ambiguity problem, which arises when using language models based on subwords, leads to occasional decoding failures in all neural language steganography implementations based on these models. Current solutions to this issue involve altering the probability distribution of candidate words, rendering them incompatible with provably secure steganography. We propose a novel secure disambiguation method named SyncPool, which effectively addresses the segmentation ambiguity problem. We group all tokens with prefix relationships in the candidate pool before the steganographic embedding algorithm runs to eliminate uncertainty among ambiguous tokens. To enable the receiver to synchronize the sampling process of the sender, a shared cryptographically-secure pseudorandom number generator (CSPRNG) is deployed to select a token from the ambiguity pool. SyncPool does not change the size of the candidate pool or the distribution of tokens and thus is applicable to provably secure language steganography methods. We provide theoretical proofs and experimentally demonstrate the applicability of our solution to various languages and models, showing its potential to significantly improve the reliability and security of neural linguistic steganography systems.

Via

Access Paper or Ask Questions

MiM-ISTD: Mamba-in-Mamba for Efficient Infrared Small Target Detection

Mar 08, 2024

Tianxiang Chen, Zhentao Tan, Tao Gong, Qi Chu, Yue Wu, Bin Liu, Jieping Ye, Nenghai Yu

Figure 1 for MiM-ISTD: Mamba-in-Mamba for Efficient Infrared Small Target Detection

Figure 2 for MiM-ISTD: Mamba-in-Mamba for Efficient Infrared Small Target Detection

Figure 3 for MiM-ISTD: Mamba-in-Mamba for Efficient Infrared Small Target Detection

Figure 4 for MiM-ISTD: Mamba-in-Mamba for Efficient Infrared Small Target Detection

Abstract:Recently, infrared small target detection (ISTD) has made significant progress, thanks to the development of basic models. Specifically, the structures combining convolutional networks with transformers can successfully extract both local and global features. However, the disadvantage of the transformer is also inherited, i.e., the quadratic computational complexity to the length of the sequence. Inspired by the recent basic model with linear complexity for long-distance modeling, called Mamba, we explore the potential of this state space model for ISTD task in terms of effectiveness and efficiency in the paper. However, directly applying Mamba achieves poor performance since local features, which are critical to detecting small targets, cannot be fully exploited. Instead, we tailor a Mamba-in-Mamba (MiM-ISTD) structure for efficient ISTD. Specifically, we treat the local patches as "visual sentences" and use the Outer Mamba to explore the global information. We then decompose each visual sentence into sub-patches as "visual words" and use the Inner Mamba to further explore the local information among words in the visual sentence with negligible computational costs. By aggregating the word and sentence features, the MiM-ISTD can effectively explore both global and local information. Experiments on NUAA-SIRST and IRSTD-1k show the superior accuracy and efficiency of our method. Specifically, MiM-ISTD is $10 \times$ faster than the SOTA method and reduces GPU memory usage by 73.4$\%$ when testing on $2048 \times 2048$ image, overcoming the computation and memory constraints on high-resolution infrared images. Source code is available at https://github.com/txchen-USTC/MiM-ISTD.

* The first Mamba-based model for infrared small target detection

Via

Access Paper or Ask Questions

Towards Generalist Prompting for Large Language Models by Mental Models

Feb 28, 2024

Haoxiang Guan, Jiyan He, Shuxin Zheng, En-Hong Chen, Weiming Zhang, Nenghai Yu

Figure 1 for Towards Generalist Prompting for Large Language Models by Mental Models

Figure 2 for Towards Generalist Prompting for Large Language Models by Mental Models

Figure 3 for Towards Generalist Prompting for Large Language Models by Mental Models

Figure 4 for Towards Generalist Prompting for Large Language Models by Mental Models

Abstract:Large language models (LLMs) have demonstrated impressive performance on many tasks. However, to achieve optimal performance, specially designed prompting methods are still needed. These methods either rely on task-specific few-shot examples that require a certain level of domain knowledge, or are designed to be simple but only perform well on a few types of tasks. In this work, we attempt to introduce the concept of generalist prompting, which operates on the design principle of achieving optimal or near-optimal performance on a wide range of tasks while eliminating the need for manual selection and customization of prompts tailored to specific problems. Furthermore, we propose MeMo (Mental Models), an innovative prompting method that is simple-designed yet effectively fulfills the criteria of generalist prompting. MeMo distills the cores of various prompting methods into individual mental models and allows LLMs to autonomously select the most suitable mental models for the problem, achieving or being near to the state-of-the-art results on diverse tasks such as STEM, logical reasoning, and commonsense reasoning in zero-shot settings. We hope that the insights presented herein will stimulate further exploration of generalist prompting methods for LLMs.

Via

Access Paper or Ask Questions

Model X-ray:Detect Backdoored Models via Decision Boundary

Feb 27, 2024

Yanghao Su, Jie Zhang, Ting Xu, Tianwei Zhang, Weiming Zhang, Nenghai Yu

Figure 1 for Model X-ray:Detect Backdoored Models via Decision Boundary

Figure 2 for Model X-ray:Detect Backdoored Models via Decision Boundary

Figure 3 for Model X-ray:Detect Backdoored Models via Decision Boundary

Figure 4 for Model X-ray:Detect Backdoored Models via Decision Boundary

Abstract:Deep neural networks (DNNs) have revolutionized various industries, leading to the rise of Machine Learning as a Service (MLaaS). In this paradigm, well-trained models are typically deployed through APIs. However, DNNs are susceptible to backdoor attacks, which pose significant risks to their applications. This vulnerability necessitates a method for users to ascertain whether an API is compromised before usage. Although many backdoor detection methods have been developed, they often operate under the assumption that the defender has access to specific information such as details of the attack, soft predictions from the model API, and even the knowledge of the model parameters, limiting their practicality in MLaaS scenarios. To address it, in this paper, we begin by presenting an intriguing observation: the decision boundary of the backdoored model exhibits a greater degree of closeness than that of the clean model. Simultaneously, if only one single label is infected, a larger portion of the regions will be dominated by the attacked label. Building upon this observation, we propose Model X-ray, a novel backdoor detection approach for MLaaS through the analysis of decision boundaries. Model X-ray can not only identify whether the target API is infected by backdoor attacks but also determine the target attacked label under the all-to-one attack strategy. Importantly, it accomplishes this solely by the hard prediction of clean inputs, regardless of any assumptions about attacks and prior knowledge of the training details of the model. Extensive experiments demonstrated that Model X-ray can be effective for MLaaS across diverse backdoor attacks, datasets, and architectures.

Via

Access Paper or Ask Questions

Bootstrapping Audio-Visual Segmentation by Strengthening Audio Cues

Feb 06, 2024

Tianxiang Chen, Zhentao Tan, Tao Gong, Qi Chu, Yue Wu, Bin Liu, Le Lu, Jieping Ye, Nenghai Yu

Figure 1 for Bootstrapping Audio-Visual Segmentation by Strengthening Audio Cues

Figure 2 for Bootstrapping Audio-Visual Segmentation by Strengthening Audio Cues

Figure 3 for Bootstrapping Audio-Visual Segmentation by Strengthening Audio Cues

Figure 4 for Bootstrapping Audio-Visual Segmentation by Strengthening Audio Cues

Abstract:How to effectively interact audio with vision has garnered considerable interest within the multi-modality research field. Recently, a novel audio-visual segmentation (AVS) task has been proposed, aiming to segment the sounding objects in video frames under the guidance of audio cues. However, most existing AVS methods are hindered by a modality imbalance where the visual features tend to dominate those of the audio modality, due to a unidirectional and insufficient integration of audio cues. This imbalance skews the feature representation towards the visual aspect, impeding the learning of joint audio-visual representations and potentially causing segmentation inaccuracies. To address this issue, we propose AVSAC. Our approach features a Bidirectional Audio-Visual Decoder (BAVD) with integrated bidirectional bridges, enhancing audio cues and fostering continuous interplay between audio and visual modalities. This bidirectional interaction narrows the modality imbalance, facilitating more effective learning of integrated audio-visual representations. Additionally, we present a strategy for audio-visual frame-wise synchrony as fine-grained guidance of BAVD. This strategy enhances the share of auditory components in visual features, contributing to a more balanced audio-visual representation learning. Extensive experiments show that our method attains new benchmarks in AVS performance.

Via

Access Paper or Ask Questions

TCI-Former: Thermal Conduction-Inspired Transformer for Infrared Small Target Detection

Feb 03, 2024

Tianxiang Chen, Zhentao Tan, Qi Chu, Yue Wu, Bin Liu, Nenghai Yu

Figure 1 for TCI-Former: Thermal Conduction-Inspired Transformer for Infrared Small Target Detection

Figure 2 for TCI-Former: Thermal Conduction-Inspired Transformer for Infrared Small Target Detection

Figure 3 for TCI-Former: Thermal Conduction-Inspired Transformer for Infrared Small Target Detection

Figure 4 for TCI-Former: Thermal Conduction-Inspired Transformer for Infrared Small Target Detection

Abstract:Infrared small target detection (ISTD) is critical to national security and has been extensively applied in military areas. ISTD aims to segment small target pixels from background. Most ISTD networks focus on designing feature extraction blocks or feature fusion modules, but rarely describe the ISTD process from the feature map evolution perspective. In the ISTD process, the network attention gradually shifts towards target areas. We abstract this process as the directional movement of feature map pixels to target areas through convolution, pooling and interactions with surrounding pixels, which can be analogous to the movement of thermal particles constrained by surrounding variables and particles. In light of this analogy, we propose Thermal Conduction-Inspired Transformer (TCI-Former) based on the theoretical principles of thermal conduction. According to thermal conduction differential equation in heat dynamics, we derive the pixel movement differential equation (PMDE) in the image domain and further develop two modules: Thermal Conduction-Inspired Attention (TCIA) and Thermal Conduction Boundary Module (TCBM). TCIA incorporates finite difference method with PMDE to reach a numerical approximation so that target body features can be extracted. To further remove errors in boundary areas, TCBM is designed and supervised by boundary masks to refine target body features with fine boundary details. Experiments on IRSTD-1k and NUAA-SIRST demonstrate the superiority of our method.

Via

Access Paper or Ask Questions

Data-Free Hard-Label Robustness Stealing Attack

Dec 12, 2023

Xiaojian Yuan, Kejiang Chen, Wen Huang, Jie Zhang, Weiming Zhang, Nenghai Yu

Abstract:The popularity of Machine Learning as a Service (MLaaS) has led to increased concerns about Model Stealing Attacks (MSA), which aim to craft a clone model by querying MLaaS. Currently, most research on MSA assumes that MLaaS can provide soft labels and that the attacker has a proxy dataset with a similar distribution. However, this fails to encapsulate the more practical scenario where only hard labels are returned by MLaaS and the data distribution remains elusive. Furthermore, most existing work focuses solely on stealing the model accuracy, neglecting the model robustness, while robustness is essential in security-sensitive scenarios, e.g., face-scan payment. Notably, improving model robustness often necessitates the use of expensive techniques such as adversarial training, thereby further making stealing robustness a more lucrative prospect. In response to these identified gaps, we introduce a novel Data-Free Hard-Label Robustness Stealing (DFHL-RS) attack in this paper, which enables the stealing of both model accuracy and robustness by simply querying hard labels of the target model without the help of any natural data. Comprehensive experiments demonstrate the effectiveness of our method. The clone model achieves a clean accuracy of 77.86% and a robust accuracy of 39.51% against AutoAttack, which are only 4.71% and 8.40% lower than the target model on the CIFAR-10 dataset, significantly exceeding the baselines. Our code is available at: https://github.com/LetheSec/DFHL-RS-Attack.

* Accepted by AAAI 2024

Via

Access Paper or Ask Questions

Control Risk for Potential Misuse of Artificial Intelligence in Science

Dec 11, 2023

Jiyan He, Weitao Feng, Yaosen Min, Jingwei Yi, Kunsheng Tang, Shuai Li, Jie Zhang, Kejiang Chen, Wenbo Zhou, Xing Xie(+3 more)

Figure 1 for Control Risk for Potential Misuse of Artificial Intelligence in Science

Figure 2 for Control Risk for Potential Misuse of Artificial Intelligence in Science

Figure 3 for Control Risk for Potential Misuse of Artificial Intelligence in Science

Figure 4 for Control Risk for Potential Misuse of Artificial Intelligence in Science

Abstract:The expanding application of Artificial Intelligence (AI) in scientific fields presents unprecedented opportunities for discovery and innovation. However, this growth is not without risks. AI models in science, if misused, can amplify risks like creation of harmful substances, or circumvention of established regulations. In this study, we aim to raise awareness of the dangers of AI misuse in science, and call for responsible AI development and use in this domain. We first itemize the risks posed by AI in scientific contexts, then demonstrate the risks by highlighting real-world examples of misuse in chemical science. These instances underscore the need for effective risk management strategies. In response, we propose a system called SciGuard to control misuse risks for AI models in science. We also propose a red-teaming benchmark SciMT-Safety to assess the safety of different systems. Our proposed SciGuard shows the least harmful impact in the assessment without compromising performance in benign tests. Finally, we highlight the need for a multidisciplinary and collaborative effort to ensure the safe and ethical use of AI models in science. We hope that our study can spark productive discussions on using AI ethically in science among researchers, practitioners, policymakers, and the public, to maximize benefits and minimize the risks of misuse.

Via

Access Paper or Ask Questions