Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yue Lu

Fragile Model Watermark for integrity protection: leveraging boundary volatility and sensitive sample-pairing

Apr 11, 2024

ZhenZhe Gao, Zhenjun Tang, Zhaoxia Yin, Baoyuan Wu, Yue Lu

Figure 1 for Fragile Model Watermark for integrity protection: leveraging boundary volatility and sensitive sample-pairing

Figure 2 for Fragile Model Watermark for integrity protection: leveraging boundary volatility and sensitive sample-pairing

Figure 3 for Fragile Model Watermark for integrity protection: leveraging boundary volatility and sensitive sample-pairing

Figure 4 for Fragile Model Watermark for integrity protection: leveraging boundary volatility and sensitive sample-pairing

Abstract:Neural networks have increasingly influenced people's lives. Ensuring the faithful deployment of neural networks as designed by their model owners is crucial, as they may be susceptible to various malicious or unintentional modifications, such as backdooring and poisoning attacks. Fragile model watermarks aim to prevent unexpected tampering that could lead DNN models to make incorrect decisions. They ensure the detection of any tampering with the model as sensitively as possible.However, prior watermarking methods suffered from inefficient sample generation and insufficient sensitivity, limiting their practical applicability. Our approach employs a sample-pairing technique, placing the model boundaries between pairs of samples, while simultaneously maximizing logits. This ensures that the model's decision results of sensitive samples change as much as possible and the Top-1 labels easily alter regardless of the direction it moves.

* The article has been accepted by IEEE International Conference on Multimedia and Expo 2024

Via

Access Paper or Ask Questions

Brush Your Text: Synthesize Any Scene Text on Images via Diffusion Model

Dec 19, 2023

Lingjun Zhang, Xinyuan Chen, Yaohui Wang, Yue Lu, Yu Qiao

Figure 1 for Brush Your Text: Synthesize Any Scene Text on Images via Diffusion Model

Figure 2 for Brush Your Text: Synthesize Any Scene Text on Images via Diffusion Model

Figure 3 for Brush Your Text: Synthesize Any Scene Text on Images via Diffusion Model

Figure 4 for Brush Your Text: Synthesize Any Scene Text on Images via Diffusion Model

Abstract:Recently, diffusion-based image generation methods are credited for their remarkable text-to-image generation capabilities, while still facing challenges in accurately generating multilingual scene text images. To tackle this problem, we propose Diff-Text, which is a training-free scene text generation framework for any language. Our model outputs a photo-realistic image given a text of any language along with a textual description of a scene. The model leverages rendered sketch images as priors, thus arousing the potential multilingual-generation ability of the pre-trained Stable Diffusion. Based on the observation from the influence of the cross-attention map on object placement in generated images, we propose a localized attention constraint into the cross-attention layer to address the unreasonable positioning problem of scene text. Additionally, we introduce contrastive image-level prompts to further refine the position of the textual region and achieve more accurate scene text generation. Experiments demonstrate that our method outperforms the existing method in both the accuracy of text recognition and the naturalness of foreground-background blending.

* Accepted to AAAI 2024. Code: https://github.com/ecnuljzhang/brush-your-text

Via

Access Paper or Ask Questions

Efficient Object Detection in Optical Remote Sensing Imagery via Attention-based Feature Distillation

Oct 28, 2023

Pourya Shamsolmoali, Jocelyn Chanussot, Huiyu Zhou, Yue Lu

Figure 1 for Efficient Object Detection in Optical Remote Sensing Imagery via Attention-based Feature Distillation

Figure 2 for Efficient Object Detection in Optical Remote Sensing Imagery via Attention-based Feature Distillation

Figure 3 for Efficient Object Detection in Optical Remote Sensing Imagery via Attention-based Feature Distillation

Figure 4 for Efficient Object Detection in Optical Remote Sensing Imagery via Attention-based Feature Distillation

Abstract:Efficient object detection methods have recently received great attention in remote sensing. Although deep convolutional networks often have excellent detection accuracy, their deployment on resource-limited edge devices is difficult. Knowledge distillation (KD) is a strategy for addressing this issue since it makes models lightweight while maintaining accuracy. However, existing KD methods for object detection have encountered two constraints. First, they discard potentially important background information and only distill nearby foreground regions. Second, they only rely on the global context, which limits the student detector's ability to acquire local information from the teacher detector. To address the aforementioned challenges, we propose Attention-based Feature Distillation (AFD), a new KD approach that distills both local and global information from the teacher detector. To enhance local distillation, we introduce a multi-instance attention mechanism that effectively distinguishes between background and foreground elements. This approach prompts the student detector to focus on the pertinent channels and pixels, as identified by the teacher detector. Local distillation lacks global information, thus attention global distillation is proposed to reconstruct the relationship between various pixels and pass it from teacher to student detector. The performance of AFD is evaluated on two public aerial image benchmarks, and the evaluation results demonstrate that AFD in object detection can attain the performance of other state-of-the-art models while being efficient.

Via

Access Paper or Ask Questions

DSAM-GN:Graph Network based on Dynamic Similarity Adjacency Matrices for Vehicle Re-identification

Oct 25, 2023

Yuejun Jiao, Song Qiu, Mingsong Chen, Dingding Han, Qingli Li, Yue Lu

Figure 1 for DSAM-GN:Graph Network based on Dynamic Similarity Adjacency Matrices for Vehicle Re-identification

Figure 2 for DSAM-GN:Graph Network based on Dynamic Similarity Adjacency Matrices for Vehicle Re-identification

Figure 3 for DSAM-GN:Graph Network based on Dynamic Similarity Adjacency Matrices for Vehicle Re-identification

Figure 4 for DSAM-GN:Graph Network based on Dynamic Similarity Adjacency Matrices for Vehicle Re-identification

Abstract:In recent years, vehicle re-identification (Re-ID) has gained increasing importance in various applications such as assisted driving systems, traffic flow management, and vehicle tracking, due to the growth of intelligent transportation systems. However, the presence of extraneous background information and occlusions can interfere with the learning of discriminative features, leading to significant variations in the same vehicle image across different scenarios. This paper proposes a method, named graph network based on dynamic similarity adjacency matrices (DSAM-GN), which incorporates a novel approach for constructing adjacency matrices to capture spatial relationships of local features and reduce background noise. Specifically, the proposed method divides the extracted vehicle features into different patches as nodes within the graph network. A spatial attention-based similarity adjacency matrix generation (SASAMG) module is employed to compute similarity matrices of nodes, and a dynamic erasure operation is applied to disconnect nodes with low similarity, resulting in similarity adjacency matrices. Finally, the nodes and similarity adjacency matrices are fed into graph networks to extract more discriminative features for vehicle Re-ID. Experimental results on public datasets VeRi-776 and VehicleID demonstrate the effectiveness of the proposed method compared with recent works.

* This paper has been accepted by the 20th Pacific Rim International Conference on Artificial Intelligence in 2023

Via

Access Paper or Ask Questions

Distance Weighted Trans Network for Image Completion

Oct 25, 2023

Pourya Shamsolmoali, Masoumeh Zareapoor, Huiyu Zhou, Xuelong Li, Yue Lu

Figure 1 for Distance Weighted Trans Network for Image Completion

Figure 2 for Distance Weighted Trans Network for Image Completion

Figure 3 for Distance Weighted Trans Network for Image Completion

Figure 4 for Distance Weighted Trans Network for Image Completion

Abstract:The challenge of image generation has been effectively modeled as a problem of structure priors or transformation. However, existing models have unsatisfactory performance in understanding the global input image structures because of particular inherent features (for example, local inductive prior). Recent studies have shown that self-attention is an efficient modeling technique for image completion problems. In this paper, we propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components. In our model, we leverage the strengths of both Convolutional Neural Networks (CNNs) and DWT blocks to enhance the image completion process. Specifically, CNNs are used to augment the local texture information of coarse priors and DWT blocks are used to recover certain coarse textures and coherent visual structures. Unlike current approaches that generally use CNNs to create feature maps, we use the DWT to encode global dependencies and compute distance-based weighted feature maps, which substantially minimizes the problem of visual ambiguities. Meanwhile, to better produce repeated textures, we introduce Residual Fast Fourier Convolution (Res-FFC) blocks to combine the encoder's skip features with the coarse features provided by our generator. Furthermore, a simple yet effective technique is proposed to normalize the non-zero values of convolutions, and fine-tune the network layers for regularization of the gradient norms to provide an efficient training stabiliser. Extensive quantitative and qualitative experiments on three challenging datasets demonstrate the superiority of our proposed model compared to existing approaches.

Via

Access Paper or Ask Questions

Adaptive White-Box Watermarking with Self-Mutual Check Parameters in Deep Neural Networks

Aug 22, 2023

Zhenzhe Gao, Zhaoxia Yin, Hongjian Zhan, Heng Yin, Yue Lu

Figure 1 for Adaptive White-Box Watermarking with Self-Mutual Check Parameters in Deep Neural Networks

Figure 2 for Adaptive White-Box Watermarking with Self-Mutual Check Parameters in Deep Neural Networks

Figure 3 for Adaptive White-Box Watermarking with Self-Mutual Check Parameters in Deep Neural Networks

Figure 4 for Adaptive White-Box Watermarking with Self-Mutual Check Parameters in Deep Neural Networks

Abstract:Artificial Intelligence (AI) has found wide application, but also poses risks due to unintentional or malicious tampering during deployment. Regular checks are therefore necessary to detect and prevent such risks. Fragile watermarking is a technique used to identify tampering in AI models. However, previous methods have faced challenges including risks of omission, additional information transmission, and inability to locate tampering precisely. In this paper, we propose a method for detecting tampered parameters and bits, which can be used to detect, locate, and restore parameters that have been tampered with. We also propose an adaptive embedding method that maximizes information capacity while maintaining model accuracy. Our approach was tested on multiple neural networks subjected to attacks that modified weight parameters, and our results demonstrate that our method achieved great recovery performance when the modification rate was below 20%. Furthermore, for models where watermarking significantly affected accuracy, we utilized an adaptive bit technique to recover more than 15% of the accuracy loss of the model.

Via

Access Paper or Ask Questions

Weakly Supervised Scene Text Generation for Low-resource Languages

Jun 27, 2023

Yangchen Xie, Xinyuan Chen, Hongjian Zhan, Palaiahankote Shivakum, Bing Yin, Cong Liu, Yue Lu

Figure 1 for Weakly Supervised Scene Text Generation for Low-resource Languages

Figure 2 for Weakly Supervised Scene Text Generation for Low-resource Languages

Figure 3 for Weakly Supervised Scene Text Generation for Low-resource Languages

Figure 4 for Weakly Supervised Scene Text Generation for Low-resource Languages

Abstract:A large number of annotated training images is crucial for training successful scene text recognition models. However, collecting sufficient datasets can be a labor-intensive and costly process, particularly for low-resource languages. To address this challenge, auto-generating text data has shown promise in alleviating the problem. Unfortunately, existing scene text generation methods typically rely on a large amount of paired data, which is difficult to obtain for low-resource languages. In this paper, we propose a novel weakly supervised scene text generation method that leverages a few recognition-level labels as weak supervision. The proposed method is able to generate a large amount of scene text images with diverse backgrounds and font styles through cross-language generation. Our method disentangles the content and style features of scene text images, with the former representing textual information and the latter representing characteristics such as font, alignment, and background. To preserve the complete content structure of generated images, we introduce an integrated attention module. Furthermore, to bridge the style gap in the style of different languages, we incorporate a pre-trained font classifier. We evaluate our method using state-of-the-art scene text recognition models. Experiments demonstrate that our generated scene text significantly improves the scene text recognition accuracy and help achieve higher accuracy when complemented with other generative methods.

Via

Access Paper or Ask Questions

A New Comprehensive Benchmark for Semi-supervised Video Anomaly Detection and Anticipation

May 23, 2023

Congqi Cao, Yue Lu, Peng Wang, Yanning Zhang

Figure 1 for A New Comprehensive Benchmark for Semi-supervised Video Anomaly Detection and Anticipation

Figure 2 for A New Comprehensive Benchmark for Semi-supervised Video Anomaly Detection and Anticipation

Figure 3 for A New Comprehensive Benchmark for Semi-supervised Video Anomaly Detection and Anticipation

Figure 4 for A New Comprehensive Benchmark for Semi-supervised Video Anomaly Detection and Anticipation

Abstract:Semi-supervised video anomaly detection (VAD) is a critical task in the intelligent surveillance system. However, an essential type of anomaly in VAD named scene-dependent anomaly has not received the attention of researchers. Moreover, there is no research investigating anomaly anticipation, a more significant task for preventing the occurrence of anomalous events. To this end, we propose a new comprehensive dataset, NWPU Campus, containing 43 scenes, 28 classes of abnormal events, and 16 hours of videos. At present, it is the largest semi-supervised VAD dataset with the largest number of scenes and classes of anomalies, the longest duration, and the only one considering the scene-dependent anomaly. Meanwhile, it is also the first dataset proposed for video anomaly anticipation. We further propose a novel model capable of detecting and anticipating anomalous events simultaneously. Compared with 7 outstanding VAD algorithms in recent years, our method can cope with scene-dependent anomaly detection and anomaly anticipation both well, achieving state-of-the-art performance on ShanghaiTech, CUHK Avenue, IITB Corridor and the newly proposed NWPU Campus datasets consistently. Our dataset and code is available at: https://campusvad.github.io.

* CVPR 2023

Via

Access Paper or Ask Questions

Scene Text Recognition with Image-Text Matching-guided Dictionary

May 08, 2023

Jiajun Wei, Hongjian Zhan, Xiao Tu, Yue Lu, Umapada Pal

Abstract:Employing a dictionary can efficiently rectify the deviation between the visual prediction and the ground truth in scene text recognition methods. However, the independence of the dictionary on the visual features may lead to incorrect rectification of accurate visual predictions. In this paper, we propose a new dictionary language model leveraging the Scene Image-Text Matching(SITM) network, which avoids the drawbacks of the explicit dictionary language model: 1) the independence of the visual features; 2) noisy choice in candidates etc. The SITM network accomplishes this by using Image-Text Contrastive (ITC) Learning to match an image with its corresponding text among candidates in the inference stage. ITC is widely used in vision-language learning to pull the positive image-text pair closer in feature space. Inspired by ITC, the SITM network combines the visual features and the text features of all candidates to identify the candidate with the minimum distance in the feature space. Our lexicon method achieves better results(93.8\% accuracy) than the ordinary method results(92.1\% accuracy) on six mainstream benchmarks. Additionally, we integrate our method with ABINet and establish new state-of-the-art results on several benchmarks.

* Accepted at ICDAR2023

Via

Access Paper or Ask Questions

OTS: A One-shot Learning Approach for Text Spotting in Historical Manuscripts

Apr 18, 2023

Wenbo Hu, Hongjian Zhan, Cong Liu, Bing Yin, Yue Lu

Abstract:Historical manuscript processing poses challenges like limited annotated training data and novel class emergence. To address this, we propose a novel One-shot learning-based Text Spotting (OTS) approach that accurately and reliably spots novel characters with just one annotated support sample. Drawing inspiration from cognitive research, we introduce a spatial alignment module that finds, focuses on, and learns the most discriminative spatial regions in the query image based on one support image. Especially, since the low-resource spotting task often faces the problem of example imbalance, we propose a novel loss function called torus loss which can make the embedding space of distance metric more discriminative. Our approach is highly efficient and requires only a few training samples while exhibiting the remarkable ability to handle novel characters, and symbols. To enhance dataset diversity, a new manuscript dataset that contains the ancient Dongba hieroglyphics (DBH) is created. We conduct experiments on publicly available VML-HD, TKH, NC datasets, and the new proposed DBH dataset. The experimental results demonstrate that OTS outperforms the state-of-the-art methods in one-shot text spotting. Overall, our proposed method offers promising applications in the field of text spotting in historical manuscripts.

Via

Access Paper or Ask Questions