Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xi Li

Mark

SphereDiffusion: Spherical Geometry-Aware Distortion Resilient Diffusion Model

Mar 15, 2024

Tao Wu, Xuewei Li, Zhongang Qi, Di Hu, Xintao Wang, Ying Shan, Xi Li

Abstract:Controllable spherical panoramic image generation holds substantial applicative potential across a variety of domains.However, it remains a challenging task due to the inherent spherical distortion and geometry characteristics, resulting in low-quality content generation.In this paper, we introduce a novel framework of SphereDiffusion to address these unique challenges, for better generating high-quality and precisely controllable spherical panoramic images.For the spherical distortion characteristic, we embed the semantics of the distorted object with text encoding, then explicitly construct the relationship with text-object correspondence to better use the pre-trained knowledge of the planar images.Meanwhile, we employ a deformable technique to mitigate the semantic deviation in latent space caused by spherical distortion.For the spherical geometry characteristic, in virtue of spherical rotation invariance, we improve the data diversity and optimization objectives in the training process, enabling the model to better learn the spherical geometry characteristic.Furthermore, we enhance the denoising process of the diffusion model, enabling it to effectively use the learned geometric characteristic to ensure the boundary continuity of the generated images.With these specific techniques, experiments on Structured3D dataset show that SphereDiffusion significantly improves the quality of controllable spherical image generation and relatively reduces around 35% FID on average.

* Accepted by AAAI2024

Via

Access Paper or Ask Questions

Personalized Behavior-Aware Transformer for Multi-Behavior Sequential Recommendation

Feb 22, 2024

Jiajie Su, Chaochao Chen, Zibin Lin, Xi Li, Weiming Liu, Xiaolin Zheng

Figure 1 for Personalized Behavior-Aware Transformer for Multi-Behavior Sequential Recommendation

Figure 2 for Personalized Behavior-Aware Transformer for Multi-Behavior Sequential Recommendation

Figure 3 for Personalized Behavior-Aware Transformer for Multi-Behavior Sequential Recommendation

Figure 4 for Personalized Behavior-Aware Transformer for Multi-Behavior Sequential Recommendation

Abstract:Sequential Recommendation (SR) captures users' dynamic preferences by modeling how users transit among items. However, SR models that utilize only single type of behavior interaction data encounter performance degradation when the sequences are short. To tackle this problem, we focus on Multi-Behavior Sequential Recommendation (MBSR) in this paper, which aims to leverage time-evolving heterogeneous behavioral dependencies for better exploring users' potential intents on the target behavior. Solving MBSR is challenging. On the one hand, users exhibit diverse multi-behavior patterns due to personal characteristics. On the other hand, there exists comprehensive co-influence between behavior correlations and item collaborations, the intensity of which is deeply affected by temporal factors. To tackle these challenges, we propose a Personalized Behavior-Aware Transformer framework (PBAT) for MBSR problem, which models personalized patterns and multifaceted sequential collaborations in a novel way to boost recommendation performance. First, PBAT develops a personalized behavior pattern generator in the representation layer, which extracts dynamic and discriminative behavior patterns for sequential learning. Second, PBAT reforms the self-attention layer with a behavior-aware collaboration extractor, which introduces a fused behavior-aware attention mechanism for incorporating both behavioral and temporal impacts into collaborative transitions. We conduct experiments on three benchmark datasets and the results demonstrate the effectiveness and interpretability of our framework. Our implementation code is released at https://github.com/TiliaceaeSU/PBAT.

* Proceedings of the 31st ACM International Conference on Multimedia. 2023: 6321-6331

Via

Access Paper or Ask Questions

Universal Post-Training Reverse-Engineering Defense Against Backdoors in Deep Neural Networks

Feb 03, 2024

Xi Li, Hang Wang, David J. Miller, George Kesidis

Figure 1 for Universal Post-Training Reverse-Engineering Defense Against Backdoors in Deep Neural Networks

Figure 2 for Universal Post-Training Reverse-Engineering Defense Against Backdoors in Deep Neural Networks

Figure 3 for Universal Post-Training Reverse-Engineering Defense Against Backdoors in Deep Neural Networks

Figure 4 for Universal Post-Training Reverse-Engineering Defense Against Backdoors in Deep Neural Networks

Abstract:A variety of defenses have been proposed against backdoors attacks on deep neural network (DNN) classifiers. Universal methods seek to reliably detect and/or mitigate backdoors irrespective of the incorporation mechanism used by the attacker, while reverse-engineering methods often explicitly assume one. In this paper, we describe a new detector that: relies on internal feature map of the defended DNN to detect and reverse-engineer the backdoor and identify its target class; can operate post-training (without access to the training dataset); is highly effective for various incorporation mechanisms (i.e., is universal); and which has low computational overhead and so is scalable. Our detection approach is evaluated for different attacks on a benchmark CIFAR-10 image classifier.

Via

Access Paper or Ask Questions

Position Paper: Assessing Robustness, Privacy, and Fairness in Federated Learning Integrated with Foundation Models

Feb 02, 2024

Xi Li, Jiaqi Wang

Abstract:Federated Learning (FL), while a breakthrough in decentralized machine learning, contends with significant challenges such as limited data availability and the variability of computational resources, which can stifle the performance and scalability of the models. The integration of Foundation Models (FMs) into FL presents a compelling solution to these issues, with the potential to enhance data richness and reduce computational demands through pre-training and data augmentation. However, this incorporation introduces novel issues in terms of robustness, privacy, and fairness, which have not been sufficiently addressed in the existing research. We make a preliminary investigation into this field by systematically evaluating the implications of FM-FL integration across these dimensions. We analyze the trade-offs involved, uncover the threats and issues introduced by this integration, and propose a set of criteria and strategies for navigating these challenges. Furthermore, we identify potential research directions for advancing this field, laying a foundation for future development in creating reliable, secure, and equitable FL systems.

* Under review

Via

Access Paper or Ask Questions

Vulnerabilities of Foundation Model Integrated Federated Learning Under Adversarial Threats

Jan 18, 2024

Chen Wu, Xi Li, Jiaqi Wang

Abstract:Federated Learning (FL) addresses critical issues in machine learning related to data privacy and security, yet suffering from data insufficiency and imbalance under certain circumstances. The emergence of foundation models (FMs) offers potential solutions to the limitations of existing FL frameworks, e.g., by generating synthetic data for model initialization. However, due to the inherent safety concerns of FMs, integrating FMs into FL could introduce new risks, which remains largely unexplored. To address this gap, we conduct the first investigation on the vulnerability of FM integrated FL (FM-FL) under adversarial threats. Based on a unified framework of FM-FL, we introduce a novel attack strategy that exploits safety issues of FM to compromise FL client models. Through extensive experiments with well-known models and benchmark datasets in both image and text domains, we reveal the high susceptibility of the FM-FL to this new threat under various FL configurations. Furthermore, we find that existing FL defense strategies offer limited protection against this novel attack approach. This research highlights the critical need for enhanced security measures in FL in the era of FMs.

* Chen Wu and Xi Li are equal contribution. The corresponding author is Jiaqi Wang

Via

Access Paper or Ask Questions

TextFusion: Unveiling the Power of Textual Semantics for Controllable Image Fusion

Dec 21, 2023

Chunyang Cheng, Tianyang Xu, Xiao-Jun Wu, Hui Li, Xi Li, Zhangyong Tang, Josef Kittler

Abstract:Advanced image fusion methods are devoted to generating the fusion results by aggregating the complementary information conveyed by the source images. However, the difference in the source-specific manifestation of the imaged scene content makes it difficult to design a robust and controllable fusion process. We argue that this issue can be alleviated with the help of higher-level semantics, conveyed by the text modality, which should enable us to generate fused images for different purposes, such as visualisation and downstream tasks, in a controllable way. This is achieved by exploiting a vision-and-language model to build a coarse-to-fine association mechanism between the text and image signals. With the guidance of the association maps, an affine fusion unit is embedded in the transformer network to fuse the text and vision modalities at the feature level. As another ingredient of this work, we propose the use of textual attention to adapt image quality assessment to the fusion task. To facilitate the implementation of the proposed text-guided fusion paradigm, and its adoption by the wider research community, we release a text-annotated image fusion dataset IVT. Extensive experiments demonstrate that our approach (TextFusion) consistently outperforms traditional appearance-based fusion methods. Our code and dataset will be publicly available on the project homepage.

* 15 pages, 15 figures

Via

Access Paper or Ask Questions

Prompting Segmentation with Sound is Generalizable Audio-Visual Source Localizer

Sep 18, 2023

Yaoting Wang, Weisong Liu, Guangyao Li, Jian Ding, Di Hu, Xi Li

Abstract:Never having seen an object and heard its sound simultaneously, can the model still accurately localize its visual position from the input audio? In this work, we concentrate on the Audio-Visual Localization and Segmentation tasks but under the demanding zero-shot and few-shot scenarios. To achieve this goal, different from existing approaches that mostly employ the encoder-fusion-decoder paradigm to decode localization information from the fused audio-visual feature, we introduce the encoder-prompt-decoder paradigm, aiming to better fit the data scarcity and varying data distribution dilemmas with the help of abundant knowledge from pre-trained models. Specifically, we first propose to construct Semantic-aware Audio Prompt (SAP) to help the visual foundation model focus on sounding objects, meanwhile, the semantic gap between the visual and audio modalities is also encouraged to shrink. Then, we develop a Correlation Adapter (ColA) to keep minimal training efforts as well as maintain adequate knowledge of the visual foundation model. By equipping with these means, extensive experiments demonstrate that this new paradigm outperforms other fusion-based methods in both the unseen class and cross-dataset settings. We hope that our work can further promote the generalization study of Audio-Visual Localization and Segmentation in practical application scenarios.

* 11 pages, 7 figures, modified the additional materials

Via

Access Paper or Ask Questions

Temporal-Distributed Backdoor Attack Against Video Based Action Recognition

Sep 01, 2023

Xi Li, Songhe Wang, Ruiquan Huang, Mahanth Gowda, George Kesidis

Figure 1 for Temporal-Distributed Backdoor Attack Against Video Based Action Recognition

Figure 2 for Temporal-Distributed Backdoor Attack Against Video Based Action Recognition

Figure 3 for Temporal-Distributed Backdoor Attack Against Video Based Action Recognition

Figure 4 for Temporal-Distributed Backdoor Attack Against Video Based Action Recognition

Abstract:Deep neural networks (DNNs) have achieved tremendous success in various applications including video action recognition, yet remain vulnerable to backdoor attacks (Trojans). The backdoor-compromised model will mis-classify to the target class chosen by the attacker when a test instance (from a non-target class) is embedded with a specific trigger, while maintaining high accuracy on attack-free instances. Although there are extensive studies on backdoor attacks against image data, the susceptibility of video-based systems under backdoor attacks remains largely unexplored. Current studies are direct extensions of approaches proposed for image data, e.g., the triggers are independently embedded within the frames, which tend to be detectable by existing defenses. In this paper, we introduce a simple yet effective backdoor attack against video data. Our proposed attack, adding perturbations in a transformed domain, plants an imperceptible, temporally distributed trigger across the video frames, and is shown to be resilient to existing defensive strategies. The effectiveness of the proposed attack is demonstrated by extensive experiments with various well-known models on two video recognition benchmarks, UCF101 and HMDB51, and a sign language recognition benchmark, Greek Sign Language (GSL) dataset. We delve into the impact of several influential factors on our proposed attack and identify an intriguing effect termed "collateral damage" through extensive studies.

Via

Access Paper or Ask Questions

Bridging Cross-task Protocol Inconsistency for Distillation in Dense Object Detection

Aug 28, 2023

Longrong Yang, Xianpan Zhou, Xuewei Li, Liang Qiao, Zheyang Li, Ziwei Yang, Gaoang Wang, Xi Li

Figure 1 for Bridging Cross-task Protocol Inconsistency for Distillation in Dense Object Detection

Figure 2 for Bridging Cross-task Protocol Inconsistency for Distillation in Dense Object Detection

Figure 3 for Bridging Cross-task Protocol Inconsistency for Distillation in Dense Object Detection

Figure 4 for Bridging Cross-task Protocol Inconsistency for Distillation in Dense Object Detection

Abstract:Knowledge distillation (KD) has shown potential for learning compact models in dense object detection. However, the commonly used softmax-based distillation ignores the absolute classification scores for individual categories. Thus, the optimum of the distillation loss does not necessarily lead to the optimal student classification scores for dense object detectors. This cross-task protocol inconsistency is critical, especially for dense object detectors, since the foreground categories are extremely imbalanced. To address the issue of protocol differences between distillation and classification, we propose a novel distillation method with cross-task consistent protocols, tailored for the dense object detection. For classification distillation, we address the cross-task protocol inconsistency problem by formulating the classification logit maps in both teacher and student models as multiple binary-classification maps and applying a binary-classification distillation loss to each map. For localization distillation, we design an IoU-based Localization Distillation Loss that is free from specific network structures and can be compared with existing localization distillation losses. Our proposed method is simple but effective, and experimental results demonstrate its superiority over existing methods. Code is available at https://github.com/TinyTigerPan/BCKD.

* Accepted by ICCV2023

Via

Access Paper or Ask Questions

Backdoor Mitigation by Correcting the Distribution of Neural Activations

Aug 18, 2023

Xi Li, Zhen Xiang, David J. Miller, George Kesidis

Figure 1 for Backdoor Mitigation by Correcting the Distribution of Neural Activations

Figure 2 for Backdoor Mitigation by Correcting the Distribution of Neural Activations

Figure 3 for Backdoor Mitigation by Correcting the Distribution of Neural Activations

Figure 4 for Backdoor Mitigation by Correcting the Distribution of Neural Activations

Abstract:Backdoor (Trojan) attacks are an important type of adversarial exploit against deep neural networks (DNNs), wherein a test instance is (mis)classified to the attacker's target class whenever the attacker's backdoor trigger is present. In this paper, we reveal and analyze an important property of backdoor attacks: a successful attack causes an alteration in the distribution of internal layer activations for backdoor-trigger instances, compared to that for clean instances. Even more importantly, we find that instances with the backdoor trigger will be correctly classified to their original source classes if this distribution alteration is corrected. Based on our observations, we propose an efficient and effective method that achieves post-training backdoor mitigation by correcting the distribution alteration using reverse-engineered triggers. Notably, our method does not change any trainable parameters of the DNN, but achieves generally better mitigation performance than existing methods that do require intensive DNN parameter tuning. It also efficiently detects test instances with the trigger, which may help to catch adversarial entities in the act of exploiting the backdoor.

Via

Access Paper or Ask Questions