Shu-Tao Xia

Anno-incomplete Multi-dataset Detection

Aug 29, 2024

Yiran Xu, Haoxiang Zhong, Kai Wu, Jialin Li, Yong Liu, Chengjie Wang, Shu-Tao Xia, Hongen Liao

Abstract: Object detectors have shown outstanding performance on various public datasets. However, annotating a new dataset for a new task is usually unavoidable in practice, since 1) a single existing dataset usually does not contain all the object categories needed, and 2) combining multiple datasets usually suffers from incomplete annotations and heterogeneous features. We formulate a novel problem, "Annotation-incomplete Multi-dataset Detection", and develop an end-to-end multi-task learning architecture that can accurately detect all object categories from multiple partially annotated datasets. Specifically, we propose an attention feature extractor that helps mine the relations among different datasets. In addition, a knowledge amalgamation training strategy is incorporated to accommodate heterogeneous features from different sources. Extensive experiments on different object detection datasets demonstrate the effectiveness of our method, with mAP improvements of 2.17% on COCO and 2.10% on VOC.

* 12 pages, 9 figures
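The abstract does not specify the training loss, but the core difficulty it names is that each source dataset labels only some categories, so an unlabeled category in an image is unknown rather than negative. A common way to handle this is to compute the classification loss only over the categories the image's source dataset annotates. The sketch below illustrates that idea; it is not the authors' method, and the function name `masked_bce` and the per-dataset `annotated` set are hypothetical.

```python
import math

def masked_bce(logits, targets, annotated):
    """Binary cross-entropy averaged only over categories the source dataset annotates.

    logits:    raw scores for every category in the merged label space (hypothetical layout)
    targets:   1 if the category is present, 0 otherwise (trusted only where annotated)
    annotated: set of category indices labeled by this image's source dataset
    """
    total, count = 0.0, 0
    for c, (z, y) in enumerate(zip(logits, targets)):
        if c not in annotated:
            continue  # unlabeled category: provides neither positive nor negative evidence
        # numerically stable BCE-with-logits: max(z, 0) - z*y + log(1 + exp(-|z|))
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
        count += 1
    return total / count if count else 0.0
```

Changing the score of a category the dataset never labels leaves the loss unchanged, so the detector is not pushed to suppress objects that were simply not annotated.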

Large Point-to-Gaussian Model for Image-to-3D Generation

Aug 20, 2024

Longfei Lu, Huachen Gao, Tao Dai, Yaohua Zha, Zhi Hou, Junta Wu, Shu-Tao Xia

Abstract: Recently, image-to-3D approaches based on large reconstruction models, particularly 3D Gaussian reconstruction models, have significantly advanced the quality and speed of 3D asset generation. Existing large 3D Gaussian models directly map a 2D image to 3D Gaussian parameters, but regressing 3D Gaussian representations from a 2D image is challenging without 3D priors. In this paper, we propose a large Point-to-Gaussian model for image-to-3D generation, which takes as input an initial point cloud produced by a large 3D diffusion model conditioned on the 2D image and generates the Gaussian parameters. The point cloud provides an initial 3D geometry prior for Gaussian generation, which significantly facilitates image-to-3D generation. Moreover, we present the Attention mechanism, Projection mechanism, and Point feature extractor, dubbed the APP block, for fusing image features with point cloud features. Extensive qualitative and quantitative experiments on the GSO and Objaverse datasets demonstrate the effectiveness of the proposed approach and show that it achieves state-of-the-art performance.

* 10 pages, 9 figures, ACM MM 2024
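The abstract names a projection mechanism inside the APP block without defining it. One common way to fuse image features with point features is to project each 3D point into the input view and read the image feature at that pixel. The sketch below assumes a pinhole camera with known intrinsics (`fx`, `fy`, `cx`, `cy`), points already in camera coordinates, and nearest-pixel sampling; all names are hypothetical and this is not the paper's implementation.

```python
def project_points(points, fx, fy, cx, cy):
    """Project camera-space 3D points (X, Y, Z) to pixel coordinates with a pinhole model."""
    uv = []
    for x, y, z in points:
        if z <= 0:
            uv.append(None)  # point is behind the camera: no valid projection
        else:
            uv.append((fx * x / z + cx, fy * y / z + cy))
    return uv

def gather_features(feature_map, uv):
    """Nearest-pixel lookup of per-point image features.

    feature_map[row][col] holds the feature vector at that pixel.
    Points projecting outside the image (or not projectable) get None.
    """
    h, w = len(feature_map), len(feature_map[0])
    out = []
    for p in uv:
        if p is None:
            out.append(None)
            continue
        col, row = round(p[0]), round(p[1])
        if 0 <= row < h and 0 <= col < w:
            out.append(feature_map[row][col])
        else:
            out.append(None)  # projects outside the image bounds
    return out
```

In practice bilinear sampling usually replaces the nearest-pixel lookup so that the gathered features vary smoothly with point positions.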

A Closer Look at GAN Priors: Exploiting Intermediate Features for Enhanced Model Inversion Attacks

Jul 18, 2024

Yixiang Qiu, Hao Fang, Hongyao Yu, Bin Chen, MeiKang Qiu, Shu-Tao Xia

Abstract: Model Inversion (MI) attacks aim to reconstruct privacy-sensitive training data from released models by utilizing output information, raising extensive concerns about the security of Deep Neural Networks (DNNs). Recent advances in generative adversarial networks (GANs) have contributed significantly to the improved performance of MI attacks due to their powerful ability to generate realistic images with high fidelity and appropriate semantics. However, previous MI attacks have disclosed private information only in the latent space of GAN priors, limiting their semantic extraction and transferability across multiple target models and datasets. To address this challenge, we propose a novel method, Intermediate Features enhanced Generative Model Inversion (IF-GMI), which disassembles the GAN structure and exploits features between intermediate blocks. This allows us to extend the optimization space from the latent code to intermediate features with enhanced expressive capabilities. To prevent GAN priors from generating unrealistic images, we apply an L1 ball constraint to the optimization process. Experiments on multiple benchmarks demonstrate that our method significantly outperforms previous approaches and achieves state-of-the-art results under various settings, especially in the out-of-distribution (OOD) scenario. Our code is available at: https://github.com/final-solution/IF-GMI
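The abstract mentions an L1 ball constraint on the optimization but not how it is enforced. A standard way to enforce such a constraint in projected gradient methods is to project after each step onto the ball with the sort-based algorithm of Duchi et al. (2008). The sketch below assumes, hypothetically, that the constraint bounds how far an intermediate feature may drift from its initial value; `project_l1_ball`, `constrain_features`, and `radius` are illustrative names, not from the IF-GMI code.

```python
import math

def project_l1_ball(v, radius):
    """Euclidean projection of vector v onto {w : sum(|w_i|) <= radius} (Duchi et al., 2008)."""
    if sum(abs(x) for x in v) <= radius:
        return list(v)  # already feasible
    u = sorted((abs(x) for x in v), reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - radius) / j
        if uj > t:
            theta = t  # the last index satisfying u_j > t defines the soft threshold
    # soft-threshold magnitudes by theta while keeping the original signs
    return [math.copysign(max(abs(x) - theta, 0.0), x) for x in v]

def constrain_features(feat, feat_init, radius):
    """Keep an optimized feature within an L1 ball of `radius` around its initial value."""
    delta = [f - f0 for f, f0 in zip(feat, feat_init)]
    delta = project_l1_ball(delta, radius)
    return [f0 + d for f0, d in zip(feat_init, delta)]
```

Because the L1 projection zeroes out small coordinates, the allowed deviation tends to concentrate in a few feature dimensions rather than spreading across all of them.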

CLIP-Guided Networks for Transferable Targeted Attacks

Jul 14, 2024

Hao Fang, Jiawei Kong, Bin Chen, Tao Dai, Hao Wu, Shu-Tao Xia

Abstract: Transferable targeted adversarial attacks aim to mislead models into outputting adversary-specified predictions in black-box scenarios. Recent studies have introduced single-target generative attacks that train a separate generator for each target class to produce highly transferable perturbations, which incurs substantial computational overhead when handling multiple classes. Multi-target attacks address this by training only one class-conditional generator for multiple classes. However, such a generator simply uses class labels as conditions, failing to leverage the rich semantic information of the target class. To this end, we design a CLIP-guided Generative Network with Cross-attention modules (CGNC) to enhance multi-target attacks by incorporating the textual knowledge of CLIP into the generator. Extensive experiments demonstrate that CGNC yields significant improvements over previous multi-target generative attacks, e.g., a 21.46% improvement in success rate from ResNet-152 to DenseNet-121. Moreover, we propose a masked fine-tuning mechanism to further strengthen our method when attacking a single class, where it surpasses existing single-target methods.

* ECCV 2024
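CGNC injects CLIP text knowledge into the generator through cross-attention. The sketch below shows generic scaled dot-product cross-attention in which generator features act as queries and CLIP text-token embeddings act as keys and values. It omits the learned query/key/value projections and multiple heads a real module would have, and it is not the authors' module; the function names are illustrative.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention.

    queries: generator features, one vector per spatial location
    keys, values: text-token embeddings (e.g., from a CLIP text encoder)
    Each query returns a weighted sum of the values, weighted by query-key similarity.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out
```

In a conditional generator the attended text context is typically added back to the visual features as a residual, so each spatial location can draw on the semantics of the target class rather than only a class index.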

Pre-training Point Cloud Compact Model with Partial-aware Reconstruction

Jul 12, 2024

Yaohua Zha, Yanzi Wang, Tao Dai, Shu-Tao Xia

Abstract: Pre-trained point cloud models based on Masked Point Modeling (MPM) have exhibited substantial improvements across various tasks. However, two drawbacks hinder their practical application. First, the positional embeddings of masked patches in the decoder leak their central coordinates, limiting the learned 3D representations. Second, the excessive model size of existing MPM methods places high demands on hardware. To address these issues, we propose to pre-train a Point cloud Compact Model with Partial-aware Reconstruction, named Point-CPR. Specifically, in the decoder, we couple the vanilla masked tokens with their positional embeddings as randomly masked queries and introduce a partial-aware prediction module before each decoder layer to predict them from the unmasked portion. This prevents the decoder from creating a shortcut between the central coordinates of masked patches and their reconstructed coordinates, enhancing the robustness of the model. We also devise a compact encoder composed of local aggregation and MLPs, which reduces the parameters and computational requirements compared to existing Transformer-based encoders. Extensive experiments demonstrate that our model achieves strong performance across various tasks; in particular, it surpasses the leading MPM-based model PointGPT-B with only 2% of its parameters.

* arXiv admin note: text overlap with arXiv:2405.17149
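The decoder input described above can be illustrated with a small sketch: patches are randomly split into masked and visible sets, and each masked query is a shared mask token plus the positional embedding of that patch's center. This covers only query construction; the partial-aware prediction module that attends from these queries to the visible tokens is not shown. The mask ratio, `pos_embed` callable, and function names are hypothetical.

```python
import random

def random_mask(num_patches, mask_ratio, rng):
    """Randomly split patch indices into (masked, visible) index lists."""
    idx = list(range(num_patches))
    rng.shuffle(idx)
    n_mask = int(round(num_patches * mask_ratio))
    return sorted(idx[:n_mask]), sorted(idx[n_mask:])

def build_masked_queries(centers, masked_idx, mask_token, pos_embed):
    """Query for each masked patch = shared mask token + positional embedding of its center.

    centers:    list of patch-center coordinates
    mask_token: shared (learnable in practice) vector
    pos_embed:  hypothetical callable mapping a center to a vector of the same size
    """
    return [[m + p for m, p in zip(mask_token, pos_embed(centers[i]))]
            for i in masked_idx]
```

Because each query carries its patch center through `pos_embed`, a decoder that only copied this information into its output could shortcut the reconstruction; the abstract's partial-aware prediction from the visible patches is aimed at exactly that leak.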

Parameter-Efficient and Memory-Efficient Tuning for Vision Transformer: A Disentangled Approach

Jul 09, 2024

Taolin Zhang, Jiawang Bai, Zhihe Lu, Dongze Lian, Genping Wang, Xinchao Wang, Shu-Tao Xia

Abstract: Recent works on parameter-efficient transfer learning (PETL) show the potential to adapt a pre-trained Vision Transformer to downstream recognition tasks with only a few learnable parameters. However, since they usually insert new structures into the pre-trained model, all intermediate features of that model are changed and must be stored for back-propagation, resulting in memory-heavy training. We solve this problem from a novel disentangled perspective, i.e., dividing PETL into two aspects: task-specific learning and pre-trained knowledge utilization. Specifically, we synthesize the task-specific query with a learnable, lightweight module that is independent of the pre-trained model. The synthesized query, equipped with task-specific knowledge, extracts useful features for downstream tasks from the intermediate representations of the pre-trained model in a query-only manner. Built upon these features, a customized classification head is proposed to make predictions for the input sample. Since our method employs a lightweight architecture and avoids using heavy intermediate features for gradient descent, it has limited memory usage during training. Extensive experiments show that our method achieves state-of-the-art performance under memory constraints, showcasing its applicability in real-world situations.

* ECCV 2024

Video Watermarking: Safeguarding Your Video from (Unauthorized) Annotations by Video-based LLMs

Jul 02, 2024

Jinmin Li, Kuofeng Gao, Yang Bai, Jingyun Zhang, Shu-Tao Xia

Abstract: The advent of video-based Large Language Models (LLMs) has significantly enhanced video understanding. However, it has also raised safety concerns regarding data protection, as videos can be more easily annotated, even without authorization. This paper introduces Video Watermarking, a novel technique to protect videos from unauthorized annotation by such video-based LLMs, especially of the video content and description in response to specific queries. By imperceptibly embedding watermarks into key video frames with multi-modal flow-based losses, our method preserves the viewing experience while preventing misuse by video-based LLMs. Extensive experiments show that Video Watermarking significantly reduces the comprehensibility of videos to various video-based LLMs, demonstrating both stealth and robustness. In essence, our method provides a solution for securing video content, ensuring its integrity and confidentiality in the face of evolving video-based LLM technologies.

* arXiv admin note: substantial text overlap with arXiv:2403.13507
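The abstract says the watermark is embedded imperceptibly but does not state how imperceptibility is enforced. A common approach for such optimized perturbations is a projected signed-gradient step that keeps every pixel within a small L-infinity budget of the original frame. The sketch below assumes a flattened frame with pixel values in [0, 1], a gradient supplied by some loss (the paper's multi-modal flow-based losses are not modeled), and that the loss is being maximized; `perturb_step`, `step_size`, and `epsilon` are hypothetical.

```python
def sign(x):
    """Return 1, -1, or 0 according to the sign of x."""
    return (x > 0) - (x < 0)

def perturb_step(frame, original, grad, step_size, epsilon):
    """One signed-gradient ascent step on pixel values, projected back into an
    L-infinity ball of radius epsilon around the original frame and into [0, 1]."""
    out = []
    for x, x0, g in zip(frame, original, grad):
        x = x + step_size * sign(g)                   # move along the sign of the loss gradient
        x = min(max(x, x0 - epsilon), x0 + epsilon)   # keep the change within the budget
        out.append(min(max(x, 0.0), 1.0))             # stay a valid pixel intensity
    return out
```

Repeating this step on selected key frames accumulates a perturbation whose per-pixel change never exceeds `epsilon`, which is one simple way to trade off stealth against effect on the model.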

DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation

Jun 24, 2024

Yuang Peng, Yuxin Cui, Haomiao Tang, Zekun Qi, Runpei Dong, Jing Bai, Chunrui Han, Zheng Ge, Xiangyu Zhang, Shu-Tao Xia

Abstract: Personalized image generation holds great promise in assisting humans in everyday work and life due to its ability to creatively generate personalized content. However, current evaluations are either automated but misaligned with humans, or rely on human evaluations that are time-consuming and expensive. In this work, we present DreamBench++, a human-aligned benchmark automated by advanced multimodal GPT models. Specifically, we systematically design the prompts to make GPT both human-aligned and self-aligned, empowered with task reinforcement. Furthermore, we construct a comprehensive dataset comprising diverse images and prompts. By benchmarking 7 modern generative models, we demonstrate that DreamBench++ yields significantly more human-aligned evaluation, helping boost the community with innovative findings.

* Project page: https://dreambenchplus.github.io/

Hierarchical Features Matter: A Deep Exploration of GAN Priors for Improved Dataset Distillation

Jun 12, 2024

Xinhao Zhong, Hao Fang, Bin Chen, Xulin Gu, Tao Dai, Meikang Qiu, Shu-Tao Xia

Abstract: Dataset distillation is an emerging dataset reduction method that condenses large-scale datasets while maintaining task accuracy. Current methods have integrated parameterization techniques to boost synthetic dataset performance by shifting the optimization space from pixels to another informative feature domain. However, they restrict themselves to a fixed optimization space for distillation, neglecting the diverse guidance available across different informative latent spaces. To overcome this limitation, we propose a novel parameterization method, dubbed Hierarchical Generative Latent Distillation (H-GLaD), that systematically explores hierarchical layers within generative adversarial networks (GANs). This allows us to progressively span from the initial latent space to the final pixel space. In addition, we introduce a novel class-relevant feature distance metric to alleviate the computational burden of synthetic dataset evaluation, bridging the gap between synthetic and original datasets. Experimental results demonstrate that the proposed H-GLaD achieves significant improvements in both same-architecture and cross-architecture performance with equivalent time consumption.

GI-NAS: Boosting Gradient Inversion Attacks through Adaptive Neural Architecture Search

May 31, 2024

Wenbo Yu, Hao Fang, Bin Chen, Xiaohang Sui, Chuan Chen, Hao Wu, Shu-Tao Xia, Ke Xu

Abstract: Gradient inversion attacks invert the gradients transmitted in Federated Learning (FL) systems to reconstruct the sensitive data of local clients, raising considerable privacy concerns. Most gradient inversion methods rely heavily on explicit prior knowledge (e.g., a well pre-trained generative model), which is often unavailable in realistic scenarios. To alleviate this issue, researchers have proposed leveraging the implicit prior knowledge of an over-parameterized network. However, these methods use a single fixed neural architecture for all attack settings, which hinders the adaptive use of implicit architectural priors and consequently limits generalizability. In this paper, we further exploit such implicit prior knowledge by proposing Gradient Inversion via Neural Architecture Search (GI-NAS), which adaptively searches the network and captures the implicit priors behind neural architectures. Extensive experiments verify that GI-NAS achieves superior attack performance compared to state-of-the-art gradient inversion methods, even under more practical settings with high-resolution images, large batches, and advanced defense strategies.
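Why transmitted gradients leak data can be seen in the simplest case, which is a textbook illustration rather than the optimization-based GI-NAS attack. For a single sample passing through a fully connected layer y = Wx + b, the gradients are dL/dW = (dL/dy) x^T and dL/db = dL/dy, so any row of dL/dW divided by the matching nonzero entry of dL/db returns x exactly. The function names below are hypothetical.

```python
def linear_grads(x, upstream):
    """Gradients of a loss w.r.t. W and b for y = W x + b, given upstream = dL/dy."""
    dW = [[g * xj for xj in x] for g in upstream]  # outer product (dL/dy) x^T
    db = list(upstream)                            # dL/db = dL/dy
    return dW, db

def recover_input(dW, db, eps=1e-12):
    """Recover the single input x from a linear layer's shared gradients."""
    for row, g in zip(dW, db):
        if abs(g) > eps:
            return [v / g for v in row]  # row_i of dW equals g_i * x
    return None  # every bias gradient vanishes: this layer alone does not reveal x
```

With batches, deeper networks, or defenses this closed form no longer applies, which is why practical attacks such as the one described above instead optimize a dummy input so that its gradients match the transmitted ones.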
