Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xiao Zhang

DivTrackee versus DynTracker: Promoting Diversity in Anti-Facial Recognition against Dynamic FR Strategy

Jan 11, 2025

Wenshu Fan, Minxing Zhang, Hongwei Li, Wenbo Jiang, Hanxiao Chen, Xiangyu Yue, Michael Backes, Xiao Zhang

Figure 1 for DivTrackee versus DynTracker: Promoting Diversity in Anti-Facial Recognition against Dynamic FR Strategy

Figure 2 for DivTrackee versus DynTracker: Promoting Diversity in Anti-Facial Recognition against Dynamic FR Strategy

Figure 3 for DivTrackee versus DynTracker: Promoting Diversity in Anti-Facial Recognition against Dynamic FR Strategy

Figure 4 for DivTrackee versus DynTracker: Promoting Diversity in Anti-Facial Recognition against Dynamic FR Strategy

Abstract:The widespread adoption of facial recognition (FR) models raises serious concerns about their potential misuse, motivating the development of anti-facial recognition (AFR) to protect user facial privacy. In this paper, we argue that the static FR strategy, predominantly adopted in prior literature for evaluating AFR efficacy, cannot faithfully characterize the actual capabilities of determined trackers who aim to track a specific target identity. In particular, we introduce \emph{\ourAttack}, a dynamic FR strategy where the model's gallery database is iteratively updated with newly recognized target identity images. Surprisingly, such a simple approach renders all the existing AFR protections ineffective. To mitigate the privacy threats posed by DynTracker, we advocate for explicitly promoting diversity in the AFR-protected images. We hypothesize that the lack of diversity is the primary cause of the failure of existing AFR methods. Specifically, we develop \emph{DivTrackee}, a novel method for crafting diverse AFR protections that builds upon a text-guided image generation framework and diversity-promoting adversarial losses. Through comprehensive experiments on various facial image benchmarks and feature extractors, we demonstrate DynTracker's strength in breaking existing AFR methods and the superiority of DivTrackee in preventing user facial images from being identified by dynamic FR strategies. We believe our work can act as an important initial step towards developing more effective AFR methods for protecting user facial privacy against determined trackers.

Via

Access Paper or Ask Questions

HELPNet: Hierarchical Perturbations Consistency and Entropy-guided Ensemble for Scribble Supervised Medical Image Segmentation

Dec 25, 2024

Xiao Zhang, Shaoxuan Wu, Peilin Zhang, Zhuo Jin, Xiaosong Xiong, Qirong Bu, Jingkun Chen, Jun Feng

Figure 1 for HELPNet: Hierarchical Perturbations Consistency and Entropy-guided Ensemble for Scribble Supervised Medical Image Segmentation

Figure 2 for HELPNet: Hierarchical Perturbations Consistency and Entropy-guided Ensemble for Scribble Supervised Medical Image Segmentation

Figure 3 for HELPNet: Hierarchical Perturbations Consistency and Entropy-guided Ensemble for Scribble Supervised Medical Image Segmentation

Figure 4 for HELPNet: Hierarchical Perturbations Consistency and Entropy-guided Ensemble for Scribble Supervised Medical Image Segmentation

Abstract:Creating fully annotated labels for medical image segmentation is prohibitively time-intensive and costly, emphasizing the necessity for innovative approaches that minimize reliance on detailed annotations. Scribble annotations offer a more cost-effective alternative, significantly reducing the expenses associated with full annotations. However, scribble annotations offer limited and imprecise information, failing to capture the detailed structural and boundary characteristics necessary for accurate organ delineation. To address these challenges, we propose HELPNet, a novel scribble-based weakly supervised segmentation framework, designed to bridge the gap between annotation efficiency and segmentation performance. HELPNet integrates three modules. The Hierarchical perturbations consistency (HPC) module enhances feature learning by employing density-controlled jigsaw perturbations across global, local, and focal views, enabling robust modeling of multi-scale structural representations. Building on this, the Entropy-guided pseudo-label (EGPL) module evaluates the confidence of segmentation predictions using entropy, generating high-quality pseudo-labels. Finally, the structural prior refinement (SPR) module incorporates connectivity and bounded priors to enhance the precision and reliability and pseudo-labels. Experimental results on three public datasets ACDC, MSCMRseg, and CHAOS show that HELPNet significantly outperforms state-of-the-art methods for scribble-based weakly supervised segmentation and achieves performance comparable to fully supervised methods. The code is available at https://github.com/IPMI-NWU/HELPNet.

Via

Access Paper or Ask Questions

Trigger$^3$: Refining Query Correction via Adaptive Model Selector

Dec 17, 2024

Kepu Zhang, Zhongxiang Sun, Xiao Zhang, Xiaoxue Zang, Kai Zheng, Yang Song, Jun Xu

Figure 1 for Trigger$^3$: Refining Query Correction via Adaptive Model Selector

Figure 2 for Trigger$^3$: Refining Query Correction via Adaptive Model Selector

Figure 3 for Trigger$^3$: Refining Query Correction via Adaptive Model Selector

Figure 4 for Trigger$^3$: Refining Query Correction via Adaptive Model Selector

Abstract:In search scenarios, user experience can be hindered by erroneous queries due to typos, voice errors, or knowledge gaps. Therefore, query correction is crucial for search engines. Current correction models, usually small models trained on specific data, often struggle with queries beyond their training scope or those requiring contextual understanding. While the advent of Large Language Models (LLMs) offers a potential solution, they are still limited by their pre-training data and inference cost, particularly for complex queries, making them not always effective for query correction. To tackle these, we propose Trigger$^3$, a large-small model collaboration framework that integrates the traditional correction model and LLM for query correction, capable of adaptively choosing the appropriate correction method based on the query and the correction results from the traditional correction model and LLM. Trigger$^3$ first employs a correction trigger to filter out correct queries. Incorrect queries are then corrected by the traditional correction model. If this fails, an LLM trigger is activated to call the LLM for correction. Finally, for queries that no model can correct, a fallback trigger decides to return the original query. Extensive experiments demonstrate Trigger$^3$ outperforms correction baselines while maintaining efficiency.

Via

Access Paper or Ask Questions

Retrieval-Augmented Semantic Parsing: Using Large Language Models to Improve Generalization

Dec 13, 2024

Xiao Zhang, Qianru Meng, Johan Bos

Figure 1 for Retrieval-Augmented Semantic Parsing: Using Large Language Models to Improve Generalization

Figure 2 for Retrieval-Augmented Semantic Parsing: Using Large Language Models to Improve Generalization

Figure 3 for Retrieval-Augmented Semantic Parsing: Using Large Language Models to Improve Generalization

Figure 4 for Retrieval-Augmented Semantic Parsing: Using Large Language Models to Improve Generalization

Abstract:Open-domain semantic parsing remains a challenging task, as models often rely on heuristics and struggle to handle unseen concepts. In this paper, we investigate the potential of large language models (LLMs) for this task and introduce Retrieval-Augmented Semantic Parsing (RASP), a simple yet effective approach that integrates external lexical knowledge into the parsing process. Our experiments not only show that LLMs outperform previous encoder-decoder baselines for semantic parsing, but that RASP further enhances their ability to predict unseen concepts, nearly doubling the performance of previous models on out-of-distribution concepts. These findings highlight the promise of leveraging large language models and retrieval mechanisms for robust and open-domain semantic parsing.

* Submitted to ARR

Via

Access Paper or Ask Questions

PROFIT: A Specialized Optimizer for Deep Fine Tuning

Dec 09, 2024

Anirudh S Chakravarthy, Shuai Kyle Zheng, Xin Huang, Sachithra Hemachandra, Xiao Zhang, Yuning Chai, Zhao Chen

Figure 1 for PROFIT: A Specialized Optimizer for Deep Fine Tuning

Figure 2 for PROFIT: A Specialized Optimizer for Deep Fine Tuning

Figure 3 for PROFIT: A Specialized Optimizer for Deep Fine Tuning

Figure 4 for PROFIT: A Specialized Optimizer for Deep Fine Tuning

Abstract:Fine-tuning pre-trained models has become invaluable in computer vision and robotics. Recent fine-tuning approaches focus on improving efficiency rather than accuracy by using a mixture of smaller learning rates or frozen backbones. To return the spotlight to model accuracy, we present PROFIT (Proximally Restricted Optimizer For Iterative Training), one of the first optimizers specifically designed for incrementally fine-tuning converged models on new tasks or datasets. Unlike traditional optimizers such as SGD or Adam, which make minimal assumptions due to random initialization, PROFIT leverages the structure of a converged model to regularize the optimization process, leading to improved results. By employing a simple temporal gradient orthogonalization process, PROFIT outperforms traditional fine-tuning methods across various tasks: image classification, representation learning, and large-scale motion prediction. Moreover, PROFIT is encapsulated within the optimizer logic, making it easily integrated into any training pipeline with minimal engineering effort. A new class of fine-tuning optimizers like PROFIT can drive advancements as fine-tuning and incremental training become increasingly prevalent, reducing reliance on costly model training from scratch.

* technical report

Via

Access Paper or Ask Questions

Nested Diffusion Models Using Hierarchical Latent Priors

Dec 08, 2024

Xiao Zhang, Ruoxi Jiang, Rebecca Willett, Michael Maire

Abstract:We introduce nested diffusion models, an efficient and powerful hierarchical generative framework that substantially enhances the generation quality of diffusion models, particularly for images of complex scenes. Our approach employs a series of diffusion models to progressively generate latent variables at different semantic levels. Each model in this series is conditioned on the output of the preceding higher-level models, culminating in image generation. Hierarchical latent variables guide the generation process along predefined semantic pathways, allowing our approach to capture intricate structural details while significantly improving image quality. To construct these latent variables, we leverage a pre-trained visual encoder, which learns strong semantic visual representations, and modulate its capacity via dimensionality reduction and noise injection. Across multiple datasets, our system demonstrates significant enhancements in image quality for both unconditional and class/text conditional generation. Moreover, our unconditional generation system substantially outperforms the baseline conditional system. These advancements incur minimal computational overhead as the more abstract levels of our hierarchy work with lower-dimensional representations.

Via

Access Paper or Ask Questions

Joint Mode Selection and Beamforming Designs for Hybrid-RIS Assisted ISAC Systems

Dec 05, 2024

Yingbin Lin, Feng Wang, Xiao Zhang, Guojun Han, Vincent K. N. Lau

Figure 1 for Joint Mode Selection and Beamforming Designs for Hybrid-RIS Assisted ISAC Systems

Figure 2 for Joint Mode Selection and Beamforming Designs for Hybrid-RIS Assisted ISAC Systems

Figure 3 for Joint Mode Selection and Beamforming Designs for Hybrid-RIS Assisted ISAC Systems

Figure 4 for Joint Mode Selection and Beamforming Designs for Hybrid-RIS Assisted ISAC Systems

Abstract:This paper considers a hybrid reconfigurable intelligent surface (RIS) assisted integrated sensing and communication (ISAC) system, where each RIS element can flexibly switch between the active and passive modes. Subject to the signal-to-interference-plus-noise ratio (SINR) constraint for each communication user (CU) and the transmit power constraints for both the base station (BS) and the active RIS elements, with the objective of maximizing the minimum beampattern gain among multiple targets, we jointly optimize the BS transmit beamforming for ISAC and the mode selection of each RIS reflecting element, as well as the RIS reflection coefficient matrix. Such formulated joint hybrid-RIS assisted ISAC design problem is a mixed-integer nonlinear program, which is decomposed into two low-dimensional subproblems being solved in an alternating manner. Specifically, by using the semidefinite relaxation (SDR) technique along with the rank-one beamforming construction process, we efficiently obtain the optimal ISAC transmit beamforming design at the BS. Via the SDR and successive convex approximation (SCA) techniques, we jointly determine the active/passive mode selection and reflection coefficient for each RIS element. Numerical results demonstrate that the proposed design solution is significantly superior to the existing baseline solutions.

* 5 pages, 4 figures

Via

Access Paper or Ask Questions

Can Targeted Clean-Label Poisoning Attacks Generalize?

Dec 05, 2024

Zhizhen Chen, Subrat Kishore Dutta, Zhengyu Zhao, Chenhao Lin, Chao Shen, Xiao Zhang

Figure 1 for Can Targeted Clean-Label Poisoning Attacks Generalize?

Figure 2 for Can Targeted Clean-Label Poisoning Attacks Generalize?

Figure 3 for Can Targeted Clean-Label Poisoning Attacks Generalize?

Figure 4 for Can Targeted Clean-Label Poisoning Attacks Generalize?

Abstract:Targeted poisoning attacks aim to compromise the model's prediction on specific target samples. In a common clean-label setting, they are achieved by slightly perturbing a subset of training samples given access to those specific targets. Despite continuous efforts, it remains unexplored whether such attacks can generalize to unknown variations of those targets. In this paper, we take the first step to systematically study this generalization problem. Observing that the widely adopted, cosine similarity-based attack exhibits limited generalizability, we propose a well-generalizable attack that leverages both the direction and magnitude of model gradients. In particular, we explore diverse target variations, such as an object with varied viewpoints and an animal species with distinct appearances. Extensive experiments across various generalization scenarios demonstrate that our method consistently achieves the best attack effectiveness. For example, our method outperforms the cosine similarity-based attack by 20.95% in attack success rate with similar overall accuracy, averaged over four models on two image benchmark datasets. The code is available at https://github.com/jiaangk/generalizable_tcpa

* 12 pages, 5 figures, 5 tables

Via

Access Paper or Ask Questions

PROFIT: A PROximal FIne Tuning Optimizer for Multi-Task Learning

Dec 02, 2024

Anirudh S Chakravarthy, Shuai Kyle Zheng, Xin Huang, Sachithra Hemachandra, Xiao Zhang, Yuning Chai, Zhao Chen

Figure 1 for PROFIT: A PROximal FIne Tuning Optimizer for Multi-Task Learning

Figure 2 for PROFIT: A PROximal FIne Tuning Optimizer for Multi-Task Learning

Figure 3 for PROFIT: A PROximal FIne Tuning Optimizer for Multi-Task Learning

Figure 4 for PROFIT: A PROximal FIne Tuning Optimizer for Multi-Task Learning

Abstract:Fine-tuning pre-trained models has become invaluable in computer vision and robotics. Recent fine-tuning approaches focus on improving efficiency rather than accuracy by using a mixture of smaller learning rates or frozen backbones. To return the spotlight to model accuracy, we present PROFIT, one of the first optimizers specifically designed for incrementally fine-tuning converged models on new tasks or datasets. Unlike traditional optimizers such as SGD or Adam, which make minimal assumptions due to random initialization, PROFIT leverages the structure of a converged model to regularize the optimization process, leading to improved results. By employing a simple temporal gradient orthogonalization process, PROFIT outperforms traditional fine-tuning methods across various tasks: image classification, representation learning, and large-scale motion prediction. Moreover, PROFIT is encapsulated within the optimizer logic, making it easily integrated into any training pipeline with minimal engineering effort. A new class of fine-tuning optimizers like PROFIT can drive advancements as fine-tuning and incremental training become increasingly prevalent, reducing reliance on costly model training from scratch.

* technical report

Via

Access Paper or Ask Questions

LampMark: Proactive Deepfake Detection via Training-Free Landmark Perceptual Watermarks

Nov 26, 2024

Tianyi Wang, Mengxiao Huang, Harry Cheng, Xiao Zhang, Zhiqi Shen

Figure 1 for LampMark: Proactive Deepfake Detection via Training-Free Landmark Perceptual Watermarks

Figure 2 for LampMark: Proactive Deepfake Detection via Training-Free Landmark Perceptual Watermarks

Figure 3 for LampMark: Proactive Deepfake Detection via Training-Free Landmark Perceptual Watermarks

Figure 4 for LampMark: Proactive Deepfake Detection via Training-Free Landmark Perceptual Watermarks

Abstract:Deepfake facial manipulation has garnered significant public attention due to its impacts on enhancing human experiences and posing privacy threats. Despite numerous passive algorithms that have been attempted to thwart malicious Deepfake attacks, they mostly struggle with the generalizability challenge when confronted with hyper-realistic synthetic facial images. To tackle the problem, this paper proposes a proactive Deepfake detection approach by introducing a novel training-free landmark perceptual watermark, LampMark for short. We first analyze the structure-sensitive characteristics of Deepfake manipulations and devise a secure and confidential transformation pipeline from the structural representations, i.e. facial landmarks, to binary landmark perceptual watermarks. Subsequently, we present an end-to-end watermarking framework that imperceptibly and robustly embeds and extracts watermarks concerning the images to be protected. Relying on promising watermark recovery accuracies, Deepfake detection is accomplished by assessing the consistency between the content-matched landmark perceptual watermark and the robustly recovered watermark of the suspect image. Experimental results demonstrate the superior performance of our approach in watermark recovery and Deepfake detection compared to state-of-the-art methods across in-dataset, cross-dataset, and cross-manipulation scenarios.

* Accepted to ACM MM 2024

Via

Access Paper or Ask Questions