Abstract: Prompt-driven Video Segmentation Foundation Models (VSFMs) such as SAM2 are increasingly deployed in applications like autonomous driving and digital pathology, raising concerns about backdoor threats. Surprisingly, we find that directly transferring classic backdoor attacks (e.g., BadNets) to VSFMs is almost ineffective, with an attack success rate (ASR) below 5%. To understand why, we study encoder gradients and attention maps and observe that conventional training keeps the gradients of clean and triggered samples largely aligned, while attention still focuses on the true object, preventing the encoder from learning a distinct trigger-related representation. To address this challenge, we propose BadVSFM, the first backdoor framework tailored to prompt-driven VSFMs. BadVSFM uses a two-stage strategy: (1) steer the image encoder so that triggered frames map to a designated target embedding while clean frames remain aligned with a clean reference encoder; (2) train the mask decoder so that, across prompt types, triggered frame-prompt pairs produce a shared target mask while clean outputs stay close to a reference decoder. Extensive experiments on two datasets and five VSFMs show that BadVSFM achieves strong, controllable backdoor effects under diverse triggers and prompts while preserving clean segmentation quality. Ablations over losses, stages, targets, trigger settings, and poisoning rates demonstrate robustness to reasonable hyperparameter changes and confirm the necessity of the two-stage design. Finally, gradient-conflict analysis and attention visualizations show that BadVSFM separates triggered from clean representations and shifts attention to trigger regions, while four representative defenses remain largely ineffective, revealing an underexplored vulnerability in current VSFMs.
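To make the two-stage strategy concrete, below is a minimal PyTorch sketch of the two training objectives as we read them from the abstract. The module names (encoder, ref_encoder, decoder, ref_decoder), the trigger-application function, the loss choices (MSE, BCE), and the weight lam are illustrative assumptions, not the authors' actual implementation.

# Minimal sketch of the two-stage BadVSFM objectives described above.
# All module names, the trigger function, and the loss weights are
# hypothetical illustrations inferred from the abstract.
import torch
import torch.nn.functional as F

def stage1_encoder_loss(encoder, ref_encoder, frames, apply_trigger,
                        target_emb, lam=1.0):
    """Stage 1: steer the image encoder.

    Triggered frames are pulled toward a designated target embedding;
    clean frames stay aligned with a frozen clean reference encoder.
    """
    z_trig = encoder(apply_trigger(frames))   # embeddings of triggered frames
    z_clean = encoder(frames)                 # embeddings of clean frames
    with torch.no_grad():
        z_ref = ref_encoder(frames)           # frozen clean reference
    loss_bd = F.mse_loss(z_trig, target_emb.expand_as(z_trig))
    loss_clean = F.mse_loss(z_clean, z_ref)
    return loss_bd + lam * loss_clean

def stage2_decoder_loss(decoder, ref_decoder, z_trig, z_clean, prompts,
                        target_mask, lam=1.0):
    """Stage 2: train the mask decoder.

    Across prompt types, triggered frame-prompt pairs should produce a
    shared target mask, while clean outputs track a frozen reference.
    """
    loss_bd = sum(F.binary_cross_entropy_with_logits(
                      decoder(z_trig, p), target_mask) for p in prompts)
    with torch.no_grad():
        ref_masks = [ref_decoder(z_clean, p) for p in prompts]
    loss_clean = sum(F.mse_loss(decoder(z_clean, p), m)
                     for p, m in zip(prompts, ref_masks))
    return loss_bd + lam * loss_clean

In this reading, the frozen reference encoder and decoder act as anchors that preserve clean behavior, while the backdoor terms pull triggered inputs toward the attacker-chosen embedding and mask across all prompt types.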

Abstract: The rise of large language models (LLMs) has raised concerns about machine-generated text (MGT), including ethical and practical issues such as plagiarism and misinformation. Building a robust and highly generalizable MGT detection system has become increasingly important. This work investigates the generalization capabilities of MGT detectors in three respects. First, we construct MGTAcademic, a large-scale dataset focused on academic writing, featuring human-written texts (HWTs) and MGTs across STEM, Humanities, and Social Sciences, paired with an extensible code framework for efficient benchmarking. Second, we investigate the transferability of detectors across domains and LLMs, leveraging fine-grained datasets to reveal insights into domain transfer and applying few-shot techniques that improve performance by roughly 13.2%. Third, we introduce a novel attribution task in which models must adapt to new classes over time with no (or very limited) access to prior training data, and we benchmark detectors on it. We implement several adaptation techniques that improve performance by roughly 10% and highlight the inherent complexity of the task. Our findings provide insights into the generalization ability of MGT detectors across diverse scenarios and lay the foundation for building robust, adaptive detection systems.
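As one concrete illustration of the few-shot transfer setting described above, the sketch below fine-tunes a pretrained binary HWT-vs-MGT classifier on a handful of labeled examples from a new domain or generator. The model name, hyperparameters, and training loop are assumptions for illustration; the paper's exact few-shot techniques are not specified in the abstract.

# Minimal sketch of few-shot adaptation of an MGT detector to a new
# domain/LLM. Model choice and hyperparameters are illustrative only.
import torch
from torch.optim import AdamW
from transformers import AutoModelForSequenceClassification, AutoTokenizer

def few_shot_adapt(model_name, shots, labels, epochs=3, lr=2e-5):
    """Fine-tune a binary HWT-vs-MGT classifier on a few target examples.

    shots: list of strings from the new domain/generator
    labels: list of ints, 0 = human-written, 1 = machine-generated
    """
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=2)
    batch = tok(shots, truncation=True, padding=True, return_tensors="pt")
    y = torch.tensor(labels)
    opt = AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        out = model(**batch, labels=y)   # cross-entropy over the 2 classes
        opt.zero_grad()
        out.loss.backward()
        opt.step()
    return model

The same skeleton extends to the attribution task by raising num_labels and periodically adding output classes as new generators appear, which is where the limited access to prior training data makes naive fine-tuning prone to forgetting.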