Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xingyu Zhao

OBJVanish: Physically Realizable Text-to-3D Adv. Generation of LiDAR-Invisible Objects

Oct 08, 2025

Bing Li, Wuqi Wang, Yanan Zhang, Jingzheng Li, Haigen Min, Wei Feng, Xingyu Zhao, Jie Zhang, Qing Guo

Abstract:LiDAR-based 3D object detectors are fundamental to autonomous driving, where failing to detect objects poses severe safety risks. Developing effective 3D adversarial attacks is essential for thoroughly testing these detection systems and exposing their vulnerabilities before real-world deployment. However, existing adversarial attacks that add optimized perturbations to 3D points have two critical limitations: they rarely cause complete object disappearance and prove difficult to implement in physical environments. We introduce the text-to-3D adversarial generation method, a novel approach enabling physically realizable attacks that can generate 3D models of objects truly invisible to LiDAR detectors and be easily realized in the real world. Specifically, we present the first empirical study that systematically investigates the factors influencing detection vulnerability by manipulating the topology, connectivity, and intensity of individual pedestrian 3D models and combining pedestrians with multiple objects within the CARLA simulation environment. Building on the insights, we propose the physically-informed text-to-3D adversarial generation (Phy3DAdvGen) that systematically optimizes text prompts by iteratively refining verbs, objects, and poses to produce LiDAR-invisible pedestrians. To ensure physical realizability, we construct a comprehensive object pool containing 13 3D models of real objects and constrain Phy3DAdvGen to generate 3D objects based on combinations of objects in this set. Extensive experiments demonstrate that our approach can generate 3D pedestrians that evade six state-of-the-art (SOTA) LiDAR 3D detectors in both CARLA simulation and physical environments, thereby highlighting vulnerabilities in safety-critical applications.

Via

Access Paper or Ask Questions

Tailored Teaching with Balanced Difficulty: Elevating Reasoning in Multimodal Chain-of-Thought via Prompt Curriculum

Aug 26, 2025

Xinglong Yang, Quan Feng, Zhongying Pan, Xiang Chen, Yu Tian, Wentong Li, Shuofei Qiao, Yuxia Geng, Xingyu Zhao, Sheng-Jun Huang

Abstract:The effectiveness of Multimodal Chain-of-Thought (MCoT) prompting is often limited by the use of randomly or manually selected examples. These examples fail to account for both model-specific knowledge distributions and the intrinsic complexity of the tasks, resulting in suboptimal and unstable model performance. To address this, we propose a novel framework inspired by the pedagogical principle of "tailored teaching with balanced difficulty". We reframe prompt selection as a prompt curriculum design problem: constructing a well ordered set of training examples that align with the model's current capabilities. Our approach integrates two complementary signals: (1) model-perceived difficulty, quantified through prediction disagreement in an active learning setup, capturing what the model itself finds challenging; and (2) intrinsic sample complexity, which measures the inherent difficulty of each question-image pair independently of any model. By jointly analyzing these signals, we develop a difficulty-balanced sampling strategy that ensures the selected prompt examples are diverse across both dimensions. Extensive experiments conducted on five challenging benchmarks and multiple popular Multimodal Large Language Models (MLLMs) demonstrate that our method yields substantial and consistent improvements and greatly reduces performance discrepancies caused by random sampling, providing a principled and robust approach for enhancing multimodal reasoning.

Via

Access Paper or Ask Questions

On the Need for a Statistical Foundation in Scenario-Based Testing of Autonomous Vehicles

May 04, 2025

Xingyu Zhao, Robab Aghazadeh-Chakherlou, Chih-Hong Cheng, Peter Popov, Lorenzo Strigini

Abstract:Scenario-based testing has emerged as a common method for autonomous vehicles (AVs) safety, offering a more efficient alternative to mile-based testing by focusing on high-risk scenarios. However, fundamental questions persist regarding its stopping rules, residual risk estimation, debug effectiveness, and the impact of simulation fidelity on safety claims. This paper argues that a rigorous statistical foundation is essential to address these challenges and enable rigorous safety assurance. By drawing parallels between AV testing and traditional software testing methodologies, we identify shared research gaps and reusable solutions. We propose proof-of-concept models to quantify the probability of failure per scenario (pfs) and evaluate testing effectiveness under varying conditions. Our analysis reveals that neither scenario-based nor mile-based testing universally outperforms the other. Furthermore, we introduce Risk Estimation Fidelity (REF), a novel metric to certify the alignment of synthetic and real-world testing outcomes, ensuring simulation-based safety claims are statistically defensible.

* under review

Via

Access Paper or Ask Questions

Sensing-Then-Beamforming: Robust Transmission Design for RIS-Empowered Integrated Sensing and Covert Communication

Apr 18, 2025

Xingyu Zhao, Min Li, Ming-Min Zhao, Shihao Yan, Min-Jian Zhao

Figure 1 for Sensing-Then-Beamforming: Robust Transmission Design for RIS-Empowered Integrated Sensing and Covert Communication

Figure 2 for Sensing-Then-Beamforming: Robust Transmission Design for RIS-Empowered Integrated Sensing and Covert Communication

Figure 3 for Sensing-Then-Beamforming: Robust Transmission Design for RIS-Empowered Integrated Sensing and Covert Communication

Figure 4 for Sensing-Then-Beamforming: Robust Transmission Design for RIS-Empowered Integrated Sensing and Covert Communication

Abstract:Traditional covert communication often relies on the knowledge of the warden's channel state information, which is inherently challenging to obtain due to the non-cooperative nature and potential mobility of the warden. The integration of sensing and communication technology provides a promising solution by enabling the legitimate transmitter to sense and track the warden, thereby enhancing transmission covertness. In this paper, we develop a framework for sensing-then-beamforming in reconfigurable intelligent surface (RIS)-empowered integrated sensing and covert communication (ISCC) systems, where the transmitter (Alice) estimates and tracks the mobile aerial warden's channel using sensing echo signals while simultaneously sending covert information to multiple legitimate users (Bobs) with the assistance of RIS, under the surveillance of the warden (Willie). Considering channel estimation errors, we formulate a robust non-convex optimization problem that jointly designs the communication beamformers, the sensing signal covariance matrix at Alice, and the phase shifts at the RIS to maximize the covert sum rate of Bobs while satisfying the constraints related to covert communication, sensing, transmitter power, and the unit modulus of the RIS elements. To solve this complex problem, we develop an efficient algorithm using alternating optimization, successive convex approximation, S-procedure, sequential rank-one constraint relaxation, and semidefinite relaxation techniques. Numerical results confirm the convergence of the proposed algorithm and demonstrate its effectiveness in tracking the warden's channel while ensuring robust covert transmission. Furthermore, the results highlight the advantages of using RIS to enhance the covert transmission rate compared to baseline schemes, and also illustrate the intricate trade-off between communication and sensing in ISCC systems.

* 13 pages; submitted for possible publication

Via

Access Paper or Ask Questions

Movable Antenna Enhanced Downlink Multi-User Integrated Sensing and Communication System

Mar 28, 2025

Yanze Han, Min Li, Xingyu Zhao, Ming-Min Zhao, Min-Jian Zhao

Figure 1 for Movable Antenna Enhanced Downlink Multi-User Integrated Sensing and Communication System

Figure 2 for Movable Antenna Enhanced Downlink Multi-User Integrated Sensing and Communication System

Figure 3 for Movable Antenna Enhanced Downlink Multi-User Integrated Sensing and Communication System

Figure 4 for Movable Antenna Enhanced Downlink Multi-User Integrated Sensing and Communication System

Abstract:This work investigates the potential of exploiting movable antennas (MAs) to enhance the performance of a multi-user downlink integrated sensing and communication (ISAC) system. Specifically, we formulate an optimization problem to maximize the transmit beampattern gain for sensing while simultaneously meeting each user's communication requirement by jointly optimizing antenna positions and beamforming design. The problem formulated is highly non-convex and involves multivariate-coupled constraints. To address these challenges, we introduce a series of auxiliary random variables and transform the original problem into an augmented Lagrangian problem. A double-loop algorithm based on a penalty dual decomposition framework is then developed to solve the problem. Numerical results validate the effectiveness of the proposed design, demonstrating its superiority over MA designs based on successive convex approximation optimization and other baseline approaches in ISAC systems. The results also highlight the advantages of MAs in achieving better sensing performance and improved beam control, especially for sparse arrays with large apertures.

* accepted and to appear in IEEE VTC2025-Spring

Via

Access Paper or Ask Questions

Scalable Trajectory-User Linking with Dual-Stream Representation Networks

Mar 19, 2025

Hao Zhang, Wei Chen, Xingyu Zhao, Jianpeng Qi, Guiyuan Jiang, Yanwei Yu

Figure 1 for Scalable Trajectory-User Linking with Dual-Stream Representation Networks

Figure 2 for Scalable Trajectory-User Linking with Dual-Stream Representation Networks

Figure 3 for Scalable Trajectory-User Linking with Dual-Stream Representation Networks

Figure 4 for Scalable Trajectory-User Linking with Dual-Stream Representation Networks

Abstract:Trajectory-user linking (TUL) aims to match anonymous trajectories to the most likely users who generated them, offering benefits for a wide range of real-world spatio-temporal applications. However, existing TUL methods are limited by high model complexity and poor learning of the effective representations of trajectories, rendering them ineffective in handling large-scale user trajectory data. In this work, we propose a novel $\underline{Scal}$abl$\underline{e}$ Trajectory-User Linking with dual-stream representation networks for large-scale $\underline{TUL}$ problem, named ScaleTUL. Specifically, ScaleTUL generates two views using temporal and spatial augmentations to exploit supervised contrastive learning framework to effectively capture the irregularities of trajectories. In each view, a dual-stream trajectory encoder, consisting of a long-term encoder and a short-term encoder, is designed to learn unified trajectory representations that fuse different temporal-spatial dependencies. Then, a TUL layer is used to associate the trajectories with the corresponding users in the representation space using a two-stage training model. Experimental results on check-in mobility datasets from three real-world cities and the nationwide U.S. demonstrate the superiority of ScaleTUL over state-of-the-art baselines for large-scale TUL tasks.

* The paper has been accepted by AAAI 2025

Via

Access Paper or Ask Questions

TAIJI: Textual Anchoring for Immunizing Jailbreak Images in Vision Language Models

Mar 13, 2025

Xiangyu Yin, Yi Qi, Jinwei Hu, Zhen Chen, Yi Dong, Xingyu Zhao, Xiaowei Huang, Wenjie Ruan

Abstract:Vision Language Models (VLMs) have demonstrated impressive inference capabilities, but remain vulnerable to jailbreak attacks that can induce harmful or unethical responses. Existing defence methods are predominantly white-box approaches that require access to model parameters and extensive modifications, making them costly and impractical for many real-world scenarios. Although some black-box defences have been proposed, they often impose input constraints or require multiple queries, limiting their effectiveness in safety-critical tasks such as autonomous driving. To address these challenges, we propose a novel black-box defence framework called \textbf{T}extual \textbf{A}nchoring for \textbf{I}mmunizing \textbf{J}ailbreak \textbf{I}mages (\textbf{TAIJI}). TAIJI leverages key phrase-based textual anchoring to enhance the model's ability to assess and mitigate the harmful content embedded within both visual and textual prompts. Unlike existing methods, TAIJI operates effectively with a single query during inference, while preserving the VLM's performance on benign tasks. Extensive experiments demonstrate that TAIJI significantly enhances the safety and reliability of VLMs, providing a practical and efficient solution for real-world deployment.

* Under review of IJCAI-25

Via

Access Paper or Ask Questions

Probabilistic Robustness in Deep Learning: A Concise yet Comprehensive Guide

Feb 20, 2025

Xingyu Zhao

Abstract:Deep learning (DL) has demonstrated significant potential across various safety-critical applications, yet ensuring its robustness remains a key challenge. While adversarial robustness has been extensively studied in worst-case scenarios, probabilistic robustness (PR) offers a more practical perspective by quantifying the likelihood of failures under stochastic perturbations. This paper provides a concise yet comprehensive overview of PR, covering its formal definitions, evaluation and enhancement methods. We introduce a reformulated ''min-max'' optimisation framework for adversarial training specifically designed to improve PR. Furthermore, we explore the integration of PR verification evidence into system-level safety assurance, addressing challenges in translating DL model-level robustness to system-level claims. Finally, we highlight open research questions, including benchmarking PR evaluation methods, extending PR to generative AI tasks, and developing rigorous methodologies and case studies for system-level integration.

* This is a preprint of the following chapter: X. Zhao, Probabilistic Robustness in Deep Learning: A Concise yet Comprehensive Guide, published in the book Adversarial Example Detection and Mitigation Using Machine Learning, edited by Ehsan Nowroozi, Rahim Taheri, Lucas Cordeiro, 2025, Springer Nature. The final authenticated version will available online soon

Via

Access Paper or Ask Questions

SIDA: Social Media Image Deepfake Detection, Localization and Explanation with Large Multimodal Model

Dec 05, 2024

Zhenglin Huang, Jinwei Hu, Xiangtai Li, Yiwei He, Xingyu Zhao, Bei Peng, Baoyuan Wu, Xiaowei Huang, Guangliang Cheng

Figure 1 for SIDA: Social Media Image Deepfake Detection, Localization and Explanation with Large Multimodal Model

Figure 2 for SIDA: Social Media Image Deepfake Detection, Localization and Explanation with Large Multimodal Model

Figure 3 for SIDA: Social Media Image Deepfake Detection, Localization and Explanation with Large Multimodal Model

Figure 4 for SIDA: Social Media Image Deepfake Detection, Localization and Explanation with Large Multimodal Model

Abstract:The rapid advancement of generative models in creating highly realistic images poses substantial risks for misinformation dissemination. For instance, a synthetic image, when shared on social media, can mislead extensive audiences and erode trust in digital content, resulting in severe repercussions. Despite some progress, academia has not yet created a large and diversified deepfake detection dataset for social media, nor has it devised an effective solution to address this issue. In this paper, we introduce the Social media Image Detection dataSet (SID-Set), which offers three key advantages: (1) extensive volume, featuring 300K AI-generated/tampered and authentic images with comprehensive annotations, (2) broad diversity, encompassing fully synthetic and tampered images across various classes, and (3) elevated realism, with images that are predominantly indistinguishable from genuine ones through mere visual inspection. Furthermore, leveraging the exceptional capabilities of large multimodal models, we propose a new image deepfake detection, localization, and explanation framework, named SIDA (Social media Image Detection, localization, and explanation Assistant). SIDA not only discerns the authenticity of images, but also delineates tampered regions through mask prediction and provides textual explanations of the model's judgment criteria. Compared with state-of-the-art deepfake detection models on SID-Set and other benchmarks, extensive experiments demonstrate that SIDA achieves superior performance among diversified settings. The code, model, and dataset will be released.

Via

Access Paper or Ask Questions

Robust RL with LLM-Driven Data Synthesis and Policy Adaptation for Autonomous Driving

Oct 16, 2024

Sihao Wu, Jiaxu Liu, Xiangyu Yin, Guangliang Cheng, Meng Fang, Xingyu Zhao, Xinping Yi, Xiaowei Huang

Figure 1 for Robust RL with LLM-Driven Data Synthesis and Policy Adaptation for Autonomous Driving

Figure 2 for Robust RL with LLM-Driven Data Synthesis and Policy Adaptation for Autonomous Driving

Figure 3 for Robust RL with LLM-Driven Data Synthesis and Policy Adaptation for Autonomous Driving

Figure 4 for Robust RL with LLM-Driven Data Synthesis and Policy Adaptation for Autonomous Driving

Abstract:The integration of Large Language Models (LLMs) into autonomous driving systems demonstrates strong common sense and reasoning abilities, effectively addressing the pitfalls of purely data-driven methods. Current LLM-based agents require lengthy inference times and face challenges in interacting with real-time autonomous driving environments. A key open question is whether we can effectively leverage the knowledge from LLMs to train an efficient and robust Reinforcement Learning (RL) agent. This paper introduces RAPID, a novel \underline{\textbf{R}}obust \underline{\textbf{A}}daptive \underline{\textbf{P}}olicy \underline{\textbf{I}}nfusion and \underline{\textbf{D}}istillation framework, which trains specialized mix-of-policy RL agents using data synthesized by an LLM-based driving agent and online adaptation. RAPID features three key designs: 1) utilization of offline data collected from an LLM agent to distil expert knowledge into RL policies for faster real-time inference; 2) introduction of robust distillation in RL to inherit both performance and robustness from LLM-based teacher; and 3) employment of a mix-of-policy approach for joint decision decoding with a policy adapter. Through fine-tuning via online environment interaction, RAPID reduces the forgetting of LLM knowledge while maintaining adaptability to different tasks. Extensive experiments demonstrate RAPID's capability to effectively integrate LLM knowledge into scaled-down RL policies in an efficient, adaptable, and robust way. Code and checkpoints will be made publicly available upon acceptance.

Via

Access Paper or Ask Questions