Yuyao Wu

DeepSight: An All-in-One LM Safety Toolkit

Feb 12, 2026

Bo Zhang, Jiaxuan Guo, Lijun Li, Dongrui Liu, Sujin Chen, Guanxu Chen, Zhijie Zheng, Qihao Lin, Lewen Yan, Chen Qian(+10 more)

Abstract: As Large Models (LMs) develop rapidly, their safety has also become a priority. In current safety workflows for Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs), evaluation, diagnosis, and alignment are often handled by separate tools. Specifically, safety evaluation can only locate external behavioral risks and cannot identify their internal root causes. Meanwhile, safety diagnosis often drifts away from concrete risk scenarios and remains at the level of explainability. As a result, safety alignment lacks dedicated explanations of changes in internal mechanisms, which may degrade general capabilities. To address these issues systematically, we propose an open-source project, DeepSight, that puts into practice a new paradigm integrating safety evaluation and diagnosis. DeepSight is a low-cost, reproducible, efficient, and highly scalable safety evaluation project for large models. It consists of an evaluation toolkit, DeepSafe, and a diagnosis toolkit, DeepScan. By unifying task and data protocols, we connect the two stages and turn safety evaluation from black-box assessment into white-box insight. In addition, DeepSight is the first open-source toolkit to support frontier AI risk evaluation as well as joint safety evaluation and diagnosis.

* Technical report, 29 pages, 24 figures

STaR-Attack: A Spatio-Temporal and Narrative Reasoning Attack Framework for Unified Multimodal Understanding and Generation Models

Sep 30, 2025

Shaoxiong Guo, Tianyi Du, Lijun Li, Yuyao Wu, Jie Li, Jing Shao

Abstract: Unified Multimodal understanding and generation Models (UMMs) have demonstrated remarkable capabilities in both understanding and generation tasks. However, we identify a vulnerability that arises from the coupling of generation and understanding in UMMs. Attackers can use the generative function to craft an information-rich adversarial image and then use the understanding function to absorb it in a single pass. We call this Cross-Modal Generative Injection (CMGI). Current attacks based on malicious instructions are often limited to a single modality and rely on prompt rewriting that introduces semantic drift, leaving the unique vulnerabilities of UMMs unexplored. We propose STaR-Attack, the first multi-turn jailbreak attack framework that exploits the unique safety weaknesses of UMMs without semantic drift. Specifically, our method defines a malicious event that is strongly correlated with the target query within a spatio-temporal context. Using three-act narrative theory, STaR-Attack generates the pre-event and post-event scenes while concealing the malicious event as the hidden climax. When executing the attack, the first two rounds exploit the UMM's generative ability to produce images for these scenes. An image-based question-guessing-and-answering game then exploits the model's understanding capability. STaR-Attack embeds the original malicious question among benign candidates, forcing the model to select and answer the one most relevant to the narrative context. Extensive experiments show that STaR-Attack consistently outperforms prior approaches. It achieves an attack success rate (ASR) of up to 93.06% on Gemini-2.0-Flash and surpasses the strongest prior baseline, FlipAttack. Our work uncovers a critical yet underexplored vulnerability and highlights the need for safety alignment in UMMs.

Channel sensing for holographic MIMO surfaces based on interference principle

Aug 20, 2023

Jindiao Huang, Yuyao Wu, Haifan Yin, Yuhao Zhang, Ruikun Zhang

Abstract: Holographic Multiple-Input Multiple-Output (HMIMO) provides a new paradigm for building more cost-effective wireless communication architectures. In this paper, we derive the principles of holographic interference theory for electromagnetic wave reception and transmission. This extends optical holography to communication holography and establishes a channel sensing architecture for holographic MIMO surfaces. Unlike traditional pilot-based MIMO channel estimation approaches, the proposed architecture avoids complicated processing steps such as filtering, analog-to-digital conversion (ADC), and down-conversion. Instead, it interferes the object waves with a pre-designed reference wave, which reduces hardware complexity and requires fewer time-frequency resources for channel estimation. To address the self-interference problem in the holographic recording process, we propose a phase shifting-based interference suppression (PSIS) method based on the structural characteristics of the communication hologram and its interference composition. We then propose a Prony-based multi-user channel segmentation (PMCS) algorithm to acquire the channel state information (CSI). Our theoretical analysis shows that the estimation error of the PMCS algorithm converges to zero when the number of HMIMO surface antennas is sufficiently large. Simulation results show that, under the holographic architecture, the proposed algorithm can accurately estimate the CSI in multi-user scenarios.
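The abstract names two processing steps but does not give their form: interference suppression by phase shifting, and Prony-based channel segmentation. The sketch below is only a rough illustration. It uses the textbook four-step phase-shifting rule and the classical Prony method, not the paper's PSIS or PMCS algorithms, which may differ substantially. It also assumes an idealized, noise-free uniform linear array in which each user contributes one plane wave. All function names and parameters are hypothetical: `record_holograms`, `phase_shift_recover`, `prony`, `ref_amp`, and the 64-antenna, 3-user setup.

```python
import numpy as np

# Assumed idealized model (not taken from the paper): a uniform linear array of
# N antennas receives K plane waves, so the object wave at antenna n is
# O[n] = sum_k a_k * exp(1j * w_k * n), with no noise or mutual coupling.

def record_holograms(obj, ref_amp=1.0):
    """Intensity-only recordings |O + R e^{j phi}|^2 for phi = 0, pi/2, pi, 3pi/2."""
    phases = np.arange(4) * np.pi / 2
    return phases, [np.abs(obj + ref_amp * np.exp(1j * p)) ** 2 for p in phases]

def phase_shift_recover(phases, intensities, ref_amp=1.0):
    """Textbook four-step phase shifting: sum_k I_k e^{j phi_k} = 4 O R*.

    The |O|^2 + |R|^2 terms and the conjugate term O* R both cancel in the
    weighted sum, leaving the complex object wave scaled by the known reference.
    """
    acc = sum(I * np.exp(1j * p) for p, I in zip(phases, intensities))
    return acc / (4 * np.conj(ref_amp))

def prony(x, K):
    """Classical Prony fit of x[n] = sum_k a_k z_k**n with K exponentials."""
    N = len(x)
    # Linear prediction: x[n] + c_1 x[n-1] + ... + c_K x[n-K] = 0 for n >= K.
    A = np.column_stack([x[K - m:N - m] for m in range(1, K + 1)])
    c, *_ = np.linalg.lstsq(A, -x[K:], rcond=None)
    # The modes z_k are the roots of z^K + c_1 z^(K-1) + ... + c_K.
    z = np.roots(np.concatenate(([1.0], c)))
    # Complex gains from a least-squares Vandermonde fit, V[n, k] = z_k**n.
    V = np.vander(z, N, increasing=True).T
    a, *_ = np.linalg.lstsq(V, x, rcond=None)
    return z, a

# Hypothetical 3-user example on a 64-antenna surface.
n = np.arange(64)
w_true = np.array([0.4, -1.1, 2.0])
a_true = np.array([1.0, 0.6 * np.exp(0.7j), 0.3 * np.exp(-1.2j)])
obj = (a_true[None, :] * np.exp(1j * np.outer(n, w_true))).sum(axis=1)

phases, holograms = record_holograms(obj, ref_amp=2.0)
obj_hat = phase_shift_recover(phases, holograms, ref_amp=2.0)
z_hat, a_hat = prony(obj_hat, K=3)
order = np.argsort(np.angle(z_hat))
print(np.angle(z_hat)[order])  # estimated spatial frequencies w_k, sorted
print(a_hat[order])            # estimated per-user complex gains, same order
```

Stepping the reference phase through four quarter-wave offsets lets the weighted sum keep only the cross term between object and reference waves. The object wave is then recovered from intensity-only recordings. In this noise-free toy setting, both steps are exact up to floating-point error. Classical Prony is known to be sensitive to noise, so a practical estimator would need more care than this sketch.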
