
Zidi Xiong

When Models Reason in Your Language: Controlling Thinking Trace Language Comes at the Cost of Accuracy

May 28, 2025

How Memory Management Impacts LLM Agents: An Empirical Study of Experience-Following Behavior

May 21, 2025

Measuring the Faithfulness of Thinking Drafts in Large Reasoning Models

May 19, 2025

MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models

Mar 19, 2025

GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning

Jun 13, 2024

RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content

Mar 19, 2024

BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models

Jan 20, 2024

CBD: A Certified Backdoor Detector Based on Local Dominant Probability

Oct 26, 2023

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

Jun 20, 2023

UMD: Unsupervised Model Detection for X2X Backdoor Attacks

Jun 02, 2023