
Bochuan Cao

Monitoring Decoding: Mitigating Hallucination via Evaluating the Factuality of Partial Response during Generation

Mar 05, 2025

TruthFlow: Truthful LLM Generation via Representation Flow Correction

Feb 06, 2025

Data Free Backdoor Attacks

Dec 09, 2024

AdvI2I: Adversarial Image Attack on Image-to-Image Diffusion Models

Oct 28, 2024

On the Intrinsic Self-Correction Capability of LLMs: Uncertainty and Latent Concept

Jun 04, 2024

XPrompt: Explaining Large Language Model's Generation via Joint Prompt Attribution

May 30, 2024

Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization

May 28, 2024

WordGame: Efficient & Effective LLM Jailbreak via Simultaneous Obfuscation in Query and Response

May 22, 2024

On the Difficulty of Defending Contrastive Learning against Backdoor Attacks

Dec 14, 2023

Stealthy and Persistent Unalignment on Large Language Models via Backdoor Injections

Nov 15, 2023