Caixin Kang

Towards Interactive Intelligence for Digital Humans

Dec 15, 2025
Via arXiv

Living the Novel: A System for Generating Self-Training Timeline-Aware Conversational Agents from Novels

Dec 08, 2025
Via arXiv

Can MLLMs Read the Room? A Multimodal Benchmark for Verifying Truthfulness in Multi-Party Social Interactions

Oct 31, 2025
Via arXiv

From reactive to cognitive: brain-inspired spatial intelligence for embodied agents

Aug 24, 2025
Via arXiv

Towards NSFW-Free Text-to-Image Generation via Safety-Constraint Direct Preference Optimization

Apr 19, 2025
Via arXiv

Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency

Jan 09, 2025
Via arXiv

AdvDreamer Unveils: Are Vision-Language Models Truly Ready for Real-World 3D Variations?

Dec 04, 2024
Via arXiv

OODFace: Benchmarking Robustness of Face Recognition under Common Corruptions and Appearance Variations

Dec 03, 2024
Via arXiv

Real-world Adversarial Defense against Patch Attacks based on Diffusion Model

Sep 14, 2024
Via arXiv

The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition

May 14, 2024
Via arXiv