Kun He

Glove2Hand: Synthesizing Natural Hand-Object Interaction from Multi-Modal Sensing Gloves

Mar 21, 2026
Via arXiv

Separators in Enhancing Autoregressive Pretraining for Vision Mamba

Mar 04, 2026
Via arXiv

ITO: Images and Texts as One via Synergizing Multiple Alignment and Training-Time Fusion

Mar 04, 2026
Via arXiv

iGVLM: Dynamic Instruction-Guided Vision Encoding for Question-Aware Multimodal Understanding

Mar 03, 2026
Via arXiv

KVSmooth: Mitigating Hallucination in Multi-modal Large Language Models through Key-Value Smoothing

Feb 04, 2026
Via arXiv

Effective and Stealthy One-Shot Jailbreaks on Deployed Mobile Vision-Language Agents

Oct 09, 2025
Via arXiv

ViT-EnsembleAttack: Augmenting Ensemble Models for Stronger Adversarial Transferability in Vision Transformers

Aug 17, 2025
Via arXiv

VisCRA: A Visual Chain Reasoning Attack for Jailbreaking Multimodal Large Language Models

May 26, 2025
Via arXiv

DAM-GT: Dual Positional Encoding-Based Attention Masking Graph Transformer for Node Classification

May 23, 2025
Via arXiv

Bandit based Dynamic Candidate Edge Selection in Solving Traveling Salesman Problems

May 21, 2025
Via arXiv