Kehai Chen

Decoupling Skeleton and Flesh: Efficient Multimodal Table Reasoning with Disentangled Alignment and Structure-aware Guidance

Feb 03, 2026
Via arXiv

VC-Bench: Pioneering the Video Connecting Benchmark with a Dataset and Evaluation Metrics

Jan 27, 2026
Via arXiv

Beyond Rigid: Benchmarking Non-Rigid Video Editing

Jan 26, 2026
Via arXiv

Character-R1: Enhancing Role-Aware Reasoning in Role-Playing Agents via RLVR

Jan 08, 2026
Via arXiv

From Perception to Reasoning: Deep Thinking Empowers Multimodal Large Language Models

Nov 18, 2025
Via arXiv

LoCoT2V-Bench: A Benchmark for Long-Form and Complex Text-to-Video Generation

Oct 30, 2025
Via arXiv

From Bias to Balance: Exploring and Mitigating Spatial Bias in LVLMs

Sep 26, 2025
Via arXiv

Evaluating and Steering Modality Preferences in Multimodal Large Language Model

May 27, 2025
Via arXiv

XBOUND: Exploring the Capability Boundaries of Device-Control Agents through Trajectory Tree Exploration

May 27, 2025
Via arXiv

MDIT-Bench: Evaluating the Dual-Implicit Toxicity in Large Multimodal Models

May 22, 2025
Via arXiv