Kaipeng Zhang

Focal Guidance: Unlocking Controllability from Semantic-Weak Layers in Video Diffusion Models

Jan 12, 2026
Via arXiv

ProSoftArena: Benchmarking Hierarchical Capabilities of Multimodal Agents in Professional Software Environments

Dec 30, 2025
Via arXiv

Yume-1.5: A Text-Controlled Interactive World Generation Model

Dec 26, 2025
Via arXiv

SVBench: Evaluation of Video Generation Models on Social Reasoning

Dec 25, 2025
Via arXiv

Code-in-the-Loop Forensics: Agentic Tool Use for Image Forgery Detection

Dec 18, 2025
Via arXiv

Dialogue as Discovery: Navigating Human Intent Through Principled Inquiry

Oct 31, 2025
Via arXiv

From Pixels to Paths: A Multi-Agent Framework for Editable Scientific Illustration

Oct 31, 2025
Via arXiv

Symbolic Graphics Programming with Large Language Models

Sep 05, 2025
Via arXiv

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Aug 25, 2025
Via arXiv

MDK12-Bench: A Comprehensive Evaluation of Multimodal Large Language Models on Multidisciplinary Exams

Aug 09, 2025
Via arXiv