
Hanming Deng

From Pixels to Words -- Towards Native One-Vision Models at Scale

May 27, 2026

InterSketch: An Interleaved Reasoning Model with Self-correcting Visual Sketch and Stepwise Reward

May 26, 2026

SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture

May 12, 2026

V-ABS: Action-Observer Driven Beam Search for Dynamic Visual Reasoning

May 11, 2026

EVA: Efficient Reinforcement Learning for End-to-End Video Agent

Mar 24, 2026

ACPO: Counteracting Likelihood Displacement in Vision-Language Alignment with Asymmetric Constraints

Mar 23, 2026

SenseNova-MARS: Empowering Multimodal Agentic Reasoning and Search via Reinforcement Learning

Dec 30, 2025

Scaling Spatial Intelligence with Multimodal Foundation Models

Nov 17, 2025

From Pixels to Words -- Towards Native Vision-Language Primitives at Scale

Oct 16, 2025

Has GPT-5 Achieved Spatial Intelligence? An Empirical Study

Aug 18, 2025