Picture for Hongjie Zhang

Hongjie Zhang

ScaleEdit-12M: Scaling Open-Source Image Editing Data Generation via Multi-Agent Framework

Add code
Mar 21, 2026
Viaarxiv icon

Reliable Reasoning in SVG-LLMs via Multi-Task Multi-Reward Reinforcement Learning

Add code
Mar 17, 2026
Viaarxiv icon

InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing

Add code
Mar 10, 2026
Viaarxiv icon

Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence

Add code
Mar 08, 2026
Viaarxiv icon

Video-o3: Native Interleaved Clue Seeking for Long Video Multi-Hop Reasoning

Add code
Jan 30, 2026
Viaarxiv icon

MTOS: A LLM-Driven Multi-topic Opinion Simulation Framework for Exploring Echo Chamber Dynamics

Add code
Oct 14, 2025
Figure 1 for MTOS: A LLM-Driven Multi-topic Opinion Simulation Framework for Exploring Echo Chamber Dynamics
Figure 2 for MTOS: A LLM-Driven Multi-topic Opinion Simulation Framework for Exploring Echo Chamber Dynamics
Figure 3 for MTOS: A LLM-Driven Multi-topic Opinion Simulation Framework for Exploring Echo Chamber Dynamics
Figure 4 for MTOS: A LLM-Driven Multi-topic Opinion Simulation Framework for Exploring Echo Chamber Dynamics
Viaarxiv icon

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Add code
Aug 25, 2025
Figure 1 for InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
Figure 2 for InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
Figure 3 for InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
Figure 4 for InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
Viaarxiv icon

Point or Line? Using Line-based Representation for Panoptic Symbol Spotting in CAD Drawings

Add code
May 29, 2025
Viaarxiv icon

Weakly Supervised Temporal Sentence Grounding via Positive Sample Mining

Add code
May 10, 2025
Viaarxiv icon

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Add code
Apr 15, 2025
Viaarxiv icon