Jiani Zheng

Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs

Oct 22, 2025
Via arXiv

Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology

Jul 10, 2025
Via arXiv

VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model

Feb 26, 2025
Via arXiv

Make Pixels Dance: High-Dynamic Video Generation

Nov 18, 2023
Via arXiv

What Matters in Training a GPT4-Style Language Model with Multimodal Inputs?

Jul 30, 2023
Via arXiv