Picture for Lihang Pan

Lihang Pan

WebVIA: A Web-based Vision-Language Agentic Framework for Interactive and Verifiable UI-to-Code Generation

Add code
Nov 09, 2025
Viaarxiv icon

GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Add code
Jul 02, 2025
Figure 1 for GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
Figure 2 for GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
Figure 3 for GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
Figure 4 for GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
Viaarxiv icon

VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents

Add code
Aug 12, 2024
Figure 1 for VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents
Figure 2 for VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents
Figure 3 for VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents
Figure 4 for VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents
Viaarxiv icon