Jinglong Ji

ScreenExplorer: Training a Vision-Language Model for Diverse Exploration in Open GUI World

May 25, 2025

Runliang Niu, Jinglong Ji, Yi Chang, Qi Wang

Abstract: The rapid progress of large language models (LLMs) has sparked growing interest in building Artificial General Intelligence (AGI) within Graphical User Interface (GUI) environments. However, existing GUI agents based on LLMs or vision-language models (VLMs) often fail to generalize to novel environments and rely heavily on manually curated, diverse datasets. To overcome these limitations, we introduce ScreenExplorer, a VLM trained via Group Relative Policy Optimization (GRPO) in real, dynamic, and open-ended GUI environments. We also introduce a world-model-based curiosity reward function that helps the agent overcome the cold-start phase of exploration. Additionally, distilling experience streams further enhances the model's exploration capabilities. Our training framework improves model exploration in open GUI environments: trained models show better environmental adaptation and more sustained exploration than statically deployed models. Our findings offer a scalable pathway toward AGI systems with self-improving capabilities in complex interactive settings.
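The abstract names GRPO but does not spell out how rewards become a training signal. As a rough illustration only, the sketch below shows the group-relative advantage step commonly associated with GRPO: several rollouts sampled for the same starting state are scored, and each reward is normalized against the group's mean and standard deviation. The function name `group_relative_advantages`, the `eps` parameter, and the example rewards are assumptions for illustration; the paper's actual reward (including its curiosity term) and normalization details are not given here.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    Hypothetical helper: `rewards` holds one scalar reward per rollout
    sampled from the same starting state. `eps` guards against division
    by zero when every rollout receives the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


# Made-up rewards for three rollouts of one group.
advantages = group_relative_advantages([1.0, 2.0, 3.0])
```

Rollouts that score above their group's average receive a positive advantage and those below receive a negative one, so no separate value network is needed to provide a baseline.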

ADA-GNN: Atom-Distance-Angle Graph Neural Network for Crystal Material Property Prediction

Jan 22, 2024

Jiao Huang, Qianli Xing, Jinglong Ji, Bo Yang

Abstract: Property prediction is a fundamental task in crystal material research. To model atoms and structures, graph representations of structures are widely used, and graph-learning-based methods have achieved significant progress. Bond angles and bond distances are two key types of structural information that greatly influence crystal properties. However, most existing works consider only bond distances and overlook bond angles. The main challenge lies in the time cost of handling bond angles, which leads to a significant increase in inference time. To solve this issue, we first propose a crystal structure modeling approach based on a dual-scale neighbor partitioning mechanism, which uses a larger cutoff for edge neighbors and a smaller cutoff for angle neighbors. Then, we propose a novel Atom-Distance-Angle Graph Neural Network (ADA-GNN) for property prediction tasks, which can process node information and structural information separately. Both prediction accuracy and inference time are improved by the dual-scale modeling and the specially designed architecture of ADA-GNN. The experimental results validate that our approach achieves state-of-the-art results on property prediction tasks on two large-scale material benchmark datasets.
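The dual-scale idea can be illustrated with a small sketch: edges are collected within a larger cutoff, while angle triplets are formed only among neighbors within a smaller cutoff, which keeps the number of angles manageable. This is a simplified illustration, not the authors' implementation. The names `dual_scale_neighbors` and `bond_angle`, the cutoff values, and the toy coordinates are assumptions, and periodic boundary conditions (essential for real crystals) are deliberately ignored.

```python
import math
from itertools import combinations


def bond_angle(a, center, b):
    """Angle (radians) at `center` between the directions to `a` and `b`."""
    v1 = [x - c for x, c in zip(a, center)]
    v2 = [x - c for x, c in zip(b, center)]
    dot = sum(p * q for p, q in zip(v1, v2))
    cos = dot / (math.hypot(*v1) * math.hypot(*v2))
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp rounding error


def dual_scale_neighbors(positions, edge_cutoff, angle_cutoff):
    """Hypothetical sketch: distinct atom positions, no periodic images.

    Returns directed edges (i, j, distance) within `edge_cutoff` and
    triplets (j, i, k, angle) whose two bonds both lie within the
    smaller `angle_cutoff` of the central atom i.
    """
    if angle_cutoff > edge_cutoff:
        raise ValueError("angle_cutoff is expected to be the smaller scale")
    n = len(positions)
    edges = []
    angle_nbrs = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = math.dist(positions[i], positions[j])
            if d <= edge_cutoff:
                edges.append((i, j, d))
            if d <= angle_cutoff:
                angle_nbrs[i].append(j)
    triplets = [
        (j, i, k, bond_angle(positions[j], positions[i], positions[k]))
        for i, nbrs in angle_nbrs.items()
        for j, k in combinations(nbrs, 2)
    ]
    return edges, triplets


# Toy, non-periodic example with made-up coordinates and cutoffs.
atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (3.0, 0.0, 0.0)]
edges, triplets = dual_scale_neighbors(atoms, edge_cutoff=3.1, angle_cutoff=1.5)
```

In this toy case the distant fourth atom still contributes edges but takes part in no angle triplet, which is the cost saving the smaller angle cutoff is meant to provide.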
