Title: SWIRL: A Staged Workflow for Interleaved Reinforcement Learning in Mobile GUI Control

Aug 27, 2025

Quanfeng Lu, Zhantao Ma, Shuai Zhong, Jin Wang, Dahai Yu, Michael K. Ng, Ping Luo

Figures 1-4 for SWIRL: A Staged Workflow for Interleaved Reinforcement Learning in Mobile GUI Control

Abstract: The rapid advancement of large vision language models (LVLMs) and agent systems has heightened interest in mobile GUI agents that can reliably translate natural language into interface operations. Existing single-agent approaches, however, remain limited by structural constraints. Although multi-agent systems naturally decouple different competencies, recent progress in multi-agent reinforcement learning (MARL) has often been hindered by inefficiency and remains incompatible with current LVLM architectures. To address these challenges, we introduce SWIRL, a staged workflow for interleaved reinforcement learning designed for multi-agent systems. SWIRL reformulates MARL into a sequence of single-agent reinforcement learning tasks, updating one agent at a time while keeping the others fixed. This formulation enables stable training and promotes efficient coordination across agents. Theoretically, we provide a stepwise safety bound, a cross-round monotonic improvement theorem, and convergence guarantees on return, ensuring robust and principled optimization. In application to mobile GUI control, SWIRL instantiates a Navigator that converts language and screen context into structured plans, and an Interactor that grounds these plans into executable atomic actions. Extensive experiments demonstrate superior performance on both high-level and low-level GUI benchmarks. Beyond GUI tasks, SWIRL also demonstrates strong capability in multi-agent mathematical reasoning, underscoring its potential as a general framework for developing efficient and robust multi-agent systems.
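The idea of "updating one agent at a time while keeping the others fixed" can be illustrated with a toy sketch. This is not the authors' implementation: it assumes two scalar "policies" (hypothetically named `nav` and `inter`, loosely after the Navigator and Interactor), an invented concave joint return, and plain gradient ascent in place of a reinforcement learning update. It only shows the interleaved, staged structure and why the shared return does not decrease from round to round in this simple setting.

```python
from typing import List, Tuple


def joint_return(nav: float, inter: float) -> float:
    """Hypothetical concave joint return shared by both agents (illustration only)."""
    return -(nav - 1.0) ** 2 - (inter + 2.0) ** 2 - 0.5 * nav * inter


def grad_nav(nav: float, inter: float) -> float:
    """Partial derivative of joint_return with respect to nav."""
    return -2.0 * (nav - 1.0) - 0.5 * inter


def grad_inter(nav: float, inter: float) -> float:
    """Partial derivative of joint_return with respect to inter."""
    return -2.0 * (inter + 2.0) - 0.5 * nav


def interleaved_training(
    rounds: int = 20, steps_per_stage: int = 10, lr: float = 0.1
) -> Tuple[float, float, List[float]]:
    """Alternate single-agent updates: each stage trains one agent, the other is frozen."""
    nav, inter = 0.0, 0.0
    history = [joint_return(nav, inter)]
    for _ in range(rounds):
        # Stage 1: update only the first agent; `inter` is held fixed.
        for _ in range(steps_per_stage):
            nav += lr * grad_nav(nav, inter)
        # Stage 2: update only the second agent; `nav` is held fixed.
        for _ in range(steps_per_stage):
            inter += lr * grad_inter(nav, inter)
        # Record the joint return once per round.
        history.append(joint_return(nav, inter))
    return nav, inter, history


if __name__ == "__main__":
    nav, inter, history = interleaved_training()
    print(f"nav={nav:.4f} inter={inter:.4f} final return={history[-1]:.4f}")
```

Because each stage is an ordinary single-agent optimization against a fixed partner, every stage in this toy problem can only raise the shared return, which loosely mirrors the cross-round monotonic improvement the abstract claims for SWIRL under its own assumptions.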

* 28 pages, 12 figures
