Title: Mapping Natural Language Instructions to Mobile UI Action Sequences

Jun 05, 2020

Yang Li, Jiacong He, Xin Zhou, Yuan Zhang, Jason Baldridge

Abstract: We present a new problem: grounding natural language instructions to mobile user interface actions, and create three new datasets for it. For full task evaluation, we create PIXELHELP, a corpus that pairs English instructions with actions performed by people on a mobile UI emulator. To scale training, we decouple the language and action data by (a) annotating action phrase spans in HowTo instructions and (b) synthesizing grounded descriptions of actions for mobile user interfaces. We use a Transformer to extract action phrase tuples from long-range natural language instructions. A grounding Transformer then contextually represents UI objects using both their content and screen position and connects them to object descriptions. Given a starting screen and instruction, our model achieves 70.59% accuracy on predicting complete ground-truth action sequences in PIXELHELP.
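
The reported figure is an accuracy over complete action sequences, which suggests that a prediction counts only when every step matches the ground truth. The sketch below illustrates one way such a metric could be computed. It is not the authors' evaluation code: the `Action` tuple layout (operation, target object index, text argument) and the function name `complete_match_accuracy` are assumptions made for illustration.

```python
from typing import Sequence, Tuple

# Hypothetical action representation (an assumption, not taken from the paper):
# (operation name, index of the target UI object, optional text argument).
Action = Tuple[str, int, str]


def complete_match_accuracy(
    predicted: Sequence[Sequence[Action]],
    reference: Sequence[Sequence[Action]],
) -> float:
    """Fraction of instructions whose whole predicted sequence equals the reference."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same number of sequences")
    if not reference:
        return 0.0
    # A sequence scores only if it has the same length and every step is identical.
    matches = sum(1 for p, r in zip(predicted, reference) if list(p) == list(r))
    return matches / len(reference)


# Toy data: the first prediction is fully correct, the second differs in its argument.
reference = [
    [("CLICK", 3, ""), ("CLICK", 7, "")],
    [("INPUT", 2, "hello")],
]
predicted = [
    [("CLICK", 3, ""), ("CLICK", 7, "")],
    [("INPUT", 2, "hi")],
]
accuracy = complete_match_accuracy(predicted, reference)
```

Under this definition a sequence that gets the first steps right but misses a later one earns no credit, so the measure is stricter than per-step accuracy.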

* Annual Conference of the Association for Computational Linguistics (ACL 2020)