Title: Cross-Lingual Vision-Language Navigation

Oct 24, 2019

An Yan, Xin Wang, Jiangtao Feng, Lei Li, William Yang Wang

Abstract: Vision-Language Navigation (VLN) is the task in which an agent is commanded to navigate photo-realistic environments by following natural language instructions. Previous research on VLN has been conducted primarily on the Room-to-Room (R2R) dataset, which contains only English instructions. The ultimate goal of VLN, however, is to serve people speaking arbitrary languages. To this end, we collect a cross-lingual R2R dataset, extending the original benchmark with corresponding Chinese instructions. However, it is impractical to collect human-annotated instructions for every existing language. Based on the newly introduced dataset, we propose a general cross-lingual VLN framework to enable instruction-following navigation for different languages. We first explore the possibility of building a cross-lingual agent when no training data in the target language is available. The cross-lingual agent is equipped with a meta-learner to aggregate cross-lingual representations and with a visually grounded cross-lingual alignment module to align textual representations of different languages. Under the zero-shot learning scenario, our model shows competitive results even compared to a model trained with all target language instructions. In addition, we introduce an adversarial domain adaptation loss to improve the transfer ability of our model when a certain amount of target language data is given. Our dataset and methods demonstrate the potential of building scalable cross-lingual agents to serve speakers of different languages.

* Tech report. The first two authors contributed equally.

View paper on OpenReview
