Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:Multimodal Sequential Generative Models for Semi-Supervised Language Instruction Following

Dec 29, 2022

Kei Akuzawa, Yusuke Iwasawa, Yutaka Matsuo

Figure 1 for Multimodal Sequential Generative Models for Semi-Supervised Language Instruction Following

Figure 2 for Multimodal Sequential Generative Models for Semi-Supervised Language Instruction Following

Figure 3 for Multimodal Sequential Generative Models for Semi-Supervised Language Instruction Following

Figure 4 for Multimodal Sequential Generative Models for Semi-Supervised Language Instruction Following

Share this with someone who'll enjoy it:

Abstract:Agents that can follow language instructions are expected to be useful in a variety of situations such as navigation. However, training neural network-based agents requires numerous paired trajectories and languages. This paper proposes using multimodal generative models for semi-supervised learning in the instruction following tasks. The models learn a shared representation of the paired data, and enable semi-supervised learning by reconstructing unpaired data through the representation. Key challenges in applying the models to sequence-to-sequence tasks including instruction following are learning a shared representation of variable-length mulitimodal data and incorporating attention mechanisms. To address the problems, this paper proposes a novel network architecture to absorb the difference in the sequence lengths of the multimodal data. In addition, to further improve the performance, this paper shows how to incorporate the generative model-based approach with an existing semi-supervised method called a speaker-follower model, and proposes a regularization term that improves inference using unpaired trajectories. Experiments on BabyAI and Room-to-Room (R2R) environments show that the proposed method improves the performance of instruction following by leveraging unpaired data, and improves the performance of the speaker-follower model by 2\% to 4\% in R2R.

View paper on

Share this with someone who'll enjoy it:

Title:Multimodal Sequential Generative Models for Semi-Supervised Language Instruction Following

Paper and Code