Picture for Austin Waters

Austin Waters

Gemini: A Family of Highly Capable Multimodal Models

Add code
Dec 19, 2023
Viaarxiv icon

PaLI-X: On Scaling up a Multilingual Vision and Language Model

Add code
May 29, 2023
Figure 1 for PaLI-X: On Scaling up a Multilingual Vision and Language Model
Figure 2 for PaLI-X: On Scaling up a Multilingual Vision and Language Model
Figure 3 for PaLI-X: On Scaling up a Multilingual Vision and Language Model
Figure 4 for PaLI-X: On Scaling up a Multilingual Vision and Language Model
Viaarxiv icon

A New Path: Scaling Vision-and-Language Navigation with Synthetic Instructions and Imitation Learning

Add code
Oct 06, 2022
Figure 1 for A New Path: Scaling Vision-and-Language Navigation with Synthetic Instructions and Imitation Learning
Figure 2 for A New Path: Scaling Vision-and-Language Navigation with Synthetic Instructions and Imitation Learning
Figure 3 for A New Path: Scaling Vision-and-Language Navigation with Synthetic Instructions and Imitation Learning
Figure 4 for A New Path: Scaling Vision-and-Language Navigation with Synthetic Instructions and Imitation Learning
Viaarxiv icon

Simple and Effective Synthesis of Indoor 3D Scenes

Add code
Apr 06, 2022
Figure 1 for Simple and Effective Synthesis of Indoor 3D Scenes
Figure 2 for Simple and Effective Synthesis of Indoor 3D Scenes
Figure 3 for Simple and Effective Synthesis of Indoor 3D Scenes
Figure 4 for Simple and Effective Synthesis of Indoor 3D Scenes
Viaarxiv icon

Less is More: Generating Grounded Navigation Instructions from Landmarks

Add code
Nov 29, 2021
Figure 1 for Less is More: Generating Grounded Navigation Instructions from Landmarks
Figure 2 for Less is More: Generating Grounded Navigation Instructions from Landmarks
Figure 3 for Less is More: Generating Grounded Navigation Instructions from Landmarks
Figure 4 for Less is More: Generating Grounded Navigation Instructions from Landmarks
Viaarxiv icon

Talk, Don't Write: A Study of Direct Speech-Based Image Retrieval

Add code
Apr 08, 2021
Figure 1 for Talk, Don't Write: A Study of Direct Speech-Based Image Retrieval
Figure 2 for Talk, Don't Write: A Study of Direct Speech-Based Image Retrieval
Figure 3 for Talk, Don't Write: A Study of Direct Speech-Based Image Retrieval
Figure 4 for Talk, Don't Write: A Study of Direct Speech-Based Image Retrieval
Viaarxiv icon

Crisscrossed Captions: Extended Intramodal and Intermodal Semantic Similarity Judgments for MS-COCO

Add code
Apr 30, 2020
Figure 1 for Crisscrossed Captions: Extended Intramodal and Intermodal Semantic Similarity Judgments for MS-COCO
Figure 2 for Crisscrossed Captions: Extended Intramodal and Intermodal Semantic Similarity Judgments for MS-COCO
Figure 3 for Crisscrossed Captions: Extended Intramodal and Intermodal Semantic Similarity Judgments for MS-COCO
Figure 4 for Crisscrossed Captions: Extended Intramodal and Intermodal Semantic Similarity Judgments for MS-COCO
Viaarxiv icon

From Audio to Semantics: Approaches to end-to-end spoken language understanding

Add code
Sep 24, 2018
Figure 1 for From Audio to Semantics: Approaches to end-to-end spoken language understanding
Figure 2 for From Audio to Semantics: Approaches to end-to-end spoken language understanding
Figure 3 for From Audio to Semantics: Approaches to end-to-end spoken language understanding
Figure 4 for From Audio to Semantics: Approaches to end-to-end spoken language understanding
Viaarxiv icon