Harsh Agrawal

Grounding Multimodal Large Language Models in Actions

Jun 12, 2024

Large Language Models as Generalizable Policies for Embodied Tasks

Oct 26, 2023

Housekeep: Tidying Virtual Households using Commonsense Reasoning

May 22, 2022

Simple and Effective Synthesis of Indoor 3D Scenes

Apr 06, 2022

SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation

Oct 27, 2021

The Surprising Effectiveness of Visual Odometry Techniques for Embodied PointGoal Navigation

Aug 26, 2021

Contrast and Classify: Alternate Training for Robust VQA

Oct 13, 2020

Spatially Aware Multimodal Transformers for TextVQA

Jul 23, 2020

Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning

Aug 22, 2019

EvalAI: Towards Better Evaluation Systems for AI Agents

Feb 10, 2019