Robinson Piramuthu

S-EQA: Tackling Situational Queries in Embodied Question Answering

May 08, 2024
Via arXiv

FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation

May 08, 2024
Via arXiv

"Don't forget to put the milk back!" Dataset for Enabling Embodied Agents to Detect Anomalous Situations

Apr 12, 2024
Via arXiv

E-ViLM: Efficient Video-Language Model via Masked Video Modeling with Semantic Vector-Quantized Tokenizer

Nov 28, 2023
Via arXiv

Characterizing Video Question Answering with Sparsified Inputs

Nov 27, 2023
Via arXiv

Decision Making for Human-in-the-loop Robotic Agents via Uncertainty-Aware Reinforcement Learning

Mar 14, 2023
Via arXiv

RREx-BoT: Remote Referring Expressions with a Bag of Tricks

Jan 30, 2023
Via arXiv

CLIP-Nav: Using CLIP for Zero-Shot Vision-and-Language Navigation

Nov 30, 2022
Via arXiv

Video in 10 Bits: Few-Bit VideoQA for Efficiency and Privacy

Oct 18, 2022
Via arXiv

A Simple Approach for Visual Rearrangement: 3D Mapping and Semantic Search

Jun 21, 2022
Via arXiv