Picture for Jing Yu Koh

Jing Yu Koh

Tree Search for Language Model Agents

Add code
Jul 01, 2024
Viaarxiv icon

Adversarial Attacks on Multimodal Agents

Add code
Jun 18, 2024
Viaarxiv icon

OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web

Add code
Feb 28, 2024
Viaarxiv icon

VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks

Add code
Jan 24, 2024
Viaarxiv icon

Multimodal Graph Learning for Generative Tasks

Add code
Oct 12, 2023
Viaarxiv icon

Generating Images with Multimodal Language Models

Add code
May 26, 2023
Viaarxiv icon

VQ3D: Learning a 3D-Aware Generative Model on ImageNet

Add code
Feb 14, 2023
Viaarxiv icon

Grounding Language Models to Images for Multimodal Generation

Add code
Jan 31, 2023
Viaarxiv icon

A New Path: Scaling Vision-and-Language Navigation with Synthetic Instructions and Imitation Learning

Add code
Oct 06, 2022
Figure 1 for A New Path: Scaling Vision-and-Language Navigation with Synthetic Instructions and Imitation Learning
Figure 2 for A New Path: Scaling Vision-and-Language Navigation with Synthetic Instructions and Imitation Learning
Figure 3 for A New Path: Scaling Vision-and-Language Navigation with Synthetic Instructions and Imitation Learning
Figure 4 for A New Path: Scaling Vision-and-Language Navigation with Synthetic Instructions and Imitation Learning
Viaarxiv icon

Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

Add code
Jun 22, 2022
Figure 1 for Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
Figure 2 for Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
Figure 3 for Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
Figure 4 for Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
Viaarxiv icon