Jiasen Lu

Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action

Dec 28, 2023
Jiasen Lu, Christopher Clark, Sangho Lee, Zichen Zhang, Savya Khosla, Ryan Marten, Derek Hoiem, Aniruddha Kembhavi

Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks

Jun 17, 2022
Jiasen Lu, Christopher Clark, Rowan Zellers, Roozbeh Mottaghi, Aniruddha Kembhavi

ASC me to Do Anything: Multi-task Training for Embodied AI

Feb 14, 2022
Jiasen Lu, Jordi Salvador, Roozbeh Mottaghi, Aniruddha Kembhavi

MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound

Jan 07, 2022
Rowan Zellers, Jiasen Lu, Ximing Lu, Youngjae Yu, Yanpeng Zhao, Mohammadreza Salehi, Aditya Kusupati, Jack Hessel, Ali Farhadi, Yejin Choi

A Simple Long-Tailed Recognition Baseline via Vision-Language Model

Nov 29, 2021
Teli Ma, Shijie Geng, Mengmeng Wang, Jing Shao, Jiasen Lu, Hongsheng Li, Peng Gao, Yu Qiao

Container: Context Aggregation Network

Jun 02, 2021
Peng Gao, Jiasen Lu, Hongsheng Li, Roozbeh Mottaghi, Aniruddha Kembhavi

Multi-Modal Answer Validation for Knowledge-Based VQA

Mar 23, 2021
Jialin Wu, Jiasen Lu, Ashish Sabharwal, Roozbeh Mottaghi

X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers

Sep 23, 2020
Jaemin Cho, Jiasen Lu, Dustin Schwenk, Hannaneh Hajishirzi, Aniruddha Kembhavi

Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data

Jul 24, 2020
Michael Cogswell, Jiasen Lu, Rishabh Jain, Stefan Lee, Devi Parikh, Dhruv Batra

Spatially Aware Multimodal Transformers for TextVQA

Jul 23, 2020
Yash Kant, Dhruv Batra, Peter Anderson, Alex Schwing, Devi Parikh, Jiasen Lu, Harsh Agrawal
