Mandar Joshi

BAGEL: Bootstrapping Agents by Guiding Exploration with Language

Mar 12, 2024
Shikhar Murty, Christopher Manning, Peter Shaw, Mandar Joshi, Kenton Lee

Efficient End-to-End Visual Document Understanding with Rationale Distillation

Nov 16, 2023
Wang Zhu, Alekh Agarwal, Mandar Joshi, Robin Jia, Jesse Thomason, Kristina Toutanova

From Pixels to UI Actions: Learning to Follow Instructions via Graphical User Interfaces

May 31, 2023
Peter Shaw, Mandar Joshi, James Cohan, Jonathan Berant, Panupong Pasupat, Hexiang Hu, Urvashi Khandelwal, Kenton Lee, Kristina Toutanova

PaLI-X: On Scaling up a Multilingual Vision and Language Model

May 29, 2023
Xi Chen, Josip Djolonga, Piotr Padlewski, Basil Mustafa, Soravit Changpinyo, Jialin Wu, Carlos Riquelme Ruiz, Sebastian Goodman, Xiao Wang, Yi Tay, Siamak Shakeri, Mostafa Dehghani, Daniel Salz, Mario Lucic, Michael Tschannen, Arsha Nagrani, Hexiang Hu, Mandar Joshi, Bo Pang, Ceslee Montgomery, Paulina Pietrzyk, Marvin Ritter, AJ Piergiovanni, Matthias Minderer, Filip Pavetic, Austin Waters, Gang Li, Ibrahim Alabdulmohsin, Lucas Beyer, Julien Amelot, Kenton Lee, Andreas Peter Steiner, Yang Li, Daniel Keysers, Anurag Arnab, Yuanzhong Xu, Keran Rong, Alexander Kolesnikov, Mojtaba Seyedhosseini, Anelia Angelova, Xiaohua Zhai, Neil Houlsby, Radu Soricut

Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities

Feb 24, 2023
Hexiang Hu, Yi Luan, Yang Chen, Urvashi Khandelwal, Mandar Joshi, Kenton Lee, Kristina Toutanova, Ming-Wei Chang

DePlot: One-shot visual language reasoning by plot-to-table translation

Dec 20, 2022
Fangyu Liu, Julian Martin Eisenschlos, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Wenhu Chen, Nigel Collier, Yasemin Altun

MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering

Dec 19, 2022
Fangyu Liu, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Yasemin Altun, Nigel Collier, Julian Martin Eisenschlos

Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding

Oct 07, 2022
Kenton Lee, Mandar Joshi, Iulia Turc, Hexiang Hu, Fangyu Liu, Julian Eisenschlos, Urvashi Khandelwal, Peter Shaw, Ming-Wei Chang, Kristina Toutanova

Few-shot Mining of Naturally Occurring Inputs and Outputs

May 09, 2022
Mandar Joshi, Terra Blevins, Mike Lewis, Daniel S. Weld, Luke Zettlemoyer

Improving Passage Retrieval with Zero-Shot Question Generation

Apr 15, 2022
Devendra Singh Sachan, Mike Lewis, Mandar Joshi, Armen Aghajanyan, Wen-tau Yih, Joelle Pineau, Luke Zettlemoyer