Hexiang Hu

Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding

Oct 07, 2022
Kenton Lee, Mandar Joshi, Iulia Turc, Hexiang Hu, Fangyu Liu, Julian Eisenschlos, Urvashi Khandelwal, Peter Shaw, Ming-Wei Chang, Kristina Toutanova

MuRAG: Multimodal Retrieval-Augmented Generator for Open Question Answering over Images and Text

Oct 06, 2022
Wenhu Chen, Hexiang Hu, Xi Chen, Pat Verga, William W. Cohen

Re-Imagen: Retrieval-Augmented Text-to-Image Generator

Oct 01, 2022
Wenhu Chen, Hexiang Hu, Chitwan Saharia, William W. Cohen

PreSTU: Pre-Training for Scene-Text Understanding

Sep 12, 2022
Jihyung Kil, Soravit Changpinyo, Xi Chen, Hexiang Hu, Sebastian Goodman, Wei-Lun Chao, Radu Soricut

Visually Grounded Concept Composition

Sep 29, 2021
Bowen Zhang, Hexiang Hu, Linlu Qiu, Peter Shaw, Fei Sha

Systematic Generalization on gSCAN: What is Nearly Solved and What is Next?

Sep 25, 2021
Linlu Qiu, Hexiang Hu, Bowen Zhang, Peter Shaw, Fei Sha

On Model Calibration for Long-Tailed Object Detection and Instance Segmentation

Jul 05, 2021
Tai-Yu Pan, Cheng Zhang, Yandong Li, Hexiang Hu, Dong Xuan, Soravit Changpinyo, Boqing Gong, Wei-Lun Chao

A Simple and Effective Use of Object-Centric Images for Long-Tailed Object Detection

Feb 17, 2021
Cheng Zhang, Tai-Yu Pan, Yandong Li, Hexiang Hu, Dong Xuan, Soravit Changpinyo, Boqing Gong, Wei-Lun Chao

A Hierarchical Multi-Modal Encoder for Moment Localization in Video Corpus

Nov 24, 2020
Bowen Zhang, Hexiang Hu, Joonseok Lee, Ming Zhao, Sheide Chammas, Vihan Jain, Eugene Ie, Fei Sha
