
Xuehai He


Mastering Robot Manipulation with Multimodal Prompts through Pretraining and Multi-task Fine-tuning

Oct 14, 2023
Jiachen Li, Qiaozi Gao, Michael Johnston, Xiaofeng Gao, Xuehai He, Suhaila Shakiah, Hangjie Shi, Reza Ghanadan, William Yang Wang


MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens

Oct 05, 2023
Kaizhi Zheng, Xuehai He, Xin Eric Wang


LayoutGPT: Compositional Visual Planning and Generation with Large Language Models

May 24, 2023
Weixi Feng, Wanrong Zhu, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Xuehai He, Sugato Basu, Xin Eric Wang, William Yang Wang


Discriminative Diffusion Models as Few-shot Vision and Language Learners

May 18, 2023
Xuehai He, Weixi Feng, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Pradyumna Narayana, Sugato Basu, William Yang Wang, Xin Eric Wang


Multimodal Graph Transformer for Multimodal Question Answering

Apr 30, 2023
Xuehai He, Xin Eric Wang


Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis

Dec 09, 2022
Weixi Feng, Xuehai He, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Pradyumna Narayana, Sugato Basu, Xin Eric Wang, William Yang Wang


ComCLIP: Training-Free Compositional Image and Text Matching

Nov 25, 2022
Kenan Jiang, Xuehai He, Ruize Xu, Xin Eric Wang


CPL: Counterfactual Prompt Learning for Vision and Language Models

Oct 19, 2022
Xuehai He, Diji Yang, Weixi Feng, Tsu-Jui Fu, Arjun Akula, Varun Jampani, Pradyumna Narayana, Sugato Basu, William Yang Wang, Xin Eric Wang


JARVIS: A Neuro-Symbolic Commonsense Reasoning Framework for Conversational Embodied Agents

Aug 30, 2022
Kaizhi Zheng, Kaiwen Zhou, Jing Gu, Yue Fan, Jialu Wang, Zonglin Di, Xuehai He, Xin Eric Wang


Parameter-efficient Fine-tuning for Vision Transformers

Mar 29, 2022
Xuehai He, Chunyuan Li, Pengchuan Zhang, Jianwei Yang, Xin Eric Wang
