Xizhou Zhu

ControlLLM: Augment Language Models with Tools by Searching on Graphs

Oct 30, 2023
Zhaoyang Liu, Zeqiang Lai, Zhangwei Gao, Erfei Cui, Zhiheng Li, Xizhou Zhu, Lewei Lu, Qifeng Chen, Yu Qiao, Jifeng Dai, Wenhai Wang

Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models

Oct 12, 2023
Zeqiang Lai, Xizhou Zhu, Jifeng Dai, Yu Qiao, Wenhai Wang

The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World

Aug 03, 2023
Weiyun Wang, Min Shi, Qingyun Li, Wenhai Wang, Zhenhang Huang, Linjie Xing, Zhe Chen, Hao Li, Xizhou Zhu, Zhiguo Cao, Yushi Chen, Tong Lu, Jifeng Dai, Yu Qiao

ADDP: Learning General Representations for Image Recognition and Generation with Alternating Denoising Diffusion Process

Jun 08, 2023
Changyao Tian, Chenxin Tao, Jifeng Dai, Hao Li, Ziheng Li, Lewei Lu, Xiaogang Wang, Hongsheng Li, Gao Huang, Xizhou Zhu

Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory

Jun 01, 2023
Xizhou Zhu, Yuntao Chen, Hao Tian, Chenxin Tao, Weijie Su, Chenyu Yang, Gao Huang, Bin Li, Lewei Lu, Xiaogang Wang, Yu Qiao, Zhaoxiang Zhang, Jifeng Dai

VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks

May 25, 2023
Wenhai Wang, Zhe Chen, Xiaokang Chen, Jiannan Wu, Xizhou Zhu, Gang Zeng, Ping Luo, Tong Lu, Jie Zhou, Yu Qiao, Jifeng Dai

InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language

May 11, 2023
Zhaoyang Liu, Yinan He, Wenhai Wang, Weiyun Wang, Yi Wang, Shoufa Chen, Qinglong Zhang, Yang Yang, Qingyun Li, Jiashuo Yu, Kunchang Li, Zhe Chen, Xue Yang, Xizhou Zhu, Yali Wang, Limin Wang, Ping Luo, Jifeng Dai, Yu Qiao

Goal-oriented Autonomous Driving

Dec 20, 2022
Yihan Hu, Jiazhi Yang, Li Chen, Keyu Li, Chonghao Sima, Xizhou Zhu, Siqi Chai, Senyao Du, Tianwei Lin, Wenhai Wang, Lewei Lu, Xiaosong Jia, Qiang Liu, Jifeng Dai, Yu Qiao, Hongyang Li

Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information

Nov 21, 2022
Weijie Su, Xizhou Zhu, Chenxin Tao, Lewei Lu, Bin Li, Gao Huang, Yu Qiao, Xiaogang Wang, Jie Zhou, Jifeng Dai
