Hongsheng Li

GLID: Pre-training a Generalist Encoder-Decoder Vision Model

Apr 11, 2024
Jihao Liu, Jinliang Zheng, Yu Liu, Hongsheng Li

Urban Architect: Steerable 3D Urban Scene Generation with Layout Prior

Apr 10, 2024
Fan Lu, Kwan-Yee Lin, Yan Xu, Hongsheng Li, Guang Chen, Changjun Jiang

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching

Apr 04, 2024
Dongzhi Jiang, Guanglu Song, Xiaoshi Wu, Renrui Zhang, Dazhong Shen, Zhuofan Zong, Yu Liu, Hongsheng Li

CameraCtrl: Enabling Camera Control for Text-to-Video Generation

Apr 02, 2024
Hao He, Yinghao Xu, Yuwei Guo, Gordon Wetzstein, Bo Dai, Hongsheng Li, Ceyuan Yang

Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want

Apr 01, 2024
Weifeng Lin, Xinyu Wei, Ruichuan An, Peng Gao, Bocheng Zou, Yulin Luo, Siyuan Huang, Shanghang Zhang, Hongsheng Li

ECNet: Effective Controllable Text-to-Image Diffusion Models

Mar 27, 2024
Sicheng Li, Keqiang Sun, Zhixin Lai, Xiaoshi Wu, Feng Qiu, Haoran Xie, Kazunori Miyata, Hongsheng Li

Visual CoT: Unleashing Chain-of-Thought Reasoning in Multi-Modal Language Models

Mar 25, 2024
Hao Shao, Shengju Qian, Han Xiao, Guanglu Song, Zhuofan Zong, Letian Wang, Yu Liu, Hongsheng Li

MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

Mar 21, 2024
Renrui Zhang, Dongzhi Jiang, Yichi Zhang, Haokun Lin, Ziyu Guo, Pengshuo Qiu, Aojun Zhou, Pan Lu, Kai-Wei Chang, Peng Gao, Hongsheng Li

Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation

Mar 20, 2024
Fu-Yun Wang, Xiaoshi Wu, Zhaoyang Huang, Xiaoyu Shi, Dazhong Shen, Guanglu Song, Yu Liu, Hongsheng Li

FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis

Mar 19, 2024
Linjiang Huang, Rongyao Fang, Aiping Zhang, Guanglu Song, Si Liu, Yu Liu, Hongsheng Li
