Yupan Huang

TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering

Nov 28, 2023
Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei

Kosmos-2.5: A Multimodal Literate Model

Sep 20, 2023
Tengchao Lv, Yupan Huang, Jingye Chen, Lei Cui, Shuming Ma, Yaoyao Chang, Shaohan Huang, Wenhui Wang, Li Dong, Weiyao Luo, Shaoxiang Wu, Guoxin Wang, Cha Zhang, Furu Wei

Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models

Aug 31, 2023
Yupan Huang, Zaiqiao Meng, Fangyu Liu, Yixuan Su, Nigel Collier, Yutong Lu

TextDiffuser: Diffusion Models as Text Painters

May 24, 2023
Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei

LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking

Apr 19, 2022
Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei

A Picture is Worth a Thousand Words: A Unified System for Diverse Captions and Rich Images Generation

Oct 19, 2021
Yupan Huang, Bei Liu, Jianlong Fu, Yutong Lu

Unifying Multimodal Transformer for Bi-directional Image and Text Generation

Oct 19, 2021
Yupan Huang, Hongwei Xue, Bei Liu, Yutong Lu

Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training

Jun 28, 2021
Hongwei Xue, Yupan Huang, Bei Liu, Houwen Peng, Jianlong Fu, Houqiang Li, Jiebo Luo

Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning

Apr 08, 2021
Zhicheng Huang, Zhaoyang Zeng, Yupan Huang, Bei Liu, Dongmei Fu, Jianlong Fu
