Hongwei Xue

Stare at What You See: Masked Image Modeling without Reconstruction

Nov 16, 2022
Hongwei Xue, Peng Gao, Hongyang Li, Yu Qiao, Hao Sun, Houqiang Li, Jiebo Luo

Via arXiv

Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning

Oct 12, 2022
Yuchong Sun, Hongwei Xue, Ruihua Song, Bei Liu, Huan Yang, Jianlong Fu

Via arXiv

CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Representation Alignment

Sep 23, 2022
Hongwei Xue, Yuchong Sun, Bei Liu, Jianlong Fu, Ruihua Song, Houqiang Li, Jiebo Luo

Via arXiv

Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions

Nov 19, 2021
Hongwei Xue, Tiankai Hang, Yanhong Zeng, Yuchong Sun, Bei Liu, Huan Yang, Jianlong Fu, Baining Guo

Via arXiv

Unifying Multimodal Transformer for Bi-directional Image and Text Generation

Oct 19, 2021
Yupan Huang, Hongwei Xue, Bei Liu, Yutong Lu

Via arXiv

Learning Fine-Grained Motion Embedding for Landscape Animation

Sep 13, 2021
Hongwei Xue, Bei Liu, Huan Yang, Jianlong Fu, Houqiang Li, Jiebo Luo

Via arXiv

Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training

Jun 28, 2021
Hongwei Xue, Yupan Huang, Bei Liu, Houwen Peng, Jianlong Fu, Houqiang Li, Jiebo Luo

Via arXiv