Hongwei Xue

Visual Perception by Large Language Model's Weights

May 30, 2024

Multi-Modal Generative Embedding Model

May 29, 2024

Stare at What You See: Masked Image Modeling without Reconstruction

Nov 16, 2022

Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning

Oct 12, 2022

CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Representation Alignment

Sep 23, 2022

Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions

Nov 19, 2021

Unifying Multimodal Transformer for Bi-directional Image and Text Generation

Oct 19, 2021

Learning Fine-Grained Motion Embedding for Landscape Animation

Sep 13, 2021

Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training

Jun 28, 2021