Jianfeng Wang

Tony

Spatial-Frequency U-Net for Denoising Diffusion Probabilistic Models

Jul 27, 2023
Via arXiv

Aligning Large Multi-Modal Model with Robust Instruction Tuning

Jun 26, 2023
Via arXiv

MultiSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos

Jun 07, 2023
Via arXiv

NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation

Mar 22, 2023
Via arXiv

MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action

Mar 20, 2023
Via arXiv

Learning 3D Photography Videos via Self-supervised Diffusion on Single Images

Feb 21, 2023
Via arXiv

NP-Match: Towards a New Probabilistic Model for Semi-Supervised Learning

Jan 31, 2023
Via arXiv

Generalized Decoding for Pixel, Image, and Language

Dec 21, 2022
Via arXiv

Exploring Discrete Diffusion Models for Image Captioning

Dec 09, 2022
Via arXiv

GRiT: A Generative Region-to-text Transformer for Object Understanding

Dec 01, 2022
Via arXiv