Xi Yin

AcademicGPT: Empowering Academic Research

Nov 21, 2023
Via arXiv

Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning

Nov 17, 2023
Via arXiv

Towards Effective and Compact Contextual Representation for Conformer Transducer Speech Recognition Systems

Jun 26, 2023
Via arXiv

Latent-Shift: Latent Diffusion with Temporal Shift for Efficient Text-to-Video Generation

Apr 18, 2023
Via arXiv

MaLP: Manipulation Localization Using a Proactive Scheme

Apr 04, 2023
Via arXiv

SpaText: Spatio-Textual Representation for Controllable Image Generation

Nov 25, 2022
Via arXiv

Make-A-Video: Text-to-Video Generation without Text-Video Data

Sep 29, 2022
Via arXiv

MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration

Apr 28, 2022
Via arXiv

Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer

Apr 07, 2022
Via arXiv

Proactive Image Manipulation Detection

Mar 31, 2022
Via arXiv