Yuchong Sun

Parrot: Enhancing Multi-Turn Chat Models by Learning to Ask Questions

Oct 11, 2023

ViCo: Engaging Video Comment Generation with Human Preference Rewards

Aug 22, 2023

Translating Text Synopses to Video Storyboards

Dec 31, 2022

Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning

Oct 12, 2022

CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Representation Alignment

Sep 23, 2022

Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions

Nov 19, 2021

WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training

Mar 19, 2021