
Yuchong Sun

Parrot: Enhancing Multi-Turn Chat Models by Learning to Ask Questions

Oct 11, 2023
Yuchong Sun, Che Liu, Jinwen Huang, Ruihua Song, Fuzheng Zhang, Di Zhang, Zhongyuan Wang, Kun Gai

ViCo: Engaging Video Comment Generation with Human Preference Rewards

Aug 22, 2023
Yuchong Sun, Bei Liu, Xu Chen, Ruihua Song, Jianlong Fu

Translating Text Synopses to Video Storyboards

Dec 31, 2022
Xu Gu, Yuchong Sun, Feiyue Ni, Shizhe Chen, Ruihua Song, Boyuan Li, Xiang Cao

Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning

Oct 12, 2022
Yuchong Sun, Hongwei Xue, Ruihua Song, Bei Liu, Huan Yang, Jianlong Fu

CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Representation Alignment

Sep 23, 2022
Hongwei Xue, Yuchong Sun, Bei Liu, Jianlong Fu, Ruihua Song, Houqiang Li, Jiebo Luo

Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions

Nov 19, 2021
Hongwei Xue, Tiankai Hang, Yanhong Zeng, Yuchong Sun, Bei Liu, Huan Yang, Jianlong Fu, Baining Guo

WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training

Mar 19, 2021
Yuqi Huo, Manli Zhang, Guangzhen Liu, Haoyu Lu, Yizhao Gao, Guoxing Yang, Jingyuan Wen, Heng Zhang, Baogui Xu, Weihao Zheng, Zongzheng Xi, Yueqian Yang, Anwen Hu, Jinming Zhao, Ruichen Li, Yida Zhao, Liang Zhang, Yuqing Song, Xin Hong, Wanqing Cui, Danyang Hou, Yingyan Li, Junyi Li, Peiyu Liu, Zheng Gong, Chuhao Jin, Yuchong Sun, Shizhe Chen, Zhiwu Lu, Zhicheng Dou, Qin Jin, Yanyan Lan, Wayne Xin Zhao, Ruihua Song, Ji-Rong Wen
