Rui Qian

SongComposer: A Large Language Model for Lyric and Melody Composition in Song Generation

Feb 27, 2024
Shuangrui Ding, Zihan Liu, Xiaoyi Dong, Pan Zhang, Rui Qian, Conghui He, Dahua Lin, Jiaqi Wang

Via arXiv

VideoPrism: A Foundational Visual Encoder for Video Understanding

Feb 20, 2024
Long Zhao, Nitesh B. Gundavarapu, Liangzhe Yuan, Hao Zhou, Shen Yan, Jennifer J. Sun, Luke Friedman, Rui Qian, Tobias Weyand, Yue Zhao, Rachel Hornung, Florian Schroff, Ming-Hsuan Yang, David A. Ross, Huisheng Wang, Hartwig Adam, Mikhail Sirotenko, Ting Liu, Boqing Gong

Via arXiv

Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation

Nov 29, 2023
Shuangrui Ding, Rui Qian, Haohang Xu, Dahua Lin, Hongkai Xiong

Via arXiv

Semantics Meets Temporal Correspondence: Self-supervised Object-centric Learning in Videos

Aug 19, 2023
Rui Qian, Shuangrui Ding, Xian Liu, Dahua Lin

Via arXiv

Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation

Aug 08, 2023
Shuangrui Ding, Peisen Zhao, Xiaopeng Zhang, Rui Qian, Hongkai Xiong, Qi Tian

Via arXiv

Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation

Mar 18, 2023
Lingting Zhu, Xian Liu, Xuanyu Liu, Rui Qian, Ziwei Liu, Lequan Yu

Via arXiv

Motion-inductive Self-supervised Object Discovery in Videos

Oct 01, 2022
Shuangrui Ding, Weidi Xie, Yabo Chen, Rui Qian, Xiaopeng Zhang, Hongkai Xiong, Qi Tian

Via arXiv

Static and Dynamic Concepts for Self-supervised Video Representation Learning

Jul 26, 2022
Rui Qian, Shuangrui Ding, Xian Liu, Dahua Lin

Via arXiv

Exploring Fine-Grained Audiovisual Categorization with the SSW60 Dataset

Jul 21, 2022
Grant Van Horn, Rui Qian, Kimberly Wilber, Hartwig Adam, Oisin Mac Aodha, Serge Belongie

Via arXiv

Multimodal Open-Vocabulary Video Classification via Pre-Trained Vision and Language Models

Jul 15, 2022
Rui Qian, Yeqing Li, Zheng Xu, Ming-Hsuan Yang, Serge Belongie, Yin Cui

Via arXiv