Rui Qian

SongComposer: A Large Language Model for Lyric and Melody Composition in Song Generation

Feb 27, 2024

VideoPrism: A Foundational Visual Encoder for Video Understanding

Feb 20, 2024

Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation

Nov 29, 2023

Semantics Meets Temporal Correspondence: Self-supervised Object-centric Learning in Videos

Aug 19, 2023

Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation

Aug 08, 2023

Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation

Mar 18, 2023

Motion-inductive Self-supervised Object Discovery in Videos

Oct 01, 2022

Static and Dynamic Concepts for Self-supervised Video Representation Learning

Jul 26, 2022

Exploring Fine-Grained Audiovisual Categorization with the SSW60 Dataset

Jul 21, 2022

Multimodal Open-Vocabulary Video Classification via Pre-Trained Vision and Language Models

Jul 15, 2022