"Text": models, code, and papers

U-Style: Cascading U-nets with Multi-level Speaker and Style Modeling for Zero-Shot Voice Cloning

Oct 06, 2023
Tao Li, Zhichao Wang, Xinfa Zhu, Jian Cong, Qiao Tian, Yuping Wang, Lei Xie

Via arXiv

CCEdit: Creative and Controllable Video Editing via Diffusion Models

Sep 28, 2023
Ruoyu Feng, Wenming Weng, Yanhui Wang, Yuhui Yuan, Jianmin Bao, Chong Luo, Zhibo Chen, Baining Guo

Via arXiv

Intriguing properties of generative classifiers

Sep 28, 2023
Priyank Jaini, Kevin Clark, Robert Geirhos

Via arXiv

Detecting and Grounding Multi-Modal Media Manipulation and Beyond

Sep 25, 2023
Rui Shao, Tianxing Wu, Jianlong Wu, Liqiang Nie, Ziwei Liu

Via arXiv

Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation

Aug 03, 2023
Minsu Kim, Jeongsoo Choi, Dahun Kim, Yong Man Ro

Via arXiv

American Stories: A Large-Scale Structured Text Dataset of Historical U.S. Newspapers

Aug 24, 2023
Melissa Dell, Jacob Carlson, Tom Bryan, Emily Silcock, Abhishek Arora, Zejiang Shen, Luca D'Amico-Wong, Quan Le, Pablo Querubin, Leander Heldring

Via arXiv

Multi-Modal Automatic Prosody Annotation with Contrastive Pretraining of SSWP

Sep 11, 2023
Jinzuomu Zhong, Yang Li, Hui Huang, Jie Liu, Zhiba Su, Jing Guo, Benlai Tang, Fengjie Zhu

Via arXiv

BiSinger: Bilingual Singing Voice Synthesis

Sep 29, 2023
Huali Zhou, Yueqian Lin, Yao Shi, Peng Sun, Ming Li

Via arXiv

Music Understanding LLaMA: Advancing Text-to-Music Generation with Question Answering and Captioning

Aug 22, 2023
Shansong Liu, Atin Sakkeer Hussain, Chenshuo Sun, Ying Shan

Via arXiv

Enable Language Models to Implicitly Learn Self-Improvement From Data

Oct 05, 2023
Ziqi Wang, Le Hou, Tianjian Lu, Yuexin Wu, Yunxuan Li, Hongkun Yu, Heng Ji

Via arXiv