"Text": models, code, and papers

Period VITS: Variational Inference with Explicit Pitch Modeling for End-to-end Emotional Speech Synthesis

Oct 28, 2022
Yuma Shirahata, Ryuichi Yamamoto, Eunwoo Song, Ryo Terashima, Jae-Min Kim, Kentaro Tachibana

Figures 1–4

A Visual Tour Of Current Challenges In Multimodal Language Models

Oct 22, 2022
Shashank Sonkar, Naiming Liu, Richard G. Baraniuk

Figures 1–4

Integrating Heterogeneous Domain Information into Relation Extraction: A Case Study on Drug-Drug Interaction Extraction

Dec 21, 2022
Masaki Asada

Figures 1–4

Analyzing Semantic Faithfulness of Language Models via Input Intervention on Conversational Question Answering

Dec 21, 2022
Akshay Chaturvedi, Swarnadeep Bhar, Soumadeep Saha, Utpal Garain, Nicholas Asher

Figures 1–4

SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training

Nov 30, 2022
Yuanze Lin, Chen Wei, Huiyu Wang, Alan Yuille, Cihang Xie

Figures 1–4

Improving Cross-Modal Retrieval with Set of Diverse Embeddings

Nov 30, 2022
Dongwon Kim, Namyup Kim, Suha Kwak

Figures 1–4

Towards Open-Set Text Recognition via Label-to-Prototype Learning

Mar 10, 2022
Chang Liu, Chun Yang, Hai-Bo Qin, Xiaobin Zhu, JieBo Hou, Xu-Cheng Yin

Figures 1–4

NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS

Nov 04, 2022
Dongchao Yang, Songxiang Liu, Jianwei Yu, Helin Wang, Chao Weng, Yuexian Zou

Figures 1–3

OSIC: A New One-Stage Image Captioner Coined

Nov 04, 2022
Bo Wang, Zhao Zhang, Mingbo Zhao, Xiaojie Jin, Mingliang Xu, Meng Wang

Figures 1–4

Integrating Text Inputs For Training and Adapting RNN Transducer ASR Models

Feb 26, 2022
Samuel Thomas, Brian Kingsbury, George Saon, Hong-Kwang J. Kuo

Figures 1–4