"speech": models, code, and papers

Speech Enhancement with Intelligent Neural Homomorphic Synthesis

Oct 28, 2022
Shulin He, Wei Rao, Jinjiang Liu, Jun Chen, Yukai Ju, Xueliang Zhang, Yannan Wang, Shidong Shang

Adversarial Speaker-Consistency Learning Using Untranscribed Speech Data for Zero-Shot Multi-Speaker Text-to-Speech

Oct 12, 2022
Byoung Jin Choi, Myeonghun Jeong, Minchan Kim, Sung Hwan Mun, Nam Soo Kim

Evaluating context-invariance in unsupervised speech representations

Oct 27, 2022
Mark Hallap, Emmanuel Dupoux, Ewan Dunbar

Dynamic Speech Endpoint Detection with Regression Targets

Oct 25, 2022
Dawei Liang, Hang Su, Tarun Singh, Jay Mahadeokar, Shanil Puri, Jiedan Zhu, Edison Thomaz, Mike Seltzer

Supplementary Features of BiLSTM for Enhanced Sequence Labeling

Jun 08, 2023
Conglei Xu, Kun Shen, Hongguang Sun

When the Majority is Wrong: Leveraging Annotator Disagreement for Subjective Tasks

May 11, 2023
Eve Fleisig, Rediet Abebe, Dan Klein

Continuous Emotional Intensity Controllable Speech Synthesis using Semi-supervised Learning

Nov 11, 2022
Yoori Oh, Juheon Lee, Yoseob Han, Kyogu Lee

Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning

Apr 12, 2023
Nikhil Singh, Chih-Wei Wu, Iroro Orife, Mahdi Kalayeh

Tensor decomposition for minimization of E2E SLU model toward on-device processing

Jun 02, 2023
Yosuke Kashiwagi, Siddhant Arora, Hayato Futami, Jessica Huynh, Shih-Lun Wu, Yifan Peng, Brian Yan, Emiru Tsunoo, Shinji Watanabe

Imitator: Personalized Speech-driven 3D Facial Animation

Dec 30, 2022
Balamurugan Thambiraja, Ikhsanul Habibie, Sadegh Aliakbarian, Darren Cosker, Christian Theobalt, Justus Thies