"speech": models, code, and papers

Unifying Speech Enhancement and Separation with Gradient Modulation for End-to-End Noise-Robust Speech Separation

Feb 22, 2023
Yuchen Hu, Chen Chen, Heqing Zou, Xionghu Zhong, Eng Siong Chng

Via arXiv

Reprogramming Audio-driven Talking Face Synthesis into Text-driven

Jun 28, 2023
Jeongsoo Choi, Minsu Kim, Se Jin Park, Yong Man Ro

Via arXiv

A Multi-dimensional Deep Structured State Space Approach to Speech Enhancement Using Small-footprint Models

Jun 01, 2023
Pin-Jui Ku, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee

Via arXiv

Cross-Attribute Matrix Factorization Model with Shared User Embedding

Aug 14, 2023
Wen Liang, Zeng Fan, Youzhi Liang, Jianguo Jia

Via arXiv

Acoustic absement in detail: Quantifying acoustic differences across time-series representations of speech data

Apr 14, 2023
Matthew C. Kelley

Via arXiv

ClArTTS: An Open-Source Classical Arabic Text-to-Speech Corpus

Feb 28, 2023
Ajinkya Kulkarni, Atharva Kulkarni, Sara Abedalmonem Mohammad Shatnawi, Hanan Aldarmaki

Via arXiv

SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision

Apr 03, 2023
Xubo Liu, Egor Lakomkin, Konstantinos Vougioukas, Pingchuan Ma, Honglie Chen, Ruiming Xie, Morrie Doulaty, Niko Moritz, Jáchym Kolář, Stavros Petridis, Maja Pantic, Christian Fuegen

Via arXiv

Improving Frame-level Classifier for Word Timings with Non-peaky CTC in End-to-End Automatic Speech Recognition

Jun 09, 2023
Xianzhao Chen, Yist Y. Lin, Kang Wang, Yi He, Zejun Ma

Via arXiv

LAST: Scalable Lattice-Based Speech Modelling in JAX

Apr 25, 2023
Ke Wu, Ehsan Variani, Tom Bagby, Michael Riley

Via arXiv

Point to the Hidden: Exposing Speech Audio Splicing via Signal Pointer Nets

Aug 02, 2023
Denise Moussa, Germans Hirsch, Sebastian Wankerl, Christian Riess

Via arXiv