"speech": models, code, and papers

Data Efficient Direct Speech-to-Text Translation with Modality Agnostic Meta-Learning

Nov 11, 2019
Sathish Indurthi, Houjeung Han, Nikhil Kumar Lakumarapu, Beomseok Lee, Insoo Chung, Sangha Kim, Chanwoo Kim

Via arXiv

A Novel Deep Learning Architecture for Decoding Imagined Speech from EEG

Mar 19, 2020
Jerrin Thomas Panachakel, A. G. Ramakrishnan, T. V. Ananthapadmanabha

Via arXiv

Speech Enhancement with Zero-Shot Model Selection

Dec 17, 2020
Ryandhimas E. Zezario, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao

Via arXiv

Neural Token Segmentation for High Token-Internal Complexity

Mar 21, 2022
Idan Brusilovsky, Reut Tsarfaty

Via arXiv

Triple M: A Practical Neural Text-to-speech System With Multi-guidance Attention And Multi-band Multi-time LPCNet

Jan 30, 2021
Shilun Lin, Xinhui Li, Li Lu

Via arXiv

Perceptual Loss with Recognition Model for Single-Channel Enhancement and Robust ASR

Dec 11, 2021
Peter Plantinga, Deblin Bagchi, Eric Fosler-Lussier

Via arXiv

Multimodal Depression Classification Using Articulatory Coordination Features And Hierarchical Attention Based Text Embeddings

Feb 13, 2022
Nadee Seneviratne, Carol Espy-Wilson

Via arXiv

Filler Word Detection and Classification: A Dataset and Benchmark

Mar 28, 2022
Ge Zhu, Juan-Pablo Caceres, Justin Salamon

Via arXiv

Improved Prosodic Clustering for Multispeaker and Speaker-independent Phoneme-level Prosody Control

Nov 19, 2021
Myrsini Christidou, Alexandra Vioni, Nikolaos Ellinas, Georgios Vamvoukakis, Konstantinos Markopoulos, Panos Kakoulidis, June Sig Sung, Hyoungmin Park, Aimilios Chalamandaris, Pirros Tsiakoulis

Via arXiv

Self-training and Pre-training are Complementary for Speech Recognition

Oct 22, 2020
Qiantong Xu, Alexei Baevski, Tatiana Likhomanenko, Paden Tomasello, Alexis Conneau, Ronan Collobert, Gabriel Synnaeve, Michael Auli

Via arXiv