"speech": models, code, and papers

Taylor, Can You Hear Me Now? A Taylor-Unfolding Framework for Monaural Speech Enhancement

Apr 30, 2022
Andong Li, Shan You, Guochen Yu, Chengshi Zheng, Xiaodong Li

Egocentric Audio-Visual Noise Suppression

Nov 07, 2022
Roshan Sharma, Weipeng He, Ju Lin, Egor Lakomkin, Yang Liu, Kaustubh Kalgaonkar

Generalized Product-of-Experts for Learning Multimodal Representations in Noisy Environments

Nov 07, 2022
Abhinav Joshi, Naman Gupta, Jinang Shah, Binod Bhattarai, Ashutosh Modi, Danail Stoyanov

Facetron: Multi-speaker Face-to-Speech Model based on Cross-modal Latent Representations

Jul 26, 2021
Se-Yun Um, Jihyun Kim, Jihyun Lee, Sangshin Oh, Kyungguen Byun, Hong-Goo Kang

K-Wav2vec 2.0: Automatic Speech Recognition based on Joint Decoding of Graphemes and Syllables

Oct 11, 2021
Jounghee Kim, Pilsung Kang

Complex-valued Spatial Autoencoders for Multichannel Speech Enhancement

Aug 06, 2021
Mhd Modar Halimeh, Walter Kellermann

TransPOS: Transformers for Consolidating Different POS Tagset Datasets

Sep 24, 2022
Alex Li, Ilyas Bankole-Hameed, Ranadeep Singh, Gabriel Shen Han Ng, Akshat Gupta

Token-level Sequence Labeling for Spoken Language Understanding using Compositional End-to-End Models

Oct 27, 2022
Siddhant Arora, Siddharth Dalmia, Brian Yan, Florian Metze, Alan W Black, Shinji Watanabe

Expressive Text-to-Speech using Style Tag

Apr 01, 2021
Minchan Kim, Sung Jun Cheon, Byoung Jin Choi, Jong Jin Kim, Nam Soo Kim

Speech SIMCLR: Combining Contrastive and Reconstruction Objective for Self-supervised Speech Representation Learning

Oct 27, 2020
Dongwei Jiang, Wubo Li, Miao Cao, Ruixiong Zhang, Wei Zou, Kun Han, Xiangang Li
