"speech": models, code, and papers
T5lephone: Bridging Speech and Text Self-supervised Models for Spoken Language Understanding via Phoneme level T5

Nov 01, 2022
Chan-Jan Hsu, Ho-Lam Chung, Hung-yi Lee, Yu Tsao

An Automatic Speech Recognition System for Bengali Language based on Wav2Vec2 and Transfer Learning

Sep 20, 2022
Tushar Talukder Showrav

End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation

Oct 19, 2022
Yoshiki Masuyama, Xuankai Chang, Samuele Cornell, Shinji Watanabe, Nobutaka Ono

Align, Write, Re-order: Explainable End-to-End Speech Translation via Operation Sequence Generation

Nov 11, 2022
Motoi Omachi, Brian Yan, Siddharth Dalmia, Yuya Fujita, Shinji Watanabe

Unified Speech-Text Pre-training for Speech Translation and Recognition

Apr 11, 2022
Yun Tang, Hongyu Gong, Ning Dong, Changhan Wang, Wei-Ning Hsu, Jiatao Gu, Alexei Baevski, Xian Li, Abdelrahman Mohamed, Michael Auli, Juan Pino

Can Knowledge of End-to-End Text-to-Speech Models Improve Neural MIDI-to-Audio Synthesis Systems?

Nov 25, 2022
Xuan Shi, Erica Cooper, Xin Wang, Junichi Yamagishi, Shrikanth Narayanan

Combating high variance in Data-Scarce Implicit Hate Speech Classification

Aug 29, 2022
Debaditya Pal, Kaustubh Chaudhari, Harsh Sharma

Can Voice Assistants Sound Cute? Towards a Model of Kawaii Vocalics

Apr 22, 2023
Katie Seaborn, Somang Nam, Julia Keckeis, Tatsuya Itagaki

"I'm" Lost in Translation: Pronoun Missteps in Crowdsourced Data Sets

Apr 22, 2023
Katie Seaborn, Yeongdae Kim

Streaming Audio-Visual Speech Recognition with Alignment Regularization

Nov 03, 2022
Pingchuan Ma, Niko Moritz, Stavros Petridis, Christian Fuegen, Maja Pantic
