"speech": models, code, and papers

Make More of Your Data: Minimal Effort Data Augmentation for Automatic Speech Recognition and Translation

Oct 27, 2022
Tsz Kin Lam, Shigehiko Schamoni, Stefan Riezler

Via arXiv

Audio-visual video face hallucination with frequency supervision and cross modality support by speech based lip reading loss

Nov 20, 2022
Shailza Sharma, Abhinav Dhall, Vinay Kumar, Vivek Singh Bawa

Via arXiv

A Comparative Study on multichannel Speaker-attributed automatic speech recognition in Multi-party Meetings

Nov 01, 2022
Mohan Shi, Jie Zhang, Zhihao Du, Fan Yu, Shiliang Zhang, Li-Rong Dai

Via arXiv

Tiny-Sepformer: A Tiny Time-Domain Transformer Network for Speech Separation

Jun 30, 2022
Jian Luo, Jianzong Wang, Ning Cheng, Edward Xiao, Xulong Zhang, Jing Xiao

Via arXiv

An Investigation on Applying Acoustic Feature Conversion to ASR of Adult and Child Speech

May 25, 2022
Wei Liu, Jingyu Li, Tan Lee

Via arXiv

Low-Resource End-to-end Sanskrit TTS using Tacotron2, WaveGlow and Transfer Learning

Dec 07, 2022
Ankur Debnath, Shridevi S Patil, Gangotri Nadiger, Ramakrishnan Angarai Ganesan

Via arXiv

Efficiency 360: Efficient Vision Transformers

Feb 23, 2023
Badri N. Patro, Vijay Srinivas Agneeswaran

Via arXiv

EmoGator: A New Open Source Vocal Burst Dataset with Baseline Machine Learning Classification Methodologies

Jan 02, 2023
Fred W. Buhl

Via arXiv

TDASS: Target Domain Adaptation Speech Synthesis Framework for Multi-speaker Low-Resource TTS

May 24, 2022
Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

Via arXiv

Knowledge-Based Counterfactual Queries for Visual Question Answering

Mar 05, 2023
Theodoti Stoikou, Maria Lymperaiou, Giorgos Stamou

Via arXiv