Ron Hoory

Spoken question answering for visual queries

May 29, 2025

Granite-speech: open-source speech-aware LLMs with strong English ASR capabilities

May 14, 2025

Continuous Speech Synthesis using per-token Latent Diffusion

Oct 21, 2024

Creating an African American-Sounding TTS: Guidelines, Technical Challenges, and Surprising Evaluations

Mar 17, 2024

Speak While You Think: Streaming Speech Synthesis During Text Generation

Sep 20, 2023

Towards a Common Speech Analysis Engine

Mar 01, 2022

A new data augmentation method for intent classification enhancement and its application on spoken conversation datasets

Feb 21, 2022

Speech Emotion Recognition using Self-Supervised Features

Feb 07, 2022

Speaker Normalization for Self-supervised Speech Emotion Recognition

Feb 02, 2022

RNN Transducer Models For Spoken Language Understanding

Apr 08, 2021