"speech": models, code, and papers

Cocktail HuBERT: Generalized Self-Supervised Pre-training for Mixture and Single-Source Speech

Mar 20, 2023
Maryam Fazel-Zarandi, Wei-Ning Hsu

Towards Model-Size Agnostic, Compute-Free, Memorization-based Inference of Deep Learning

Jul 14, 2023
Davide Giacomini, Maeesha Binte Hashem, Jeremiah Suarez, Swarup Bhunia, Amit Ranjan Trivedi

Personalized speech enhancement combining band-split RNN and speaker attentive module

Feb 20, 2023
Xiaohuai Le, Zhongshu Hou, Li Chen, Chao He, Yiqing Guo, Cheng Chen, Xianjun Xia, Jing Lu

Frequency bin-wise single channel speech presence probability estimation using multiple DNNs

Feb 23, 2023
Shuai Tao, Himavanth Reddy, Jesper Rindom Jensen, Mads Græsbøll Christensen

BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric

Dec 16, 2022
Mingda Chen, Paul-Ambroise Duquenne, Pierre Andrews, Justine Kao, Alexandre Mourachko, Holger Schwenk, Marta R. Costa-jussà

BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs

Jul 17, 2023
Yang Zhao, Zhijie Lin, Daquan Zhou, Zilong Huang, Jiashi Feng, Bingyi Kang

Direct Speech-to-speech Translation without Textual Annotation using Bottleneck Features

Dec 12, 2022
Junhui Zhang, Junjie Pan, Xiang Yin, Zejun Ma

LipLearner: Customizable Silent Speech Interactions on Mobile Devices

Feb 14, 2023
Zixiong Su, Shitao Fang, Jun Rekimoto

Improving Isochronous Machine Translation with Target Factors and Auxiliary Counters

May 22, 2023
Proyag Pal, Brian Thompson, Yogesh Virkar, Prashant Mathur, Alexandra Chronopoulou, Marcello Federico

DiffMotion: Speech-Driven Gesture Synthesis Using Denoising Diffusion Model

Jan 24, 2023
Fan Zhang, Naye Ji, Fuxing Gao, Yongping Li
