"speech": models, code, and papers

A Sidecar Separator Can Convert a Single-Talker Speech Recognition System to a Multi-Talker One

Mar 05, 2023
Lingwei Meng, Jiawen Kang, Mingyu Cui, Yuejiao Wang, Xixin Wu, Helen Meng

Towards Model-Size Agnostic, Compute-Free, Memorization-based Inference of Deep Learning

Jul 14, 2023
Davide Giacomini, Maeesha Binte Hashem, Jeremiah Suarez, Swarup Bhunia, Amit Ranjan Trivedi

Beyond Universal Transformer: block reusing with adaptor in Transformer for automatic speech recognit

Mar 23, 2023
Haoyu Tang, Zhaoyi Liu, Chang Zeng, Xinfeng Li

Using Deepfake Technologies for Word Emphasis Detection

May 12, 2023
Eran Kaufman, Lee-Ad Gottlieb

DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder

Mar 30, 2023
Chenpeng Du, Qi Chen, Tianyu He, Xu Tan, Xie Chen, Kai Yu, Sheng Zhao, Jiang Bian

SpeeChain: A Speech Toolkit for Large-Scale Machine Speech Chain

Jan 08, 2023
Heli Qi, Sashi Novitasari, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

Frame-wise and overlap-robust speaker embeddings for meeting diarization

Jun 01, 2023
Tobias Cord-Landwehr, Christoph Boeddeker, Cătălin Zorilă, Rama Doddipatla, Reinhold Haeb-Umbach

BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs

Jul 17, 2023
Yang Zhao, Zhijie Lin, Daquan Zhou, Zilong Huang, Jiashi Feng, Bingyi Kang

LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT

Jul 07, 2023
Le Zhuo, Ruibin Yuan, Jiahao Pan, Yinghao Ma, Yizhi Li, Ge Zhang, Si Liu, Roger Dannenberg, Jie Fu, Chenghua Lin, Emmanouil Benetos, Wenhu Chen, Wei Xue, Yike Guo

Multilingual Contextual Adapters To Improve Custom Word Recognition In Low-resource Languages

Jul 03, 2023
Devang Kulshreshtha, Saket Dingliwal, Brady Houston, Sravan Bodapati
