"speech recognition": models, code, and papers

Anytime, Anywhere: Human Arm Pose from Smartwatch Data for Ubiquitous Robot Control and Teleoperation

Jun 22, 2023
Fabian C Weigend, Shubham Sonawani, Michael Drolet, Heni Ben Amor

Via arXiv

Towards Model-Size Agnostic, Compute-Free, Memorization-based Inference of Deep Learning

Jul 14, 2023
Davide Giacomini, Maeesha Binte Hashem, Jeremiah Suarez, Swarup Bhunia, Amit Ranjan Trivedi

Via arXiv

Time-Domain Speech Enhancement for Robust Automatic Speech Recognition

Oct 27, 2022
Yufeng Yang, Ashutosh Pandey, DeLiang Wang

Via arXiv

Inter-KD: Intermediate Knowledge Distillation for CTC-Based Automatic Speech Recognition

Nov 28, 2022
Ji Won Yoon, Beom Jun Woo, Sunghwan Ahn, Hyeonseung Lee, Nam Soo Kim

Via arXiv

RAND: Robustness Aware Norm Decay For Quantized Seq2seq Models

May 24, 2023
David Qiu, David Rim, Shaojin Ding, Oleg Rybakov, Yanzhang He

Via arXiv

Reducing the gap between streaming and non-streaming Transducer-based ASR by adaptive two-stage knowledge distillation

Jun 27, 2023
Haitao Tang, Yu Fu, Lei Sun, Jiabin Xue, Dan Liu, Yongchao Li, Zhiqiang Ma, Minghui Wu, Jia Pan, Genshun Wan, Ming'en Zhao

Via arXiv

Visual Context-driven Audio Feature Enhancement for Robust End-to-End Audio-Visual Speech Recognition

Jul 13, 2022
Joanna Hong, Minsu Kim, Daehun Yoo, Yong Man Ro

Via arXiv

Boosting Local Spectro-Temporal Features for Speech Analysis

May 17, 2023
Michael Guerzhoy

Via arXiv

Evaluating OpenAI's Whisper ASR for Punctuation Prediction and Topic Modeling of life histories of the Museum of the Person

May 23, 2023
Lucas Rafael Stefanel Gris, Ricardo Marcacini, Arnaldo Candido Junior, Edresson Casanova, Anderson Soares, Sandra Maria Aluísio

Via arXiv

AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR

Mar 29, 2023
Paul Hongsuck Seo, Arsha Nagrani, Cordelia Schmid

Via arXiv