Ching-Feng Yeh

Attention or Convolution: Transformer Encoders in Audio Language Models for Inference Efficiency

Nov 05, 2023

FLAP: Fast Language-Audio Pre-training

Nov 02, 2023

Efficient Speech Representation Learning with Low-Bit Quantization

Dec 14, 2022

Continual Learning for On-Device Speech Recognition using Disentangled Conformers

Dec 13, 2022

SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning

Oct 16, 2022

TorchAudio: Building Blocks for Audio and Speech Processing

Oct 28, 2021

Dynamic Encoder Transducer: A Flexible Solution For Trading Off Accuracy For Latency

Apr 05, 2021

Semantic Distance: A New Metric for ASR Performance Analysis Towards Spoken Language Understanding

Apr 05, 2021

Alignment Restricted Streaming Recurrent Neural Network Transducer

Nov 05, 2020

Streaming Attention-Based Models with Augmented Memory for End-to-End Speech Recognition

Nov 03, 2020