Title: Similarity and Content-based Phonetic Self Attention for Speech Recognition

Mar 28, 2022

Kyuhong Shim, Wonyong Sung

Abstract: Transformer-based speech recognition models have achieved great success due to the self-attention (SA) mechanism, which uses every frame in the feature extraction process. In particular, SA heads in lower layers capture various phonetic characteristics through the query-key dot product, which is designed to compute pairwise relationships between frames. In this paper, we propose a variant of SA that extracts more representative phonetic features. The proposed phonetic self-attention (phSA) is composed of two types of phonetic attention: one similarity-based and the other content-based. In short, similarity-based attention uses the correlation between frames, while content-based attention considers each frame on its own, unaffected by the others. We identify which parts of the original dot product correspond to these two attention patterns and improve each part with simple modifications. Our experiments on phoneme classification and speech recognition show that replacing SA with phSA in the lower layers improves recognition performance without increasing latency or parameter size.
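
The abstract does not spell out which parts of the dot product it refers to. As a rough illustration only, the sketch below assumes that the query and key projections carry bias vectors (an assumption, not something the abstract states) and expands the standard logit (Wq x_i + bq) . (Wk x_j + bk) into four terms: one that depends on both frames (pairwise, similarity-like), one that depends only on the key frame (content-like), and two that are constant along each softmax row. All names here (`decompose_logits`, `make_toy_inputs`, and so on) are hypothetical, the usual 1/sqrt(d) scaling is omitted, and this is not the phSA modification itself.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def full_logits(X, Wq, bq, Wk, bk):
    """Standard attention logits with biased query/key projections."""
    Q = [[q + b for q, b in zip(matvec(Wq, x), bq)] for x in X]
    K = [[k + b for k, b in zip(matvec(Wk, x), bk)] for x in X]
    return [[dot(q, k) for k in K] for q in Q]


def decompose_logits(X, Wq, bq, Wk, bk):
    """Expand the same logits into four additive terms."""
    Q = [matvec(Wq, x) for x in X]  # bias-free queries
    K = [matvec(Wk, x) for x in X]  # bias-free keys
    T = len(X)
    # Depends on frames i and j together: pairwise (similarity-like).
    pairwise = [[dot(Q[i], K[j]) for j in range(T)] for i in range(T)]
    # Depends only on key frame j: the same for every query (content-like).
    key_only = [dot(bq, K[j]) for j in range(T)]
    # Depends only on query frame i: constant along each softmax row.
    query_only = [dot(Q[i], bk) for i in range(T)]
    # Depends on neither frame.
    const = dot(bq, bk)
    return pairwise, key_only, query_only, const


def make_toy_inputs(seed=0, T=4, d=3):
    """Random toy frames and projection parameters (illustrative only)."""
    rng = random.Random(seed)

    def vec(n):
        return [rng.uniform(-1.0, 1.0) for _ in range(n)]

    X = [vec(d) for _ in range(T)]
    Wq = [vec(d) for _ in range(d)]
    Wk = [vec(d) for _ in range(d)]
    return X, Wq, vec(d), Wk, vec(d)
```

Because softmax is unchanged when a constant is added to every entry of a row, the query-only and constant terms do not affect the attention weights in this toy setting; only the pairwise and key-only terms do.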

* Submitted to INTERSPEECH 2022
