Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

Scene-aware Far-field Automatic Speech Recognition

Apr 21, 2021
Zhenyu Tang, Dinesh Manocha

Share this with someone who'll enjoy it:

We propose a novel method for generating scene-aware training data for far-field automatic speech recognition. We use a deep learning-based estimator to non-intrusively compute the sub-band reverberation time of an environment from its speech samples. We model the acoustic characteristics of a scene with its reverberation time and represent it using a multivariate Gaussian distribution. We use this distribution to select acoustic impulse responses from a large real-world dataset for augmenting speech data. The speech recognition system trained on our scene-aware data consistently outperforms the system trained using many more random acoustic impulse responses on the REVERB and the AMI far-field benchmarks. In practice, we obtain 2.64% absolute improvement in word error rate compared with using training data of the same size with uniformly distributed reverberation times.

   Access Paper Source

Share this with someone who'll enjoy it: