Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:Leveraging Unpaired Text Data for Training End-to-End Speech-to-Intent Systems

Oct 08, 2020

Yinghui Huang, Hong-Kwang Kuo, Samuel Thomas, Zvi Kons, Kartik Audhkhasi, Brian Kingsbury, Ron Hoory, Michael Picheny

Figure 1 for Leveraging Unpaired Text Data for Training End-to-End Speech-to-Intent Systems

Figure 2 for Leveraging Unpaired Text Data for Training End-to-End Speech-to-Intent Systems

Figure 3 for Leveraging Unpaired Text Data for Training End-to-End Speech-to-Intent Systems

Figure 4 for Leveraging Unpaired Text Data for Training End-to-End Speech-to-Intent Systems

Share this with someone who'll enjoy it:

Abstract:Training an end-to-end (E2E) neural network speech-to-intent (S2I) system that directly extracts intents from speech requires large amounts of intent-labeled speech data, which is time consuming and expensive to collect. Initializing the S2I model with an ASR model trained on copious speech data can alleviate data sparsity. In this paper, we attempt to leverage NLU text resources. We implemented a CTC-based S2I system that matches the performance of a state-of-the-art, traditional cascaded SLU system. We performed controlled experiments with varying amounts of speech and text training data. When only a tenth of the original data is available, intent classification accuracy degrades by 7.6% absolute. Assuming we have additional text-to-intent data (without speech) available, we investigated two techniques to improve the S2I system: (1) transfer learning, in which acoustic embeddings for intent classification are tied to fine-tuned BERT text embeddings; and (2) data augmentation, in which the text-to-intent data is converted into speech-to-intent data using a multi-speaker text-to-speech system. The proposed approaches recover 80% of performance lost due to using limited intent-labeled speech.

* 5 pages, published in ICASSP 2020

View paper on

Share this with someone who'll enjoy it:

Title:Leveraging Unpaired Text Data for Training End-to-End Speech-to-Intent Systems

Paper and Code