Modular Hybrid Autoregressive Transducer


Oct 31, 2022
Zhong Meng, Tongzhou Chen, Rohit Prabhavalkar, Yu Zhang, Gary Wang, Kartik Audhkhasi, Jesse Emond, Trevor Strohman, Bhuvana Ramabhadran, W. Ronny Huang, Ehsan Variani, Yinghui Huang, Pedro J. Moreno

* 2022 IEEE Spoken Language Technology Workshop (SLT), Doha, Qatar 
* 8 pages, 1 figure, SLT 2022 

Analysis of Self-Attention Head Diversity for Conformer-based Automatic Speech Recognition


Sep 13, 2022
Kartik Audhkhasi, Yinghui Huang, Bhuvana Ramabhadran, Pedro J. Moreno

* Accepted for publication in Interspeech 2022 

Leveraging Unpaired Text Data for Training End-to-End Speech-to-Intent Systems


Oct 08, 2020
Yinghui Huang, Hong-Kwang Kuo, Samuel Thomas, Zvi Kons, Kartik Audhkhasi, Brian Kingsbury, Ron Hoory, Michael Picheny

* 5 pages, published in ICASSP 2020 

End-to-End Spoken Language Understanding Without Full Transcripts


Sep 30, 2020
Hong-Kwang J. Kuo, Zoltán Tüske, Samuel Thomas, Yinghui Huang, Kartik Audhkhasi, Brian Kingsbury, Gakuto Kurata, Zvi Kons, Ron Hoory, Luis Lastras

* 5 pages, to be published in Interspeech 2020 

AVLnet: Learning Audio-Visual Language Representations from Instructional Videos


Jun 16, 2020
Andrew Rouditchenko, Angie Boggust, David Harwath, Dhiraj Joshi, Samuel Thomas, Kartik Audhkhasi, Rogerio Feris, Brian Kingsbury, Michael Picheny, Antonio Torralba, James Glass

Single headed attention based sequence-to-sequence model for state-of-the-art results on Switchboard-300


Jan 20, 2020
Zoltán Tüske, George Saon, Kartik Audhkhasi, Brian Kingsbury

* 5 pages, 2 figures 

Challenging the Boundaries of Speech Recognition: The MALACH Corpus


Aug 09, 2019
Michael Picheny, Zoltán Tüske, Brian Kingsbury, Kartik Audhkhasi, Xiaodong Cui, George Saon

* Accepted for publication at INTERSPEECH 2019 

Guiding CTC Posterior Spike Timings for Improved Posterior Fusion and Knowledge Distillation


Apr 17, 2019
Gakuto Kurata, Kartik Audhkhasi

* Submitted to Interspeech 2019 

Acoustically Grounded Word Embeddings for Improved Acoustics-to-Word Speech Recognition


Mar 29, 2019
Shane Settle, Kartik Audhkhasi, Karen Livescu, Michael Picheny

* To appear at ICASSP 2019 

Joint Modeling of Accents and Acoustics for Multi-Accent Speech Recognition


Feb 07, 2018
Xuesong Yang, Kartik Audhkhasi, Andrew Rosenberg, Samuel Thomas, Bhuvana Ramabhadran, Mark Hasegawa-Johnson

* Accepted at the 43rd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2018)
