Seeing is Knowing! Fact-based Visual Question Answering using Knowledge Graph Embeddings

Dec 31, 2020
Kiran Ramnath, Mark Hasegawa-Johnson

* 9 pages, 10 figures 

Multi-Decoder DPRNN: High Accuracy Source Counting and Separation

Nov 30, 2020
Junzhe Zhu, Raymond Yeh, Mark Hasegawa-Johnson

* Project page: https://junzhejosephzhu.github.io/Multi-Decoder-DPRNN/; submitted to ICASSP 2021 

Interpretable Visual Reasoning via Induced Symbolic Space

Nov 23, 2020
Zhonghao Wang, Mo Yu, Kai Wang, Jinjun Xiong, Wen-mei Hwu, Mark Hasegawa-Johnson, Humphrey Shi

Show and Speak: Directly Synthesize Spoken Description of Images

Oct 23, 2020
Xinsheng Wang, Siyuan Feng, Jihua Zhu, Mark Hasegawa-Johnson, Odette Scharenborg

How Phonotactics Affect Multilingual and Zero-shot ASR Performance

Oct 22, 2020
Siyuan Feng, Piotr Żelasko, Laureano Moro-Velázquez, Ali Abavisani, Mark Hasegawa-Johnson, Odette Scharenborg, Najim Dehak

* Submitted to ICASSP 2021. The first 2 authors contributed equally to this work 

Deep F-measure Maximization for End-to-End Speech Understanding

Aug 08, 2020
Leda Sarı, Mark Hasegawa-Johnson

* Interspeech 2020 submission (Accepted) 

Evaluating Automatically Generated Phoneme Captions for Images

Jul 31, 2020
Justin van der Hout, Zoltán D'Haese, Mark Hasegawa-Johnson, Odette Scharenborg

* Accepted at Interspeech 2020 

Identify Speakers in Cocktail Parties with End-to-End Attention

May 22, 2020
Junzhe Zhu, Mark Hasegawa-Johnson, Leda Sari

* Submitted to Interspeech 2020; GitHub link: https://github.com/JunzheJosephZhu/Identifying-Speakers-in-Cocktail-Parties-with-E2E-Attention 

That Sounds Familiar: an Analysis of Phonetic Representations Transfer Across Languages

May 16, 2020
Piotr Żelasko, Laureano Moro-Velázquez, Mark Hasegawa-Johnson, Odette Scharenborg, Najim Dehak

* Submitted to Interspeech 2020. For some reason, the arXiv LaTeX engine rendered it in more than 4 pages 

Automatic Estimation of Intelligibility Measure for Consonants in Speech

May 12, 2020
Ali Abavisani, Mark Hasegawa-Johnson

* 5 pages, 1 figure, 7 tables, submitted to the Interspeech 2020 conference 

Unsupervised Speech Decomposition via Triple Information Bottleneck

May 04, 2020
Kaizhi Qian, Yang Zhang, Shiyu Chang, David Cox, Mark Hasegawa-Johnson

F0-consistent many-to-many non-parallel voice conversion via conditional autoencoder

Apr 15, 2020
Kaizhi Qian, Zeyu Jin, Mark Hasegawa-Johnson, Gautham J. Mysore

Fast transcription of speech in low-resource languages

Sep 16, 2019
Mark Hasegawa-Johnson, Camille Goudeseune, Gina-Anne Levow

* 8 pages 

AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss

Jun 06, 2019
Kaizhi Qian, Yang Zhang, Shiyu Chang, Xuesong Yang, Mark Hasegawa-Johnson

* To appear in the Thirty-sixth International Conference on Machine Learning (ICML 2019) 

Zero-Shot Voice Style Transfer with Only Autoencoder Loss

May 14, 2019
Kaizhi Qian, Yang Zhang, Shiyu Chang, Xuesong Yang, Mark Hasegawa-Johnson

* Equal contribution; to appear in the Thirty-sixth International Conference on Machine Learning (ICML 2019) 

When CTC Training Meets Acoustic Landmarks

Nov 05, 2018
Di He, Xuesong Yang, Boon Pang Lim, Yi Liang, Mark Hasegawa-Johnson, Deming Chen

* Submitted to the International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2019) 

Improved ASR for Under-Resourced Languages Through Multi-Task Learning with Acoustic Landmarks

May 15, 2018
Di He, Boon Pang Lim, Xuesong Yang, Mark Hasegawa-Johnson, Deming Chen

* Submitted to Interspeech 2018 

Bayesian Models for Unit Discovery on a Very Low Resource Language

Feb 20, 2018
Lucas Ondel, Pierre Godard, Laurent Besacier, Elin Larsen, Mark Hasegawa-Johnson, Odette Scharenborg, Emmanuel Dupoux, Lukas Burget, François Yvon, Sanjeev Khudanpur

* Accepted to ICASSP 2018 

Deep Learning Based Speech Beamforming

Feb 15, 2018
Kaizhi Qian, Yang Zhang, Shiyu Chang, Xuesong Yang, Dinei Florencio, Mark Hasegawa-Johnson

* Accepted at the 43rd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2018) 

Linguistic unit discovery from multi-modal inputs in unwritten languages: Summary of the "Speaking Rosetta" JSALT 2017 Workshop

Feb 14, 2018
Odette Scharenborg, Laurent Besacier, Alan Black, Mark Hasegawa-Johnson, Florian Metze, Graham Neubig, Sebastian Stueker, Pierre Godard, Markus Mueller, Lucas Ondel, Shruti Palaskar, Philip Arthur, Francesco Ciannella, Mingxing Du, Elin Larsen, Danny Merkx, Rachid Riad, Liming Wang, Emmanuel Dupoux

* Accepted to ICASSP 2018 

Joint Modeling of Accents and Acoustics for Multi-Accent Speech Recognition

Feb 07, 2018
Xuesong Yang, Kartik Audhkhasi, Andrew Rosenberg, Samuel Thomas, Bhuvana Ramabhadran, Mark Hasegawa-Johnson

* Accepted at the 43rd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2018) 

Dilated Recurrent Neural Networks

Nov 02, 2017
Shiyu Chang, Yang Zhang, Wei Han, Mo Yu, Xiaoxiao Guo, Wei Tan, Xiaodong Cui, Michael Witbrock, Mark Hasegawa-Johnson, Thomas S. Huang

* Accepted by NIPS 2017 

Semantic Image Inpainting with Deep Generative Models

Jul 13, 2017
Raymond A. Yeh, Chen Chen, Teck Yian Lim, Alexander G. Schwing, Mark Hasegawa-Johnson, Minh N. Do

Performance Improvements of Probabilistic Transcript-adapted ASR with Recurrent Neural Network and Language-specific Constraints

Dec 13, 2016
Xiang Kong, Preethi Jyothi, Mark Hasegawa-Johnson

Landmark-based consonant voicing detection on multilingual corpora

Nov 10, 2016
Xiang Kong, Xuesong Yang, Mark Hasegawa-Johnson, Jeung-Yoon Choi, Stefanie Shattuck-Hufnagel

* ready to submit to JASA-EL 

Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation

Oct 01, 2015
Po-Sen Huang, Minje Kim, Mark Hasegawa-Johnson, Paris Smaragdis

* IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 12, pp. 2136-2147, Dec. 2015 
