Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Harishchandra Dubey

ICASSP 2023 Deep Speech Enhancement Challenge

Mar 21, 2023

Harishchandra Dubey, Ashkan Aazami, Vishak Gopal, Babak Naderi, Sebastian Braun, Ross Cutler, Alex Ju, Mehdi Zohourian, Min Tang, Hannes Gamper(+2 more)

Figure 1 for ICASSP 2023 Deep Speech Enhancement Challenge

Abstract:Deep Speech Enhancement Challenge is the 5th edition of deep noise suppression (DNS) challenges organized at ICASSP 2023 Signal Processing Grand Challenges. DNS challenges were organized during 2019-2023 to stimulate research in deep speech enhancement (DSE). Previous DNS challenges were organized at INTERSPEECH 2020, ICASSP 2021, INTERSPEECH 2021, and ICASSP 2022. From prior editions, we learnt that improving signal quality (SIG) is challenging particularly in presence of simultaneously active interfering talkers and noise. This challenge aims to develop models for joint denosing, dereverberation and suppression of interfering talkers. When primary talker wears a headphone, certain acoustic properties of their speech such as direct-to-reverberation (DRR), signal to noise ratio (SNR) etc. make it possible to suppress neighboring talkers even without enrollment data for primary talker. This motivated us to create two tracks for this challenge: (i) Track-1 Headset; (ii) Track-2 Speakerphone. Both tracks has fullband (48kHz) training data and testset, and each testclips has a corresponding enrollment data (10-30s duration) for primary talker. Each track invited submissions of personalized and non-personalized models all of which are evaluated through same subjective evaluation. Most models submitted to challenge were personalized models, same team is winner in both tracks where the best models has improvement of 0.145 and 0.141 in challenge's Score as compared to noisy blind testset.

* 6 pages, 1 figure. arXiv admin note: text overlap with arXiv:2202.13288

Via

Access Paper or Ask Questions

ICASSP 2022 Deep Noise Suppression Challenge

Feb 27, 2022

Harishchandra Dubey, Vishak Gopal, Ross Cutler, Ashkan Aazami, Sergiy Matusevych, Sebastian Braun, Sefik Emre Eskimez, Manthan Thakker, Takuya Yoshioka, Hannes Gamper(+1 more)

Figure 1 for ICASSP 2022 Deep Noise Suppression Challenge

Abstract:The Deep Noise Suppression (DNS) challenge is designed to foster innovation in the area of noise suppression to achieve superior perceptual speech quality. This is the 4th DNS challenge, with the previous editions held at INTERSPEECH 2020, ICASSP 2021, and INTERSPEECH 2021. We open-source datasets and test sets for researchers to train their deep noise suppression models, as well as a subjective evaluation framework based on ITU-T P.835 to rate and rank-order the challenge entries. We provide access to DNSMOS P.835 and word accuracy (WAcc) APIs to challenge participants to help with iterative model improvements. In this challenge, we introduced the following changes: (i) Included mobile device scenarios in the blind test set; (ii) Included a personalized noise suppression track with baseline; (iii) Added WAcc as an objective metric; (iv) Included DNSMOS P.835; (v) Made the training datasets and test sets fullband (48 kHz). We use an average of WAcc and subjective scores P.835 SIG, BAK, and OVRL to get the final score for ranking the DNS models. We believe that as a research community, we still have a long way to go in achieving excellent speech quality in challenging noisy real-world scenarios.

Via

Access Paper or Ask Questions

MusicNet: Compact Convolutional Neural Network for Real-time Background Music Detection

Oct 08, 2021

Chandan K. A. Reddy, Vishak Gopa, Harishchandra Dubey, Sergiy Matusevych, Ross Cutler, Robert Aichner

Figure 1 for MusicNet: Compact Convolutional Neural Network for Real-time Background Music Detection

Figure 2 for MusicNet: Compact Convolutional Neural Network for Real-time Background Music Detection

Abstract:With the recent growth of remote and hybrid work, online meetings often encounter challenging audio contexts such as background noise, music, and echo. Accurate real-time detection of music events can help to improve the user experience in such scenarios, e.g., by switching to high-fidelity music-specific codec or selecting the optimal noise suppression model. In this paper, we present MusicNet -- a compact high-performance model for detecting background music in the real-time communications pipeline. In online video meetings, which is our main use case, music almost always co-occurs with speech and background noises, making the accurate classification quite challenging. The proposed model is a binary classifier that consists of a compact convolutional neural network core preceded by an in-model featurization layer. It takes 9 seconds of raw audio as input and does not require any model-specific featurization on the client. We train our model on a balanced subset of the AudioSet data and use 1000 crowd-sourced real test clips to validate the model. Finally, we compare MusicNet performance to 20 other state-of-the-art models. Our classifier gives a true positive rate of 81.3% at a 0.1% false positive rate, which is significantly better than any other model in the study. Our model is also 10x smaller and has 4x faster inference than the comparable baseline.

Via

Access Paper or Ask Questions

Interspeech 2021 Deep Noise Suppression Challenge

Jan 10, 2021

Chandan K A Reddy, Harishchandra Dubey, Kazuhito Koishida, Arun Nair, Vishak Gopal, Ross Cutler, Sebastian Braun, Hannes Gamper, Robert Aichner, Sriram Srinivasan

Figure 1 for Interspeech 2021 Deep Noise Suppression Challenge

Abstract:The Deep Noise Suppression (DNS) challenge is designed to foster innovation in the area of noise suppression to achieve superior perceptual speech quality. We recently organized a DNS challenge special session at INTERSPEECH and ICASSP 2020. We open-sourced training and test datasets for the wideband scenario. We also open-sourced a subjective evaluation framework based on ITU-T standard P.808, which was also used to evaluate participants of the challenge. Many researchers from academia and industry made significant contributions to push the field forward, yet even the best noise suppressor was far from achieving superior speech quality in challenging scenarios. In this version of the challenge organized at INTERSPEECH 2021, we are expanding both our training and test datasets to accommodate full band scenarios. The two tracks in this challenge will focus on real-time denoising for (i) wide band, and(ii) full band scenarios. We are also making available a reliable non-intrusive objective speech quality metric called DNSMOS for the participants to use during their development phase.

* arXiv admin note: substantial text overlap with arXiv:2009.06122

Via

Access Paper or Ask Questions

The INTERSPEECH 2020 Deep Noise Suppression Challenge: Datasets, Subjective Testing Framework, and Challenge Results

May 29, 2020

Chandan K. A. Reddy, Vishak Gopal, Ross Cutler, Ebrahim Beyrami, Roger Cheng, Harishchandra Dubey, Sergiy Matusevych, Robert Aichner, Ashkan Aazami, Sebastian Braun(+3 more)

Figure 1 for The INTERSPEECH 2020 Deep Noise Suppression Challenge: Datasets, Subjective Testing Framework, and Challenge Results

Figure 2 for The INTERSPEECH 2020 Deep Noise Suppression Challenge: Datasets, Subjective Testing Framework, and Challenge Results

Figure 3 for The INTERSPEECH 2020 Deep Noise Suppression Challenge: Datasets, Subjective Testing Framework, and Challenge Results

Figure 4 for The INTERSPEECH 2020 Deep Noise Suppression Challenge: Datasets, Subjective Testing Framework, and Challenge Results

Abstract:The INTERSPEECH 2020 Deep Noise Suppression (DNS) Challenge is intended to promote collaborative research in real-time single-channel Speech Enhancement aimed to maximize the subjective (perceptual) quality of the enhanced speech. A typical approach to evaluate the noise suppression methods is to use objective metrics on the test set obtained by splitting the original dataset. While the performance is good on the synthetic test set, often the model performance degrades significantly on real recordings. Also, most of the conventional objective metrics do not correlate well with subjective tests and lab subjective tests are not scalable for a large test set. In this challenge, we open-sourced a large clean speech and noise corpus for training the noise suppression models and a representative test set to real-world scenarios consisting of both synthetic and real recordings. We also open-sourced an online subjective test framework based on ITU-T P.808 for researchers to reliably test their developments. We evaluated the results using P.808 on a blind test set. The results and the key learnings from the challenge are discussed. The datasets and scripts can be found here for quick access https://github.com/microsoft/DNS-Challenge.

* Interspeech 2020. arXiv admin note: substantial text overlap with arXiv:2001.08662

Via

Access Paper or Ask Questions

The INTERSPEECH 2020 Deep Noise Suppression Challenge: Datasets, Subjective Speech Quality and Testing Framework

Jan 23, 2020

Chandan K. A. Reddy, Ebrahim Beyrami, Harishchandra Dubey, Vishak Gopal, Roger Cheng, Ross Cutler, Sergiy Matusevych, Robert Aichner, Ashkan Aazami, Sebastian Braun(+3 more)

Figure 1 for The INTERSPEECH 2020 Deep Noise Suppression Challenge: Datasets, Subjective Speech Quality and Testing Framework

Figure 2 for The INTERSPEECH 2020 Deep Noise Suppression Challenge: Datasets, Subjective Speech Quality and Testing Framework

Abstract:The INTERSPEECH 2020 Deep Noise Suppression Challenge is intended to promote collaborative research in real-time single-channel Speech Enhancement aimed to maximize the subjective (perceptual) quality of the enhanced speech. A typical approach to evaluate the noise suppression methods is to use objective metrics on the test set obtained by splitting the original dataset. Many publications report reasonable performance on the synthetic test set drawn from the same distribution as that of the training set. However, often the model performance degrades significantly on real recordings. Also, most of the conventional objective metrics do not correlate well with subjective tests and lab subjective tests are not scalable for a large test set. In this challenge, we open-source a large clean speech and noise corpus for training the noise suppression models and a representative test set to real-world scenarios consisting of both synthetic and real recordings. We also open source an online subjective test framework based on ITU-T P.808 for researchers to quickly test their developments. The winners of this challenge will be selected based on subjective evaluation on a representative test set using P.808 framework.

* Details about Deep Noise Suppression Challenge

Via

Access Paper or Ask Questions

CURE Dataset: Ladder Networks for Audio Event Classification

Jan 12, 2020

Harishchandra Dubey, Dimitra Emmanouilidou, Ivan J. Tashev

Figure 1 for CURE Dataset: Ladder Networks for Audio Event Classification

Figure 2 for CURE Dataset: Ladder Networks for Audio Event Classification

Figure 3 for CURE Dataset: Ladder Networks for Audio Event Classification

Abstract:Audio event classification is an important task for several applications such as surveillance, audio, video and multimedia retrieval etc. There are approximately 3M people with hearing loss who can't perceive events happening around them. This paper establishes the CURE dataset which contains curated set of specific audio events most relevant for people with hearing loss. We propose a ladder network based audio event classifier that utilizes 5s sound recordings derived from the Freesound project. We adopted the state-of-the-art convolutional neural network (CNN) embeddings as audio features for this task. We also investigate extreme learning machine (ELM) for event classification. In this study, proposed classifiers are compared with support vector machine (SVM) baseline. We propose signal and feature normalization that aims to reduce the mismatch between different recordings scenarios. Firstly, CNN is trained on weakly labeled Audioset data. Next, the pre-trained model is adopted as feature extractor for proposed CURE corpus. We incorporate ESC-50 dataset as second evaluation set. Results and discussions validate the superiority of Ladder network over ELM and SVM classifier in terms of robustness and increased classification accuracy. While Ladder network is robust to data mismatches, simpler SVM and ELM classifiers are sensitive to such mismatches, where the proposed normalization techniques can play an important role. Experimental studies with ESC-50 and CURE corpora elucidate the differences in dataset complexity and robustness offered by proposed approaches.

* 6 pages, 2 Figures

Via

Access Paper or Ask Questions

Fuzzy C-Means Clustering and Sonification of HRV Features

Sep 30, 2019

Debanjan Borthakur, Victoria Grace, Paul Batchelor, Harishchandra Dubey, Kunal Mankodiya

Figure 1 for Fuzzy C-Means Clustering and Sonification of HRV Features

Figure 2 for Fuzzy C-Means Clustering and Sonification of HRV Features

Figure 3 for Fuzzy C-Means Clustering and Sonification of HRV Features

Figure 4 for Fuzzy C-Means Clustering and Sonification of HRV Features

Abstract:Linear and non-linear measures of heart rate variability (HRV) are widely investigated as non-invasive indicators of health. Stress has a profound impact on heart rate, and different meditation techniques have been found to modulate heartbeat rhythm. This paper aims to explore the process of identifying appropriate metrices from HRV analysis for sonification. Sonification is a type of auditory display involving the process of mapping data to acoustic parameters. This work explores the use of auditory display in aiding the analysis of HRV leveraged by unsupervised machine learning techniques. Unsupervised clustering helps select the appropriate features to improve the sonification interpretability. Vocal synthesis sonification techniques are employed to increase comprehension and learnability of the processed data displayed through sound. These analyses are early steps in building a real-time sound-based biofeedback training system.

* 2019 the IEEE/ACM 4th International Conference on Connected Health: Applications, Systems and Engineering Technologies: EdgeDL WorkshopAt: Washington, D.C, sep- 25-27
* 5 pages, 5 figures

Via

Access Paper or Ask Questions

Toeplitz Inverse Covariance based Robust Speaker Clustering for Naturalistic Audio Streams

Jul 12, 2019

Harishchandra Dubey, Abhijeet Sangwan, John Hansen

Figure 1 for Toeplitz Inverse Covariance based Robust Speaker Clustering for Naturalistic Audio Streams

Figure 2 for Toeplitz Inverse Covariance based Robust Speaker Clustering for Naturalistic Audio Streams

Figure 3 for Toeplitz Inverse Covariance based Robust Speaker Clustering for Naturalistic Audio Streams

Abstract:Speaker diarization determines who spoke and when? in an audio stream. In this study, we propose a model-based approach for robust speaker clustering using i-vectors. The ivectors extracted from different segments of same speaker are correlated. We model this correlation with a Markov Random Field (MRF) network. Leveraging the advancements in MRF modeling, we used Toeplitz Inverse Covariance (TIC) matrix to represent the MRF correlation network for each speaker. This approaches captures the sequential structure of i-vectors (or equivalent speaker turns) belonging to same speaker in an audio stream. A variant of standard Expectation Maximization (EM) algorithm is adopted for deriving closed-form solution using dynamic programming (DP) and the alternating direction method of multiplier (ADMM). Our diarization system has four steps: (1) ground-truth segmentation; (2) i-vector extraction; (3) post-processing (mean subtraction, principal component analysis, and length-normalization) ; and (4) proposed speaker clustering. We employ cosine K-means and movMF speaker clustering as baseline approaches. Our evaluation data is derived from: (i) CRSS-PLTL corpus, and (ii) two meetings subset of the AMI corpus. Relative reduction in diarization error rate (DER) for CRSS-PLTL corpus is 43.22% using the proposed advancements as compared to baseline. For AMI meetings IS1000a and IS1003b, relative DER reduction is 29.37% and 9.21%, respectively.

* 6 Pages, 3 Fiigures, 5 Equations

Via

Access Paper or Ask Questions

Bacterial Foraging Optimized STATCOM for Stability Assessment in Power System

Oct 01, 2016

Shiba R. Paital, Prakash K. Ray, Asit Mohanty, Sandipan Patra, Harishchandra Dubey

Figure 1 for Bacterial Foraging Optimized STATCOM for Stability Assessment in Power System

Figure 2 for Bacterial Foraging Optimized STATCOM for Stability Assessment in Power System

Figure 3 for Bacterial Foraging Optimized STATCOM for Stability Assessment in Power System

Figure 4 for Bacterial Foraging Optimized STATCOM for Stability Assessment in Power System

Abstract:This paper presents a study of improvement in stability in a single machine connected to infinite bus (SMIB) power system by using static compensator (STATCOM). The gains of Proportional-Integral-Derivative (PID) controller in STATCOM are being optimized by heuristic technique based on Particle swarm optimization (PSO). Further, Bacterial Foraging Optimization (BFO) as an alternative heuristic method is also applied to select optimal gains of PID controller. The performance of STATCOM with the above soft-computing techniques are studied and compared with the conventional PID controller under various scenarios. The simulation results are accompanied with performance indices based quantitative analysis. The analysis clearly signifies the robustness of the new scheme in terms of stability and voltage regulation when compared with conventional PID.

* 5 pages, 7 figures, 2016 IEEE Students' Technology Symposium (TechSym 2016), At IIT Kharagpur, India

Via

Access Paper or Ask Questions