Amir Ivry

MAPSS: Manifold-based Assessment of Perceptual Source Separation

Sep 11, 2025

Amir Ivry, Samuele Cornell, Shinji Watanabe

Abstract: Objective assessment of source-separation systems still mismatches subjective human perception, especially when leakage and self-distortion interact. We introduce the Perceptual Separation (PS) and Perceptual Match (PM), the first pair of measures that functionally isolate these two factors. Our intrusive method begins by generating a bank of fundamental distortions for each reference waveform signal in the mixture. Distortions, references, and their respective system outputs from all sources are then independently encoded by a pre-trained self-supervised learning model. These representations are aggregated and projected onto a manifold via diffusion maps, which aligns Euclidean distances on the manifold with dissimilarities of the encoded waveforms. On this manifold, the PM measures the Mahalanobis distance from each output to its attributed cluster, which consists of the embeddings of its reference and distortions, capturing self-distortion. The PS accounts for the Mahalanobis distance of the output to the attributed and to the closest non-attributed clusters, quantifying leakage. Both measures are differentiable and granular, operating at a resolution as low as 50 frames per second. We further derive, for both measures, a deterministic error radius and non-asymptotic, high-probability confidence intervals (CIs). Experiments on English, Spanish, and music mixtures show that the PS and PM nearly always achieve higher linear correlation coefficients with human mean-opinion scores than 14 competitors, reaching as high as 86.36% for speech and 87.21% for music. We observe, at worst, an error radius of 1.39% and a probabilistic 95% CI of 12.21% for these coefficients, which enables more reliable and informed evaluation. Using mutual information, we find that the measures complement each other most as their values decrease, suggesting they are jointly more informative as system performance degrades.

* Submitted to ICLR
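The abstract above builds both measures on Mahalanobis distances between an output embedding and clusters of reference and distortion embeddings. The following is a minimal sketch of that distance computation only, not the authors' implementation: the function names, the `ridge` regularizer, and the toy 2-D clusters are assumptions made for illustration, and the exact PS and PM formulas are not given in the abstract, so they are not reproduced here.

```python
import numpy as np

def mahalanobis_to_cluster(point, cluster, ridge=1e-6):
    """Mahalanobis distance from `point` (d,) to the rows of `cluster` (n, d).

    `ridge` is a hypothetical regularizer that keeps the covariance invertible.
    """
    mean = cluster.mean(axis=0)
    cov = np.atleast_2d(np.cov(cluster, rowvar=False))
    cov = cov + ridge * np.eye(cluster.shape[1])
    diff = point - mean
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

def attributed_and_nearest_other(output, clusters, attributed):
    """Return (distance to the attributed cluster, distance to the closest other cluster)."""
    dists = [mahalanobis_to_cluster(output, c) for c in clusters]
    others = [d for i, d in enumerate(dists) if i != attributed]
    return dists[attributed], min(others)

# Toy example: two clusters standing in for two sources' embeddings.
source_a = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
source_b = source_a + np.array([10.0, 0.0])
d_self, d_other = attributed_and_nearest_other(
    np.array([0.5, 0.0]), [source_a, source_b], attributed=0
)
print(d_self, d_other)
```

In this toy setup, a small distance to the attributed cluster would correspond to little self-distortion, and a large gap to the nearest non-attributed cluster would correspond to little leakage.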

Summary of the NOTSOFAR-1 Challenge: Highlights and Learnings

Jan 28, 2025

Igor Abramovski, Alon Vinnikov, Shalev Shaer, Naoyuki Kanda, Xiaofei Wang, Amir Ivry, Eyal Krupka

Abstract: The first Natural Office Talkers in Settings of Far-field Audio Recordings (NOTSOFAR-1) Challenge is a pivotal initiative that sets new benchmarks by offering datasets more representative of the needs of real-world business applications than those previously available. The challenge provides a unique combination of 280 recorded meetings across 30 diverse environments, capturing real-world acoustic conditions and conversational dynamics, and a 1000-hour simulated training dataset, synthesized with enhanced authenticity for real-world generalization, incorporating 15,000 real acoustic transfer functions. In this paper, we provide an overview of the systems submitted to the challenge and analyze the top-performing approaches, hypothesizing the factors behind their success. Additionally, we highlight promising directions left unexplored by participants. By presenting key findings and actionable insights, this work aims to drive further innovation and progress in distant speaker diarization and automatic speech recognition (DASR) research and applications.

NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription

Jan 16, 2024

Alon Vinnikov, Amir Ivry, Aviv Hurvitz, Igor Abramovski, Sharon Koubi, Ilya Gurvich, Shai Pe'er, Xiong Xiao, Benjamin Martinez Elizalde, Naoyuki Kanda (+9 more)

Abstract: We introduce the first Natural Office Talkers in Settings of Far-field Audio Recordings ("NOTSOFAR-1") Challenge alongside datasets and a baseline system. The challenge focuses on distant speaker diarization and automatic speech recognition (DASR) in far-field meeting scenarios, with single-channel and known-geometry multi-channel tracks, and serves as a launch platform for two new datasets. The first is a benchmarking dataset of 315 meetings, averaging 6 minutes each, capturing a broad spectrum of real-world acoustic conditions and conversational dynamics. It is recorded across 30 conference rooms, featuring 4-8 attendees and a total of 35 unique speakers. The second is a 1000-hour simulated training dataset, synthesized with enhanced authenticity for real-world generalization, incorporating 15,000 real acoustic transfer functions. The tasks focus on single-device DASR, where multi-channel devices always share the same known geometry. This is aligned with common setups in actual conference rooms and avoids technical complexities associated with multi-device tasks. It also allows for the development of geometry-specific solutions. The NOTSOFAR-1 Challenge aims to advance research in the field of distant conversational speech recognition, providing key resources to unlock the potential of data-driven methods, which we believe are currently constrained by the absence of comprehensive high-quality training and benchmarking datasets.

* preprint

Deep Learning Interviews: Hundreds of fully solved job interview questions from a wide range of key topics in AI

Jan 04, 2022

Shlomo Kashani, Amir Ivry

Abstract: The second edition of Deep Learning Interviews is home to hundreds of fully solved problems from a wide range of key topics in AI. It is designed both to rehearse interview- or exam-specific topics and to provide machine learning MSc/PhD students, and those awaiting an interview, with a well-organized overview of the field. The problems it poses are tough enough to cut your teeth on and to dramatically improve your skills, but they are framed within thought-provoking questions and engaging stories. That is what makes the volume so specifically valuable to students and job seekers: it provides them with the ability to speak confidently and quickly on any relevant topic, to answer technical questions clearly and correctly, and to fully understand the purpose and meaning of interview questions and answers. Those are powerful, indispensable advantages to have when walking into the interview room. The book's contents are a large inventory of numerous topics relevant to DL job interviews and graduate-level exams. That places this work at the forefront of the growing trend in science to teach a core set of practical mathematical and computational skills. It is widely accepted that the training of every computer scientist must include the fundamental theorems of ML, and AI appears in the curriculum of nearly every university. This volume is designed as an excellent reference for graduates of such programs.

Objective Metrics to Evaluate Residual-Echo Suppression During Double-Talk

Jul 15, 2021

Amir Ivry, Israel Cohen, Baruch Berdugo

Abstract: Human subjective evaluation is optimal to assess speech quality for human perception. The recently introduced deep noise suppression mean opinion score (DNSMOS) metric was shown to estimate human ratings with great accuracy. The signal-to-distortion ratio (SDR) metric is widely used to evaluate residual-echo suppression (RES) systems by estimating speech quality during double-talk. However, since the SDR is affected by both speech distortion and residual-echo presence, it does not correlate well with human ratings according to the DNSMOS. To address that, we introduce two objective metrics to separately quantify the desired-speech maintained level (DSML) and residual-echo suppression level (RESL) during double-talk. These metrics are evaluated using a deep learning-based RES system with a tunable design parameter. Using 280 hours of real and simulated recordings, we show that the DSML and RESL correlate well with the DNSMOS and generalize well to various setups. We also empirically investigate the relation between tuning the RES system's design parameter and the DSML-RESL tradeoff it creates, and offer a practical design scheme for dynamic system requirements.

* Accepted to WASPAA

Multiclass Permanent Magnets Superstructure for Indoor Localization using Artificial Intelligence

Jul 14, 2021

Amir Ivry, Elad Fisher, Roger Alimi, Idan Mosseri, Kanna Nahir

Abstract: Smartphones have become a popular tool for indoor localization and position estimation of users. Existing solutions mainly employ Wi-Fi, RFID, and magnetic sensing techniques to track movements in crowded venues. These are highly sensitive to magnetic clutter and depend on local ambient magnetic fields, which frequently degrades their performance. Also, these techniques often require pre-known mapping surveys of the area, or the presence of active beacons, which are not always available. We embed small-volume and large-moment magnets in pre-known locations and arrange them in specific geometric constellations that create magnetic superstructure patterns of supervised magnetic signatures. These signatures constitute an unambiguous magnetic environment with respect to the moving sensor carrier. The localization algorithm learns the unique patterns of the scattered magnets during training and detects them in the ongoing data stream during localization. Our contribution is twofold. First, we deploy passive permanent magnets that do not require a power supply, in contrast to active magnetic transmitters. Second, we perform localization based on smartphone motion rather than on static positioning of the magnetometer. In our previous study, we considered a single superstructure pattern. Here, we present an extended version of that algorithm for multi-superstructure localization, which covers a broader localization area of the user. Experimental results demonstrate localization accuracy of 95% with a mean localization error of less than 1 m using artificial intelligence.

* year 2021
* Accepted to IEEE Transactions on Magnetics

Low power in-situ AI Calibration of a 3 Axial Magnetic Sensor

Jun 27, 2021

Roger Alimi, Elad Fisher, Amir Ivry, Alon Shavit, Eyal Weiss

Abstract: Magnetic surveys are conventionally performed by scanning a domain with a portable scalar magnetic sensor. Unfortunately, scalar magnetometers are expensive, power-consuming, and bulky. In many applications, calibrated vector magnetometers can be used to perform magnetic surveys. In recent years, algorithms based on artificial intelligence (AI) have achieved state-of-the-art results in many modern applications. In this work, we investigate an AI algorithm for the classical scalar calibration of magnetometers. A simple, low-cost method for performing a magnetic survey is presented. The method utilizes a low-power sensor with an AI calibration procedure that improves on the common calibration methods and suggests an alternative to the conventional technology and algorithms. The setup of the survey system is optimized for quick in-situ deployment right before performing the magnetic survey. We present a calibration method based on rotating the sensor in the natural Earth magnetic field for an optimal time period. This technique can deal with a constant field offset and non-orthogonality issues and does not require any external reference. The calibration is done by finding an estimator that yields the calibration parameters and produces the best geometric fit to the sensor readings. A comprehensive model considering the physical, algorithmic, and hardware properties of the magnetometer of the survey system is presented. The geometric ellipsoid fitting approach is parametrically tested. The calibration procedure reduced the root-mean-squared noise from the order of 10^4 nT to less than 10 nT, with variance lower than 1 nT, in a complete 360-degree rotation in the natural Earth magnetic field.

* vol. 55, no. 7, pp. 1-7, year 2019
* Accepted to IEEE Transactions On Magnetics
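The abstract above describes calibration as finding the best geometric (ellipsoid) fit to readings taken while rotating the sensor in a constant field. As a rough illustration of that idea, and not the paper's AI-based procedure, the sketch below fits an axis-aligned ellipsoid by ordinary linear least squares. The axis-aligned model is a simplifying assumption: it recovers per-axis offset and scale but ignores the non-orthogonality that the abstract says the real method handles. All function names are hypothetical.

```python
import numpy as np

def fit_axis_aligned_ellipsoid(readings):
    """Fit a*x^2 + b*y^2 + c*z^2 + d*x + e*y + f*z = 1 to readings of shape (n, 3).

    Returns the per-axis offset (ellipsoid center) and the per-axis radii.
    Assumes the origin does not lie on the ellipsoid itself.
    """
    x, y, z = readings[:, 0], readings[:, 1], readings[:, 2]
    design = np.column_stack([x**2, y**2, z**2, x, y, z])
    coef, *_ = np.linalg.lstsq(design, np.ones(len(readings)), rcond=None)
    quad, lin = coef[:3], coef[3:]
    offset = -lin / (2.0 * quad)                  # center of the ellipsoid
    g = 1.0 + np.sum(lin**2 / (4.0 * quad))       # right-hand side after completing squares
    radii = np.sqrt(g / quad)
    return offset, radii

def calibrate(readings, offset, radii):
    """Map raw readings onto the unit sphere (multiply by the known field magnitude if needed)."""
    return (readings - offset) / radii

# Synthetic rotation in a constant field with made-up offset and scale errors.
rng = np.random.default_rng(0)
directions = rng.standard_normal((500, 3))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
true_offset = np.array([3.0, -2.0, 1.0])
true_radii = np.array([55.0, 45.0, 52.5])
raw = directions * true_radii + true_offset

offset, radii = fit_axis_aligned_ellipsoid(raw)
print(offset, radii)
```

After calibration the readings should lie close to a sphere, so the spread of their norms gives a simple check of the fit quality.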

Machine Learning Detection Algorithm for Large Barkhausen Jumps in Cluttered Environment

Jun 27, 2021

Roger Alimi, Amir Ivry, Elad Fisher, Eyal Weiss

Abstract: Modern magnetic sensor arrays conventionally utilize state-of-the-art low-power magnetometers such as parallel and orthogonal fluxgates. Low-power fluxgates tend to have large Barkhausen jumps that appear as a dc jump in the fluxgate output. This phenomenon deteriorates the signal fidelity and effectively increases the internal sensor noise. Even if sensors that are more prone to dc jumps can be screened during production, the conventional noise measurement does not always catch the dc jump because of its sparsity. Moreover, dc jumps persist in almost all the sensor cores, although at a slower but still intolerable rate. Even if dc jumps can be easily detected in a shielded environment, they can be hard to positively detect when the sensor is deployed in the presence of natural noise and clutter. This work fills this gap and presents algorithms that distinguish dc jumps embedded in natural magnetic field data. To improve robustness to noise, we developed two machine learning algorithms that employ temporal and statistical physical-based features of a pre-acquired and well-known experimental data set. The first algorithm employs a support vector machine classifier, while the second is based on a neural network architecture. We compare these new approaches to a more classical kernel-based method. To that purpose, the receiver operating characteristic curve is generated, which allows the diagnostic ability of the different classifiers to be assessed by comparing their performance across various operation points. The accuracy advantage of the machine learning-based algorithms over the classic method is emphasized. In addition, high generalization and robustness of the neural network can be concluded, based on the rapid convergence of the corresponding receiver operating characteristic curves.

* pp. 1-5, vol. 10, year 2019
* Accepted to IEEE Magnetics Letters
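The abstract above compares classifiers through their receiver operating characteristic (ROC) curves. A minimal sketch of how such a curve and its area can be computed from detector scores is given below. The labels and scores are made-up illustration data, not results from the paper, and the helper assumes distinct score values (ties would need grouping).

```python
import numpy as np

def roc_curve(labels, scores):
    """Return (false-positive rate, true-positive rate) as the threshold sweeps from high to low.

    Assumes binary labels with at least one positive and one negative, and distinct scores.
    """
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    hits = labels[order]
    tpr = np.concatenate([[0.0], np.cumsum(hits) / hits.sum()])
    fpr = np.concatenate([[0.0], np.cumsum(~hits) / (~hits).sum()])
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve via the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Hypothetical detector outputs: 1 marks a window containing a dc jump.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
fpr, tpr = roc_curve(labels, scores)
print(auc(fpr, tpr))  # 8 of the 9 positive/negative pairs are ranked correctly
```

Each point on the curve corresponds to one operating point (one decision threshold), which is what allows classifiers to be compared across thresholds rather than at a single setting.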

Voice Activity Detection for Transient Noisy Environment Based on Diffusion Nets

Jun 25, 2021

Amir Ivry, Baruch Berdugo, Israel Cohen

Abstract: We address voice activity detection in acoustic environments with transient and stationary noises, which often occur in real-life scenarios. We exploit unique spatial patterns of speech and non-speech audio frames by independently learning their underlying geometric structure. This process is done through a deep encoder-decoder based neural network architecture. This structure involves an encoder that maps spectral features with temporal information to their low-dimensional representations, which are generated by applying the diffusion maps method. The encoder feeds a decoder that maps the embedded data back into the high-dimensional space. A deep neural network, which is trained to separate speech from non-speech frames, is obtained by concatenating the decoder to the encoder, resembling the known Diffusion nets architecture. Experimental results show enhanced performance compared to competing voice activity detection methods. The improvement is achieved in accuracy, robustness, and generalization ability. Our model operates in real time and can be integrated into audio-based communication systems. We also present a batch algorithm that obtains an even higher accuracy for off-line applications.

* volume 13, number 2, pp. 254-264, year 2019
* Accepted to IEEE Journal of Selected Topics in Signal Processing
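The abstract above uses the diffusion maps method to produce the low-dimensional representations that the encoder learns to reproduce. A minimal sketch of a basic diffusion map is shown below, under stated assumptions: a Gaussian kernel, a median-distance bandwidth heuristic, and toy 2-D points in place of spectral features. None of these choices come from the paper, and the function name is hypothetical.

```python
import numpy as np

def diffusion_map(points, n_coords=2, eps=None, t=1):
    """Basic diffusion map of `points` (n, d).

    Returns the leading eigenvalues (including the trivial one) and the
    (n, n_coords) embedding built from the nontrivial eigenvectors.
    """
    sq_dist = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    if eps is None:
        eps = np.median(sq_dist)                      # assumed bandwidth heuristic
    kernel = np.exp(-sq_dist / eps)
    degree = kernel.sum(axis=1)
    # Symmetric conjugate of the Markov matrix D^-1 K, so that eigh can be used.
    sym = kernel / np.sqrt(np.outer(degree, degree))
    lam, vec = np.linalg.eigh(sym)
    lam, vec = lam[::-1], vec[:, ::-1]                # sort eigenpairs in descending order
    psi = vec / np.sqrt(degree)[:, None]              # right eigenvectors of D^-1 K
    embedding = psi[:, 1:n_coords + 1] * lam[1:n_coords + 1] ** t
    return lam[:n_coords + 1], embedding

# Two toy clusters standing in for speech and non-speech frames.
rng = np.random.default_rng(0)
frames = np.vstack([
    rng.normal([0.0, 0.0], 0.3, size=(20, 2)),
    rng.normal([4.0, 0.0], 0.3, size=(20, 2)),
])
eigenvalues, embedding = diffusion_map(frames)
print(eigenvalues)
print(embedding[:3, 0], embedding[-3:, 0])
```

In this toy case the first diffusion coordinate takes opposite signs on the two clusters, which is the kind of geometric separation the low-dimensional representation is meant to expose.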

Nonlinear Acoustic Echo Cancellation with Deep Learning

Jun 25, 2021

Amir Ivry, Israel Cohen, Baruch Berdugo

Abstract: We propose a nonlinear acoustic echo cancellation system, which aims to model the echo path from the far-end signal to the near-end microphone in two parts. Inspired by the physical behavior of modern hands-free devices, we first introduce a novel neural network architecture that is specifically designed to model the nonlinear distortions these devices induce between receiving and playing the far-end signal. To account for variations between devices, we construct this network with trainable memory length and nonlinear activation functions that are not parameterized in advance, but are rather optimized during the training stage using the training data. Second, the network is followed by a standard adaptive linear filter that constantly tracks the echo path between the loudspeaker output and the microphone. During training, the network and filter are jointly optimized to learn the network parameters. This system requires 17 thousand parameters that consume 500 million floating-point operations per second and 40 kilobytes of memory. It also satisfies hands-free communication timing requirements on a standard neural processor, which renders it adequate for embedding on hands-free communication devices. Using 280 hours of real and synthetic data, experiments show advantageous performance compared to competing methods.

* Accepted to Interspeech 2021
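The second stage described in the abstract above is a standard adaptive linear filter that tracks the echo path between the loudspeaker and the microphone. The abstract does not say which adaptive algorithm is used, so the sketch below assumes normalized least mean squares (NLMS), a common choice, applied to a synthetic linear echo path. The function name, step size, and filter length are illustrative assumptions.

```python
import numpy as np

def nlms(far_end, mic, taps, mu=0.5, eps=1e-8):
    """NLMS adaptive filter (an assumed choice; the abstract only says 'standard adaptive linear filter').

    Returns the final filter weights and the residual (mic minus estimated echo).
    """
    weights = np.zeros(taps)
    padded = np.concatenate([np.zeros(taps - 1), far_end])
    residual = np.empty(len(far_end))
    for n in range(len(far_end)):
        x_vec = padded[n:n + taps][::-1]              # [x[n], x[n-1], ..., x[n-taps+1]]
        residual[n] = mic[n] - weights @ x_vec        # error after removing the estimated echo
        weights += mu * residual[n] * x_vec / (x_vec @ x_vec + eps)
    return weights, residual

# Synthetic example: white far-end signal through a made-up 4-tap echo path.
rng = np.random.default_rng(0)
far_end = rng.standard_normal(5000)
echo_path = np.array([0.5, -0.3, 0.2, 0.1])
mic = np.convolve(far_end, echo_path)[:len(far_end)]

weights, residual = nlms(far_end, mic, taps=len(echo_path))
print(weights)
```

In the system the abstract describes, the input to such a filter would be the output of the nonlinear network rather than the raw far-end signal; the sketch covers only the linear tracking stage.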
