Amir Ivry

Deep Learning Interviews: Hundreds of fully solved job interview questions from a wide range of key topics in AI

Jan 04, 2022
Shlomo Kashani, Amir Ivry

The second edition of Deep Learning Interviews is home to hundreds of fully solved problems from a wide range of key topics in AI. It is designed both to help readers rehearse interview- or exam-specific topics and to provide machine learning MSc/PhD students, and those awaiting an interview, with a well-organized overview of the field. The problems it poses are tough enough to cut your teeth on and to dramatically improve your skills, but they are framed within thought-provoking questions and engaging stories. That is what makes the volume so valuable to students and job seekers: it gives them the ability to speak confidently and quickly on any relevant topic, to answer technical questions clearly and correctly, and to fully understand the purpose and meaning of interview questions and answers. Those are powerful, indispensable advantages to have when walking into the interview room. The book's content is a large inventory of topics relevant to DL job interviews and graduate-level exams. That places this work at the forefront of the growing trend in science to teach a core set of practical mathematical and computational skills. It is widely accepted that the training of every computer scientist must include the fundamental theorems of ML, and AI appears in the curriculum of nearly every university. This volume is designed as an excellent reference for graduates of such programs.

Objective Metrics to Evaluate Residual-Echo Suppression During Double-Talk

Jul 15, 2021
Amir Ivry, Israel Cohen, Baruch Berdugo

Human subjective evaluation is the optimal way to assess speech quality as perceived by humans. The recently introduced deep noise suppression mean opinion score (DNSMOS) metric was shown to estimate human ratings with great accuracy. The signal-to-distortion ratio (SDR) metric is widely used to evaluate residual-echo suppression (RES) systems by estimating speech quality during double-talk. However, since the SDR is affected by both speech distortion and residual-echo presence, it does not correlate well with human ratings according to the DNSMOS. To address that, we introduce two objective metrics to separately quantify the desired-speech maintained level (DSML) and residual-echo suppression level (RESL) during double-talk. These metrics are evaluated using a deep learning-based RES system with a tunable design parameter. Using 280 hours of real and simulated recordings, we show that the DSML and RESL correlate well with the DNSMOS and generalize well to various setups. We also empirically investigate how tuning the RES system's design parameter shapes the resulting DSML-RESL tradeoff, and we offer a practical design scheme for dynamic system requirements.

* Accepted to WASPAA 
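The point that the SDR mixes two failure modes can be seen numerically. The following is a minimal Python/NumPy sketch, assuming the plain SDR definition $10\log_{10}(\|s\|^2/\|s-\hat{s}\|^2)$ and synthetic stand-ins (a sine tone for the near-end speech, white noise for the residual echo). It does not implement the paper's DSML or RESL metrics, whose definitions are not reproduced here.

```python
import numpy as np

def sdr_db(desired, estimate):
    """Plain signal-to-distortion ratio in dB."""
    error = desired - estimate
    return 10.0 * np.log10(np.sum(desired ** 2) / np.sum(error ** 2))

rng = np.random.default_rng(0)
fs = 16000
t = np.arange(fs) / fs
desired = np.sin(2 * np.pi * 220 * t)          # stand-in for near-end speech
residual_echo = 0.1 * rng.standard_normal(fs)  # stand-in for leftover echo

# Case A: desired speech kept intact, but residual echo leaks through.
leaky = desired + residual_echo

# Case B: echo fully removed, but the desired speech is attenuated.
# The gain is chosen so the distortion energy equals the residual-echo energy.
gain = 1.0 - np.sqrt(np.sum(residual_echo ** 2) / np.sum(desired ** 2))
distorted = gain * desired

sdr_leaky = sdr_db(desired, leaky)
sdr_distorted = sdr_db(desired, distorted)
print(f"leaky: {sdr_leaky:.2f} dB, distorted: {sdr_distorted:.2f} dB")
```

Both cases yield the same SDR even though one leaks echo and the other distorts the desired speech, which is the ambiguity that separate distortion and suppression measures are meant to resolve.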

Multiclass Permanent Magnets Superstructure for Indoor Localization using Artificial Intelligence

Jul 14, 2021
Amir Ivry, Elad Fisher, Roger Alimi, Idan Mosseri, Kanna Nahir

Smartphones have become a popular tool for indoor localization and position estimation of users. Existing solutions mainly employ Wi-Fi, RFID, and magnetic sensing techniques to track movements in crowded venues. These techniques are highly sensitive to magnetic clutter and depend on local ambient magnetic fields, which frequently degrades their performance. They also often require pre-known mapping surveys of the area or the presence of active beacons, which are not always available. We embed small-volume, large-moment magnets in pre-known locations and arrange them in specific geometric constellations that create magnetic superstructure patterns of supervised magnetic signatures. These signatures constitute an unambiguous magnetic environment with respect to the moving sensor carrier. The localization algorithm learns the unique patterns of the scattered magnets during training and detects them in the ongoing data stream during localization. Our contribution is twofold. First, we deploy passive permanent magnets that do not require a power supply, in contrast to active magnetic transmitters. Second, we perform localization based on smartphone motion rather than on static positioning of the magnetometer. In our previous study, we considered a single superstructure pattern. Here, we present an extended version of that algorithm for multi-superstructure localization, which covers a broader localization area. Experimental results demonstrate a localization accuracy of 95% with a mean localization error of less than 1 m using artificial intelligence.

* year 2021  
* Accepted to IEEE Transactions on Magnetics 
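To give a feel for the magnetic signature a moving smartphone records, the sketch below sums point-dipole fields, $\mathbf{B}(\mathbf{r})=\frac{\mu_0}{4\pi}\,\frac{3\hat{\mathbf{r}}(\mathbf{m}\cdot\hat{\mathbf{r}})-\mathbf{m}}{|\mathbf{r}|^3}$, from a small constellation of permanent magnets along a straight walking path. It is a minimal Python/NumPy illustration under assumed conditions: each magnet is treated as an ideal point dipole, the magnet moments, positions, and path are hypothetical, and both the ambient earth field and the paper's learned classifier are omitted.

```python
import numpy as np

MU0_OVER_4PI = 1e-7  # mu_0 / (4*pi) in T*m/A

def dipole_field(moment, magnet_pos, sensor_pos):
    """Flux density (tesla) of an ideal point dipole, evaluated at sensor_pos."""
    r = np.asarray(sensor_pos, dtype=float) - np.asarray(magnet_pos, dtype=float)
    dist = np.linalg.norm(r)
    r_hat = r / dist
    m = np.asarray(moment, dtype=float)
    return MU0_OVER_4PI * (3.0 * r_hat * np.dot(m, r_hat) - m) / dist ** 3

def constellation_signature(magnets, path):
    """Total-field magnitude (nT) of all magnets at every point of a sensor path."""
    signature = []
    for point in path:
        b_total = sum(dipole_field(m, pos, point) for m, pos in magnets)
        signature.append(np.linalg.norm(b_total) * 1e9)
    return np.array(signature)

# Hypothetical constellation: two vertical 10 A*m^2 magnets 1 m apart on the floor.
magnets = [((0.0, 0.0, 10.0), (-0.5, 0.0, 0.0)),
           ((0.0, 0.0, 10.0), (0.5, 0.0, 0.0))]

# Hypothetical path: the phone is carried 0.5 m above the floor along the x axis.
xs = np.linspace(-2.0, 2.0, 81)
path = np.column_stack([xs, np.zeros_like(xs), np.full_like(xs, 0.5)])

signature = constellation_signature(magnets, path)
print(f"peak {signature.max():.0f} nT at x = {xs[signature.argmax()]:.2f} m")
```

A learning-based localizer would be trained to recognize such time series; the geometry of the constellation determines the shape of the pattern.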

Low power in-situ AI Calibration of a 3 Axial Magnetic Sensor

Jun 27, 2021
Roger Alimi, Elad Fisher, Amir Ivry, Alon Shavit, Eyal Weiss

Magnetic surveys are conventionally performed by scanning a domain with a portable scalar magnetic sensor. Unfortunately, scalar magnetometers are expensive, power-consuming, and bulky. In many applications, calibrated vector magnetometers can be used to perform magnetic surveys instead. In recent years, algorithms based on artificial intelligence (AI) have achieved state-of-the-art results in many modern applications. In this work, we investigate an AI algorithm for the classical scalar calibration of magnetometers. A simple, low-cost method for performing a magnetic survey is presented. The method uses a low-power sensor with an AI calibration procedure that improves on common calibration methods and offers an alternative to conventional technology and algorithms. The setup of the survey system is optimized for quick in-situ deployment right before the magnetic survey is performed. We present a calibration method based on rotating the sensor in the natural earth magnetic field for an optimal time period. This technique can deal with constant field offsets and non-orthogonality issues and does not require any external reference. The calibration is done by finding an estimator that yields the calibration parameters and produces the best geometric fit to the sensor readings. A comprehensive model considering the physical, algorithmic, and hardware properties of the survey system's magnetometer is presented. The geometric ellipsoid-fitting approach is parametrically tested. The calibration procedure reduced the root-mean-squared noise from the order of $10^4$ nT to less than 10 nT, with variance lower than 1 nT, over a complete 360-degree rotation in the natural earth magnetic field.

* vol. 55, no. 7, pp. 1-7, year 2019  
* Accepted to IEEE Transactions on Magnetics 
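The geometric ellipsoid fit can be illustrated with a simplified model. The sketch below is a minimal Python/NumPy example, not the paper's AI-based estimator: it assumes each axis has only a constant offset and a gain (an axis-aligned ellipsoid, so the non-orthogonality terms the paper handles are ignored), fits $Ax^2+By^2+Cz^2+Dx+Ey+Fz=1$ by linear least squares, and maps the readings back onto a sphere. The field magnitude `F_NT`, the gains, offsets, and noise level are hypothetical values chosen for synthetic rotation data.

```python
import numpy as np

def fit_axis_aligned_ellipsoid(readings):
    """Least-squares fit of A x^2 + B y^2 + C z^2 + D x + E y + F z = 1.

    Returns (offset, semi_axes) in the units of `readings`.
    """
    scale = np.mean(np.linalg.norm(readings, axis=1))  # normalize for conditioning
    p = readings / scale
    x, y, z = p[:, 0], p[:, 1], p[:, 2]
    design = np.column_stack([x**2, y**2, z**2, x, y, z])
    coef, *_ = np.linalg.lstsq(design, np.ones(len(p)), rcond=None)
    quad, lin = coef[:3], coef[3:]
    center = -lin / (2.0 * quad)            # complete the square on each axis
    k = 1.0 + np.sum(quad * center**2)      # sum_i quad_i * (p_i - c_i)^2 = k
    semi_axes = np.sqrt(k / quad)
    return center * scale, semi_axes * scale

def calibrate(readings, offset, semi_axes, field_magnitude):
    """Remove offsets and rescale each axis so readings lie on a sphere."""
    return (readings - offset) / semi_axes * field_magnitude

# Synthetic full rotation in a uniform field (all values hypothetical).
rng = np.random.default_rng(1)
F_NT = 45000.0
directions = rng.standard_normal((2000, 3))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
gains = np.array([1.05, 0.97, 1.02])
offsets = np.array([300.0, -150.0, 80.0])
raw = F_NT * directions * gains + offsets + rng.normal(0.0, 5.0, (2000, 3))

offset_hat, axes_hat = fit_axis_aligned_ellipsoid(raw)
cal = calibrate(raw, offset_hat, axes_hat, F_NT)

def rms_magnitude_error(v):
    """RMS deviation of the vector magnitude from the assumed field magnitude."""
    return np.sqrt(np.mean((np.linalg.norm(v, axis=1) - F_NT) ** 2))

print(f"raw RMS error {rms_magnitude_error(raw):.0f} nT, "
      f"calibrated {rms_magnitude_error(cal):.1f} nT")
```

The fit itself only fixes the shape of the ellipsoid; in this sketch the absolute scale comes from the assumed field magnitude passed to `calibrate`.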

Machine Learning Detection Algorithm for Large Barkhausen Jumps in Cluttered Environment

Jun 27, 2021
Roger Alimi, Amir Ivry, Elad Fisher, Eyal Weiss

Modern magnetic sensor arrays conventionally use state-of-the-art low-power magnetometers such as parallel and orthogonal fluxgates. Low-power fluxgates tend to have large Barkhausen jumps that appear as a dc jump in the fluxgate output. This phenomenon deteriorates signal fidelity and effectively increases the internal sensor noise. Even if sensors that are more prone to dc jumps can be screened during production, the conventional noise measurement does not always catch the dc jump because of its sparsity. Moreover, dc jumps persist in almost all sensor cores, albeit at a slower but still intolerable rate. Even though dc jumps can be easily detected in a shielded environment, they can be hard to positively detect once the sensor is deployed in the presence of natural noise and clutter. This work fills this gap and presents algorithms that distinguish dc jumps embedded in natural magnetic field data. To improve robustness to noise, we developed two machine learning algorithms that employ temporal and statistical physics-based features of a pre-acquired, well-known experimental data set. The first algorithm employs a support vector machine classifier, while the second is based on a neural network architecture. We compare these new approaches to a more classical kernel-based method. For that purpose, receiver operating characteristic (ROC) curves are generated, which allow the diagnostic ability of the different classifiers to be assessed by comparing their performance across various operating points. The machine learning-based algorithms are markedly more accurate than the classic method. In addition, the rapid convergence of the corresponding ROC curves indicates high generalization and robustness of the neural network.

* pp. 1-5, vol. 10, year 2019  
* Accepted to IEEE Magnetics Letters 
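The ROC-based comparison can be sketched on synthetic data. In this minimal Python/NumPy example, which is not one of the paper's SVM, neural-network, or kernel-based detectors, a hypothetical hand-crafted feature (the largest difference between the means of adjacent windows) scores each segment, and an ROC curve and its area under the curve (AUC) are computed by sweeping the decision threshold. The clutter model (a slow random sinusoid plus white noise), the jump amplitude, and the window length are assumptions.

```python
import numpy as np

def jump_statistic(x, w=20):
    """Largest |mean(next w samples) - mean(previous w samples)| in a segment."""
    c = np.concatenate(([0.0], np.cumsum(x)))  # c[i] = sum(x[:i])
    k = np.arange(w, len(x) - w + 1)
    left = (c[k] - c[k - w]) / w
    right = (c[k + w] - c[k]) / w
    return np.max(np.abs(right - left))

def roc_curve(scores, labels):
    """False/true positive rates for every threshold (scores assumed tie-free)."""
    order = np.argsort(-scores)
    sorted_labels = labels[order]
    tps = np.cumsum(sorted_labels)
    fps = np.cumsum(1 - sorted_labels)
    tpr = np.concatenate(([0.0], tps / tps[-1]))
    fpr = np.concatenate(([0.0], fps / fps[-1]))
    return fpr, tpr

def auc(fpr, tpr):
    """Trapezoidal area under the ROC curve."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

rng = np.random.default_rng(2)
n_segments, length = 400, 400
n = np.arange(length)
scores, labels = [], []
for i in range(n_segments):
    # Hypothetical clutter: slow sinusoid of random amplitude/frequency plus noise.
    amp, freq, phase = rng.uniform(0, 2), rng.uniform(0, 0.002), rng.uniform(0, 2 * np.pi)
    segment = amp * np.sin(2 * np.pi * freq * n + phase) + rng.normal(0.0, 0.5, length)
    has_jump = i % 2
    if has_jump:
        segment[rng.integers(50, length - 50):] += 3.0  # dc jump at a random sample
    scores.append(jump_statistic(segment))
    labels.append(has_jump)

fpr, tpr = roc_curve(np.array(scores), np.array(labels))
print(f"AUC = {auc(fpr, tpr):.3f}")
```

Each point on the curve corresponds to one threshold on the score, which is what lets classifiers be compared across operating points rather than at a single one.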

Voice Activity Detection for Transient Noisy Environment Based on Diffusion Nets

Jun 25, 2021
Amir Ivry, Baruch Berdugo, Israel Cohen

We address voice activity detection in acoustic environments with transient and stationary noises, which often occur in real-life scenarios. We exploit unique spatial patterns of speech and non-speech audio frames by independently learning their underlying geometric structure. This is done with a deep encoder-decoder neural network architecture. The encoder maps spectral features with temporal information to their low-dimensional representations, which are generated by applying the diffusion maps method. The encoder feeds a decoder that maps the embedded data back into the high-dimensional space. A deep neural network, trained to separate speech from non-speech frames, is obtained by concatenating the decoder to the encoder, resembling the known Diffusion nets architecture. Experimental results show enhanced performance compared to competing voice activity detection methods, with improvements in accuracy, robustness, and generalization ability. Our model runs in real time and can be integrated into audio-based communication systems. We also present a batch algorithm that obtains even higher accuracy for offline applications.

* volume 13, number 2, pp. 254-264, year 2019  
* Accepted to IEEE Journal of Selected Topics in Signal Processing 2019 
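The diffusion maps step that supplies the encoder's low-dimensional targets can be sketched as follows. This minimal Python/NumPy example assumes a Gaussian kernel with a hand-picked scale `epsilon`, uses two synthetic clusters as stand-ins for speech and non-speech feature frames, and covers only the embedding; the encoder-decoder network itself is not shown.

```python
import numpy as np

def diffusion_map(features, n_components=2, epsilon=1.0, t=1):
    """Diffusion-maps embedding of the rows of `features`."""
    sq_dists = np.sum((features[:, None, :] - features[None, :, :]) ** 2, axis=-1)
    kernel = np.exp(-sq_dists / epsilon)          # Gaussian affinities
    degree = kernel.sum(axis=1)
    # Symmetric conjugate of the Markov matrix P = D^-1 K (same eigenvalues).
    sym = kernel / np.sqrt(np.outer(degree, degree))
    eigvals, eigvecs = np.linalg.eigh(sym)
    order = np.argsort(eigvals)[::-1]             # largest eigenvalues first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    psi = eigvecs / np.sqrt(degree)[:, None]      # right eigenvectors of P
    # Skip the trivial constant eigenvector (eigenvalue 1).
    return psi[:, 1:n_components + 1] * eigvals[1:n_components + 1] ** t

rng = np.random.default_rng(3)
speech_like = rng.normal(0.0, 1.0, (50, 3))                       # synthetic frames
noise_like = rng.normal(0.0, 1.0, (50, 3)) + np.array([8.0, 0.0, 0.0])
frames = np.vstack([speech_like, noise_like])

embedding = diffusion_map(frames, n_components=2, epsilon=10.0)
print(embedding[:3], embedding[-3:], sep="\n")
```

Working with the symmetric matrix lets `eigh` be used, and dividing by the square root of the degree converts its eigenvectors back to those of the random-walk matrix; the first non-trivial coordinate then separates the two clusters.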

Nonlinear Acoustic Echo Cancellation with Deep Learning

Jun 25, 2021
Amir Ivry, Israel Cohen, Baruch Berdugo

We propose a nonlinear acoustic echo cancellation system that models the echo path from the far-end signal to the near-end microphone in two parts. Inspired by the physical behavior of modern hands-free devices, we first introduce a novel neural network architecture specifically designed to model the nonlinear distortions these devices induce between receiving and playing the far-end signal. To account for variations between devices, we construct this network with a trainable memory length and nonlinear activation functions that are not parameterized in advance but are instead optimized during training using the training data. Second, the network is followed by a standard adaptive linear filter that constantly tracks the echo path between the loudspeaker output and the microphone. During training, the network and filter are jointly optimized to learn the network parameters. The system requires 17 thousand parameters, 500 million floating-point operations per second, and 40 kilobytes of memory. It also satisfies hands-free communication timing requirements on a standard neural processor, which makes it suitable for embedding in hands-free communication devices. Using 280 hours of real and synthetic data, experiments show advantageous performance compared to competing methods.

* Accepted to Interspeech 2021 
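The second stage, a standard adaptive linear filter that tracks the echo path, is commonly realized with normalized least mean squares (NLMS); the abstract does not name the algorithm, so NLMS is an assumption here. In this minimal Python/NumPy sketch, a fixed `tanh` clipping stands in for the device nonlinearity, and the paper's trainable network is replaced by an oracle that applies that same `tanh` before the filter. The echo path, step size, and filter length are hypothetical. Comparing the echo return loss enhancement (ERLE) with and without the nonlinear front end shows why a purely linear filter is not enough.

```python
import numpy as np

def nlms(reference, mic, taps=64, mu=0.5, eps=1e-6):
    """NLMS echo canceller: returns the error (echo-reduced) signal."""
    w = np.zeros(taps)
    buf = np.zeros(taps)               # buf[k] holds reference[n - k]
    err = np.zeros(len(mic))
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = reference[n]
        err[n] = mic[n] - w @ buf      # subtract the estimated echo
        w += mu * err[n] * buf / (buf @ buf + eps)
    return err

def erle_db(mic, err):
    """Echo return loss enhancement over the second half (after convergence)."""
    half = len(mic) // 2
    return 10.0 * np.log10(np.sum(mic[half:] ** 2) / np.sum(err[half:] ** 2))

def loudspeaker(x):
    """Hypothetical memoryless loudspeaker clipping."""
    return 0.5 * np.tanh(2.0 * x)

rng = np.random.default_rng(4)
far_end = rng.standard_normal(8000)
echo_path = rng.standard_normal(32) * np.exp(-np.arange(32) / 6.0)
mic = np.convolve(loudspeaker(far_end), echo_path)[: len(far_end)]  # echo only

erle_linear = erle_db(mic, nlms(far_end, mic))
erle_oracle = erle_db(mic, nlms(loudspeaker(far_end), mic))
print(f"linear NLMS: {erle_linear:.1f} dB, "
      f"with nonlinear front end: {erle_oracle:.1f} dB")
```

With the raw far-end signal as reference, the filter can only cancel the linearly correlated part of the echo; once the reference passes through the matching nonlinearity, the remaining echo path is linear and the filter can track it.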

Deep Residual Echo Suppression with A Tunable Tradeoff Between Signal Distortion and Echo Suppression

Jun 25, 2021
Amir Ivry, Israel Cohen, Baruch Berdugo

In this paper, we propose a residual echo suppression method using a UNet neural network that directly maps the outputs of a linear acoustic echo canceler to the desired signal in the spectral domain. The system embeds a design parameter that allows a tunable tradeoff between desired-signal distortion and residual echo suppression in double-talk scenarios. It employs 136 thousand parameters and requires 1.6 giga floating-point operations per second and 10 megabytes of memory. The implementation satisfies both the timing requirements of the AEC challenge and the computational and memory limitations of on-device applications. Experiments are conducted with 161 h of data from the AEC challenge database and from real independent recordings. We demonstrate the performance of the proposed system in real-life conditions and compare it with two competing methods in terms of echo suppression and desired-signal distortion, generalization to various environments, and robustness to high echo levels.

* pp. 126-130, year 2021  
* Accepted to ICASSP 2021 
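The abstract does not specify how the design parameter is implemented. As one hypothetical way to picture such a tradeoff, the sketch below weights a desired-signal distortion term against a residual-echo term with a parameter `alpha` and, for a per-bin spectral gain $M$, uses the closed-form minimizer

$$\mathcal{L}(M)=\alpha\,\big\|(M-1)\odot|S|\big\|^2+(1-\alpha)\,\big\|M\odot|R|\big\|^2,\qquad M^\star=\frac{\alpha|S|^2}{\alpha|S|^2+(1-\alpha)|R|^2},$$

where $|S|$ and $|R|$ are assumed-known synthetic magnitudes of the desired speech and the residual echo. This is a parametric Wiener-style gain written in Python/NumPy, not the paper's UNet; `alpha = 0.5` recovers the ordinary Wiener gain.

```python
import numpy as np

def tradeoff_loss(mask, speech_mag, echo_mag, alpha):
    """alpha * desired-signal distortion + (1 - alpha) * leftover echo energy."""
    distortion = np.mean(((mask - 1.0) * speech_mag) ** 2)
    residual_echo = np.mean((mask * echo_mag) ** 2)
    return alpha * distortion + (1.0 - alpha) * residual_echo

def optimal_mask(speech_mag, echo_mag, alpha, eps=1e-12):
    """Per-bin minimizer of tradeoff_loss (derivative set to zero)."""
    s2, r2 = speech_mag ** 2, echo_mag ** 2
    return alpha * s2 / (alpha * s2 + (1.0 - alpha) * r2 + eps)

rng = np.random.default_rng(5)
speech_mag = np.abs(rng.standard_normal((257, 100)))  # synthetic |S|: bins x frames
echo_mag = np.abs(rng.standard_normal((257, 100)))    # synthetic |R|

for alpha in (0.2, 0.5, 0.8):
    m = optimal_mask(speech_mag, echo_mag, alpha)
    distortion = np.mean(((m - 1.0) * speech_mag) ** 2)
    leftover = np.mean((m * echo_mag) ** 2)
    print(f"alpha={alpha}: distortion={distortion:.3f}, residual echo={leftover:.3f}")
```

Raising `alpha` pushes every gain toward 1, so distortion falls while more residual echo passes through; lowering it does the opposite.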

Evaluation of Deep-Learning-Based Voice Activity Detectors and Room Impulse Response Models in Reverberant Environments

Jun 25, 2021
Amir Ivry, Israel Cohen, Baruch Berdugo

State-of-the-art deep-learning-based voice activity detectors (VADs) are often trained with anechoic data. However, real acoustic environments are generally reverberant, which causes performance to deteriorate significantly. To mitigate this mismatch between training data and real data, we simulate an augmented training set that contains nearly five million utterances. This extension comprises anechoic utterances and their reverberant modifications, generated by convolving the anechoic utterances with a variety of room impulse responses (RIRs). We consider five different models to generate RIRs, and five different VADs that are trained with the augmented training set. We test all trained systems in three different real reverberant environments. Experimental results show an average increase of $20\%$ in accuracy, precision, and recall for all detectors and response models, compared to anechoic training. Furthermore, one of the RIR models consistently yields better performance than the other models for all the tested VADs. Additionally, one of the VADs consistently outperformed the other VADs in all experiments.

* Accepted to ICASSP 2020 
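The augmentation step, convolving anechoic utterances with RIRs, can be sketched as follows. This minimal Python/NumPy example uses one simple statistical RIR model, exponentially decaying Gaussian noise whose amplitude falls by 60 dB after the reverberation time RT60; that model is an assumption and not necessarily one of the five RIR models compared in the paper. The RT60 values, sampling rate, and the sine stand-in for an utterance are hypothetical.

```python
import numpy as np

def synthetic_rir(rt60, fs=16000, rng=None):
    """Exponentially decaying noise RIR, one RT60 long, with a direct-path impulse."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(int(rt60 * fs)) / fs
    envelope = np.exp(-np.log(1000.0) * t / rt60)  # amplitude is -60 dB at t = rt60
    rir = rng.standard_normal(len(t)) * envelope
    rir[0] = 1.0                                   # direct path
    return rir / np.max(np.abs(rir))

def reverberate(anechoic, rir):
    """Convolve with the RIR and keep the original utterance length."""
    return np.convolve(anechoic, rir)[: len(anechoic)]

rng = np.random.default_rng(6)
fs = 16000
utterance = np.sin(2 * np.pi * 300 * np.arange(fs) / fs)  # 1 s stand-in utterance
augmented = [utterance] + [reverberate(utterance, synthetic_rir(rt60, fs, rng))
                           for rt60 in (0.3, 0.6, 0.9)]
print(len(augmented), [len(x) for x in augmented])
```

Truncating each convolution to the original length keeps the reverberant copies frame-aligned with the anechoic utterance they were generated from.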