Dazhi Gao

Dual-path Transformer Based Neural Beamformer for Target Speech Extraction

Sep 07, 2023
Aoqi Guo, Sichong Qian, Baoxiang Li, Dazhi Gao

Neural beamformers, which integrate both pre-separation and beamforming modules, have demonstrated impressive effectiveness in target speech extraction. Nevertheless, the performance of these beamformers is inherently limited by the predictive accuracy of the pre-separation module. In this paper, we introduce a neural beamformer supported by a dual-path transformer. Initially, we employ the cross-attention mechanism in the time domain to extract crucial spatial information related to beamforming from the noisy covariance matrix. Subsequently, in the frequency domain, the self-attention mechanism is employed to enhance the model's ability to process frequency-specific details. By design, our model circumvents the influence of pre-separation modules, delivering performance in a more comprehensive end-to-end manner. Experimental results reveal that our model not only outperforms contemporary leading neural beamforming algorithms in separation performance but also achieves this with a significant reduction in parameter count.
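
The two attention steps named in the abstract can be illustrated with plain scaled dot-product attention. The following is a minimal sketch under stated assumptions, not the authors' architecture: the feature shapes, the random inputs, and the names `target_feats`, `spatial_feats`, and `freq_feats` are all hypothetical, and no learned projections or multi-head structure are included.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(queries, keys, values):
    """Scaled dot-product attention; returns (output, attention weights)."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ values, weights

rng = np.random.default_rng(0)
frames, bins, dim = 6, 4, 8  # hypothetical sizes

# Cross-attention along time: queries come from one stream, keys and values
# from another (here a stand-in for features of the noisy covariance matrix).
target_feats = rng.standard_normal((frames, dim))   # hypothetical
spatial_feats = rng.standard_normal((frames, dim))  # hypothetical
cross_out, cross_w = attention(target_feats, spatial_feats, spatial_feats)

# Self-attention along frequency: queries, keys, and values share one source.
freq_feats = rng.standard_normal((bins, dim))       # hypothetical
self_out, self_w = attention(freq_feats, freq_feats, freq_feats)

print(cross_out.shape, self_out.shape)
```

The only difference between the two calls is where the keys and values come from, which is what distinguishes cross-attention from self-attention.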

Enhanced Neural Beamformer with Spatial Information for Target Speech Extraction

Jun 28, 2023
Aoqi Guo, Junnan Wu, Peng Gao, Wenbo Zhu, Qinwen Guo, Dazhi Gao, Yujun Wang

Recently, deep learning-based beamforming algorithms have shown promising performance in target speech extraction tasks. However, most systems do not fully utilize spatial information. In this paper, we propose a target speech extraction network that utilizes spatial information to enhance the performance of the neural beamformer. To achieve this, we first use the UNet-TCN structure to model input features and improve the estimation accuracy of the speech pre-separation module by avoiding the information loss caused by direct dimensionality reduction in other models. Furthermore, we introduce a multi-head cross-attention mechanism that enhances the neural beamformer's perception of spatial information by making full use of the spatial information received by the array. Experimental results demonstrate that our approach, which incorporates a more reasonable target mask estimation network and a spatial information-based cross-attention mechanism into the neural beamformer, effectively improves speech separation performance.

Dynamic Interactional And Cooperative Network For Shield Machine

Nov 17, 2022
Dazhi Gao, Rongyang Li, Hongbo Wang, Lingfeng Mao, Huansheng Ning

The shield machine (SM) is a complex mechanical device used for tunneling. In traditional construction, however, monitoring and decision-making have relied mainly on human experience, which brings limitations such as hidden mechanical failures, human operator error, and sensor anomalies. To deal with these challenges, many scholars have studied intelligent methods for SMs. Most of these methods take only the SM into account and do not consider its operating environment. This paper therefore discusses the relationship among the SM, geological information, and control terminals. Based on this relationship, models are established for the control terminal, including SM rate prediction and SM anomaly detection. The experimental results show that the proposed models perform better than the baseline models. For rate prediction, the proposed model reaches an R² of 92.2% and an MSE of 0.0064. The detection rate for anomaly detection is up to 98.2%.
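
For readers unfamiliar with the two regression metrics quoted above, the following is a minimal sketch of how R² and MSE are conventionally computed. The sample values are made up for illustration and are not data from the paper.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Illustrative values only (hypothetical, not from the paper).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]

print(f"MSE = {mse(y_true, y_pred):.4f}")
print(f"R2  = {r2_score(y_true, y_pred):.4f}")
```

An R² reported as a percentage, as in the abstract, is this value multiplied by 100.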

Training a U-Net based on a random mode-coupling matrix model to recover acoustic interference striations

Mar 24, 2020
Xiaolei Li, Wenhua Song, Dazhi Gao, Wei Gao, Haozhong Wang

A U-Net is trained to recover acoustic interference striations (AISs) from distorted ones. A random mode-coupling matrix model is introduced to quickly generate a large amount of training data for the U-Net. The AIS recovery performance of the U-Net is tested in range-dependent waveguides with nonlinear internal waves (NLIWs). Although the random mode-coupling matrix model is not an accurate physical model, the test results show that the U-Net successfully recovers AISs under different signal-to-noise ratios (SNRs) and for NLIWs of different amplitudes, widths, and shapes.

Sound source ranging using a feed-forward neural network with fitting-based early stopping

Apr 01, 2019
Jing Chi, Xiaolei Li, Haozhong Wang, Dazhi Gao, Peter Gerstoft

When a feed-forward neural network (FNN) is trained for source ranging in an ocean waveguide, it is difficult to evaluate the range accuracy of the FNN on unlabeled test data. A fitting-based early stopping (FEAST) method is introduced to evaluate the range error of the FNN on test data where the source distance is unknown. With FEAST, training is stopped when the evaluated range error of the FNN on the test data reaches its minimum, which helps to improve the ranging accuracy of the FNN on the test data. FEAST is demonstrated on simulated and experimental data.
