Abstract: Recent promising results in auditory attention decoding (AAD) using scalp electroencephalography (EEG) have motivated the exploration of cEEGrid, a flexible and portable ear-EEG system. While prior cEEGrid-based studies have confirmed the feasibility of AAD, they often neglect the dynamic nature of attentional states in real-world contexts. To address this gap, a novel cEEGrid dataset is introduced, featuring three concurrent speakers distributed across three of five distinct spatial locations. The dataset is designed to probe attentional tracking and switching in realistic scenarios. Nested leave-one-out validation, which is more rigorous than conventional single-loop leave-one-out validation, is employed to reduce biases stemming from the intricate temporal dynamics of EEG. Four rule-based models are evaluated: the Wiener filter (WF), canonical correlation analysis (CCA), common spatial pattern (CSP), and a Riemannian geometry-based classifier (RGC). With a 30-second decision window, the WF and CCA models achieve decoding accuracies of 41.5% and 41.4%, respectively, while the CSP and RGC models yield 37.8% and 37.6% accuracies with a 10-second window. Notably, both WF and CCA successfully track attentional state switches across all experimental tasks. Additionally, higher decoding accuracies are observed for electrodes positioned in the upper part of the cEEGrid layout and near the listener's right ear. These findings underscore the value of dynamic, ecologically valid paradigms and rigorous validation in advancing AAD research with cEEGrid.
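
To make the validation scheme concrete, the sketch below shows one way a nested leave-one-out loop could be wrapped around a simple stimulus-reconstruction (Wiener-filter-style) decoder. It is an illustration under stated assumptions, not the paper's pipeline: the data containers (`eeg`, `envs`, `attended`), the ridge-regularization grid `lambdas`, and the omission of time-lagged EEG features are simplifications introduced here.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Assumed (hypothetical) data layout:
#   eeg[t]      -> (time, channels) EEG for trial t
#   envs[t][k]  -> (time,) speech envelope of speaker k in trial t
#   attended[t] -> index of the attended speaker in trial t
# Time-lagged EEG features, filtering, and decision-window splitting are
# omitted for brevity.

def decode_trial(model, eeg_trial, env_trial):
    """Stimulus reconstruction: reconstruct the attended envelope from EEG
    and pick the speaker whose envelope correlates best with it."""
    rec = model.predict(eeg_trial)
    corrs = [np.corrcoef(rec, env)[0, 1] for env in env_trial]
    return int(np.argmax(corrs))

def nested_loo_accuracy(eeg, envs, attended, lambdas=(1e0, 1e2, 1e4)):
    n_trials, correct = len(eeg), 0
    for test in range(n_trials):                       # outer leave-one-out
        train = [t for t in range(n_trials) if t != test]
        best_lam, best_acc = lambdas[0], -1.0
        for lam in lambdas:                            # inner leave-one-out
            hits = 0
            for val in train:
                inner = [t for t in train if t != val]
                X = np.vstack([eeg[t] for t in inner])
                y = np.concatenate([envs[t][attended[t]] for t in inner])
                m = Ridge(alpha=lam).fit(X, y)
                hits += decode_trial(m, eeg[val], envs[val]) == attended[val]
            if hits / len(train) > best_acc:
                best_acc, best_lam = hits / len(train), lam
        X = np.vstack([eeg[t] for t in train])
        y = np.concatenate([envs[t][attended[t]] for t in train])
        final_model = Ridge(alpha=best_lam).fit(X, y)  # refit with chosen lambda
        correct += decode_trial(final_model, eeg[test], envs[test]) == attended[test]
    return correct / n_trials
```

The key point of the nested structure is that the regularization strength is selected only on the inner folds, so the outer held-out trial never influences model selection.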




Abstract: Although deep learning-based multi-channel speech enhancement has achieved significant advances, its practical deployment is often limited by constrained computational resources, particularly under low signal-to-noise ratio (SNR) conditions. In this paper, we propose a lightweight hybrid dual-channel speech enhancement system that combines independent vector analysis (IVA) with a modified version of the dual-channel grouped temporal convolutional recurrent network (GTCRN). IVA serves as a coarse estimator, providing auxiliary information about both speech and noise, while the modified GTCRN further refines the speech quality. We investigate several modifications to ensure the comprehensive use of both the original and the auxiliary information. Experimental results demonstrate the effectiveness of the proposed system, which achieves enhanced speech with a minimal parameter count and low computational complexity.
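
As a rough illustration of how such a hybrid front end might be wired together, the sketch below runs AuxIVA (via pyroomacoustics) as the coarse separator and feeds the mixture spectra plus the IVA outputs into a small placeholder recurrent network. The `TinyRefiner` module, the feature stacking, and the assumption that IVA's first output channel is speech are stand-ins introduced here; the actual modified GTCRN and its inputs are described in the paper, not reproduced below.

```python
import numpy as np
import pyroomacoustics as pra
import torch
import torch.nn as nn

FFT, HOP = 512, 256

class TinyRefiner(nn.Module):
    """Placeholder refinement network (NOT the modified GTCRN): maps stacked
    magnitude features (mixture channels + IVA speech/noise estimates) to a
    time-frequency mask for the reference channel."""
    def __init__(self, n_freq=FFT // 2 + 1, n_feat=4):
        super().__init__()
        self.gru = nn.GRU(n_feat * n_freq, 128, batch_first=True)
        self.out = nn.Linear(128, n_freq)

    def forward(self, feats):                 # feats: (batch, frames, n_feat, freq)
        b, t, c, f = feats.shape
        h, _ = self.gru(feats.reshape(b, t, c * f))
        return torch.sigmoid(self.out(h))     # mask: (batch, frames, freq)

def enhance(mix):
    """mix: (n_samples, 2) dual-channel time-domain mixture."""
    win_a = pra.hann(FFT)
    win_s = pra.transform.stft.compute_synthesis_window(win_a, HOP)
    X = pra.transform.stft.analysis(mix, FFT, HOP, win=win_a)   # (frames, freq, 2)
    # Coarse separation with AuxIVA; assuming output 0 is speech and output 1
    # is noise (a real system needs an explicit selection step).
    Y = pra.bss.auxiva(X, n_iter=20)
    feats = np.stack([np.abs(X[..., 0]), np.abs(X[..., 1]),
                      np.abs(Y[..., 0]), np.abs(Y[..., 1])], axis=-2)
    # Untrained placeholder network, shown only to illustrate the data flow.
    mask = TinyRefiner()(torch.from_numpy(feats).float().unsqueeze(0))
    S = mask[0].detach().numpy() * X[..., 0]                    # refine channel 0
    return pra.transform.stft.synthesis(S, FFT, HOP, win=win_s)
```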




Abstract: Decoding the directional focus of an attended speaker from listeners' electroencephalogram (EEG) signals is essential for developing brain-computer interfaces that improve the quality of life of individuals with hearing impairment. Previous works have concentrated on binary directional focus decoding, i.e., determining whether the attended speaker is to the left or right of the listener. However, a more precise decoding of the exact direction of the attended speaker is necessary for effective speech processing. Additionally, audio spatial information has not been effectively leveraged, resulting in suboptimal decoding results. In this paper, we observe that, on our recently presented dataset with 15-class directional focus, models relying exclusively on EEG inputs exhibit markedly lower accuracy when decoding the directional focus in both leave-one-subject-out and leave-one-trial-out scenarios. By integrating audio spatial spectra with EEG features, the decoding accuracy can be effectively improved. We employ the CNN, LSM-CNN, and EEG-Deformer models to decode the directional focus from listeners' EEG signals with the aid of auxiliary audio spatial spectra. The proposed Sp-Aux-Deformer model achieves notable 15-class decoding accuracies of 57.48% and 61.83% in the leave-one-subject-out and leave-one-trial-out scenarios, respectively.
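
The sketch below shows one simple way EEG features and an audio spatial spectrum could be fused for multi-class direction decoding: each modality is embedded by its own branch, the embeddings are concatenated, and a linear head predicts one of 15 directions. The layer sizes, the 360-bin spatial spectrum, and the plain CNN/MLP branches are assumptions for illustration and do not reproduce the Sp-Aux-Deformer architecture.

```python
import torch
import torch.nn as nn

class EEGSpatialFusion(nn.Module):
    """Illustrative late-fusion classifier: an EEG branch and an audio
    spatial-spectrum branch are embedded separately, concatenated, and
    mapped to 15 direction classes."""
    def __init__(self, n_channels=64, n_angles=360, n_classes=15):
        super().__init__()
        self.eeg_branch = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),   # pool over time
            nn.Flatten(),              # -> (batch, 32)
        )
        self.audio_branch = nn.Sequential(
            nn.Linear(n_angles, 64), nn.ReLU(), nn.Linear(64, 32),
        )
        self.head = nn.Linear(32 + 32, n_classes)

    def forward(self, eeg, spatial_spectrum):
        # eeg: (batch, channels, time); spatial_spectrum: (batch, n_angles)
        z = torch.cat([self.eeg_branch(eeg),
                       self.audio_branch(spatial_spectrum)], dim=-1)
        return self.head(z)

# Toy forward pass with random tensors standing in for one decision window
model = EEGSpatialFusion()
logits = model(torch.randn(8, 64, 128), torch.randn(8, 360))
print(logits.shape)   # torch.Size([8, 15])
```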