Abstract:This work studies electrocardiogram (ECG) biometrics at large scale, directly addressing a critical gap in the literature: the scarcity of large-scale evaluations with operational metrics and protocols that enable meaningful standardization and comparison across studies. We show that identity information is already present in tabular representations (fiducial features): even a simple MLP-based embedding network yields non-trivial performance, establishing a strong baseline before waveform modeling. We then adopt embedding-based deep learning models (ArcFace), first on features and then on ECG waveforms, showing a clear performance jump when moving from tabular inputs to waveforms, and a further gain with larger training sets and consistent normalization across train/val/test. On a large-scale test set, verification achieves high TAR at strict FAR thresholds (TAR=0.908 @ FAR=1e-3; TAR=0.820 @ FAR=1e-4) with EER=2.53\% (all-vs-all); closed-set identification yields Rank@1=0.812 and Rank@10=0.910. In open-set, a two-stage pipeline (top-$K$ shortlist on embeddings + re-ranking) reaches DIR@FAR up to 0.976 at FAR=1e-3 and 1e-4. Overall, the results show that ECG carries a measurable individual signature and that large-scale testing is essential to obtain realistic, comparable metrics. The study provides an operationally grounded benchmark that helps standardize evaluation across protocols.
Abstract:This work studies electrocardiogram (ECG) biometrics at large scale, evaluating how strongly an ECG can be linked to an individual and, consequently, how its anonymization may be compromised. We show that identity information is already present in tabular representations (fiducial features): even a simple MLP-based embedding network yields non-trivial performance, indicating that anonymization based solely on releasing features does not guarantee privacy. We then adopt embedding-based deep learning models (ArcFace), first on features and then on ECG waveforms, showing a performance jump when moving from tabular inputs to waveforms, and a further gain with larger training sets and consistent normalization across train/val/test. On a large-scale test set, verification achieves high TAR at strict FAR thresholds (TAR=0.908 @ FAR=1e-3; TAR=0.820 @ FAR=1e-4) with EER=2.53% (all-vs-all); closed-set identification yields Rank@1=0.812 and Rank@10=0.910. In open-set, a two-stage pipeline (top-K shortlist on embeddings + re-ranking) reaches DIR@FAR up to 0.976 at FAR=1e-3 and 1e-4. Overall, the results show that ECG carries a measurable individual signature: re-identification is already possible with tabular features and is further amplified by embedding-based models, making privacy implications and realistic operational protocols essential to consider.