Abstract:Advances in deep learning have enabled the widespread deployment of speaker recognition systems (SRSs), yet they remain vulnerable to score-based impersonation attacks. Existing attacks that operate directly on raw waveforms require a large number of queries due to the difficulty of optimizing in high-dimensional audio spaces. Latent-space optimization within generative models offers improved efficiency, but these latent spaces are shaped by data distribution matching and do not inherently capture speaker-discriminative geometry. As a result, optimization trajectories often fail to align with the adversarial direction needed to maximize victim scores. To address this limitation, we propose an inversion-based generative attack framework that explicitly aligns the latent space of the synthesis model with the discriminative feature space of SRSs. We first analyze the requirements of an inverse model for score-based attacks and introduce a feature-aligned inversion strategy that geometrically synchronizes latent representations with speaker embeddings. This alignment ensures that latent updates directly translate into score improvements. Moreover, it enables new attack paradigms, including subspace-projection-based attacks, which were previously infeasible due to the absence of a faithful feature-to-audio mapping. Experiments show that our method significantly improves query efficiency, achieving competitive attack success rates with on average 10x fewer queries than prior approaches. In particular, the enabled subspace-projection-based attack attains up to 91.65% success using only 50 queries. These findings establish feature-aligned inversion as a key tool for evaluating the robustness of modern SRSs against score-based impersonation threats.
Abstract:As face recognition systems (FRS) become more widely used, user privacy becomes more important. A key privacy issue in FRS is protecting the user's face template, as the characteristics of the user's face image can be recovered from the template. Although recent advances in cryptographic tools such as homomorphic encryption (HE) have provided opportunities for securing the FRS, HE cannot be used directly with FRS in an efficient plug-and-play manner. In particular, although HE is functionally complete for arbitrary programs, it is basically designed for algebraic operations on encrypted data of predetermined shape, such as a polynomial ring. Thus, a non-tailored combination of HE and the system can yield very inefficient performance, and many previous HE-based face template protection methods are hundreds of times slower than plain systems without protection. In this study, we propose IDFace, a new HE-based secure and efficient face identification method with template protection. IDFace is designed on the basis of two novel techniques for efficient searching on a (homomorphically encrypted) biometric database with an angular metric. The first technique is a template representation transformation that sharply reduces the unit cost for the matching test. The second is a space-efficient encoding that reduces wasted space from the encryption algorithm, thus saving the number of operations on encrypted templates. Through experiments, we show that IDFace can identify a face template from among a database of 1M encrypted templates in 126ms, showing only 2X overhead compared to the identification over plaintexts.
Abstract:This study aims to explore user acceptance of Autonomous Vehicle (AV) policies with improved text-mining methods. Recently, South Korean policymakers have viewed Autonomous Driving Car (ADC) and Autonomous Driving Robot (ADR) as next-generation means of transportation that will reduce the cost of transporting passengers and goods. They support the construction of V2I and V2V communication infrastructures for ADC and recognize that ADR is equivalent to pedestrians to promote its deployment into sidewalks. To fill the gap where end-user acceptance of these policies is not well considered, this study applied two text-mining methods to the comments of graduate students in the fields of Industrial, Mechanical, and Electronics-Electrical-Computer. One is the Co-occurrence Network Analysis (CNA) based on TF-IWF and Dice coefficient, and the other is the Contextual Semantic Network Analysis (C-SNA) based on both KeyBERT, which extracts keywords that contextually represent the comments, and double cosine similarity. The reason for comparing these approaches is to balance interest not only in the implications for the AV policies but also in the need to apply quality text mining to this research domain. Significantly, the limitation of frequency-based text mining, which does not reflect textual context, and the trade-off of adjusting thresholds in Semantic Network Analysis (SNA) were considered. As the results of comparing the two approaches, the C-SNA provided the information necessary to understand users' voices using fewer nodes and features than the CNA. The users who pre-emptively understood the AV policies based on their engineering literacy and the given texts revealed potential risks of the AV accident policies. This study adds suggestions to manage these risks to support the successful deployment of AVs on public roads.