Wichita State University
Abstract:This paper studies the impact of retrieved ideological texts on the outputs of large language models (LLMs). While interest in understanding ideology in LLMs has recently increased, little attention has been given to this issue in the context of Retrieval-Augmented Generation (RAG). To fill this gap, we design an external knowledge source based on ideological loaded texts about COVID-19 treatments. Our corpus is based on 1,117 academic articles representing discourses about controversial and endorsed treatments for the disease. We propose a corpus linguistics framework, based on Lexical Multidimensional Analysis (LMDA), to identify the ideologies within the corpus. LLMs are tasked to answer questions derived from three identified ideological dimensions, and two types of contextual prompts are adopted: the first comprises the user question and ideological texts; and the second contains the question, ideological texts, and LMDA descriptions. Ideological alignment between reference ideological texts and LLMs' responses is assessed using cosine similarity for lexical and semantic representations. Results demonstrate that LLMs' responses based on ideological retrieved texts are more aligned with the ideology encountered in the external knowledge, with the enhanced prompt further influencing LLMs' outputs. Our findings highlight the importance of identifying ideological discourses within the RAG framework in order to mitigate not just unintended ideological bias, but also the risks of malicious manipulation of such models.
Abstract:Logical Access (LA) attacks, also known as audio deepfake attacks, use Text-to-Speech (TTS) or Voice Conversion (VC) methods to generate spoofed speech data. This can represent a serious threat to Automatic Speaker Verification (ASV) systems, as intruders can use such attacks to bypass voice biometric security. In this study, we investigate the correlation between speech quality and the performance of audio spoofing detection systems (i.e., LA task). For that, the performance of two enhancement algorithms is evaluated based on two perceptual speech quality measures, namely Perceptual Evaluation of Speech Quality (PESQ) and Speech-to-Reverberation Modulation Ratio (SRMR), and in respect to their impact on the audio spoofing detection system. We adopted the LA dataset, provided in the ASVspoof 2019 Challenge, and corrupted its test set with different Signal-to-Noise Ratio (SNR) levels, while leaving the training data untouched. Enhancement was applied to attenuate the detrimental effects of noisy speech, and the performances of two models, Speech Enhancement Generative Adversarial Network (SEGAN) and Metric-Optimized Generative Adversarial Network Plus (MetricGAN+), were compared. Although we expect that speech quality will correlate well with speech applications' performance, it can also have as a side effect on downstream tasks if unwanted artifacts are introduced or relevant information is removed from the speech signal. Our results corroborate with this hypothesis, as we found that the enhancement algorithm leading to the highest speech quality scores, MetricGAN+, provided the lowest Equal Error Rate (EER) on the audio spoofing detection task, whereas the enhancement method with the lowest speech quality scores, SEGAN, led to the lowest EER, thus leading to better performance on the LA task.
Abstract:Rapid structural damage assessment from remote sensing imagery is essential for timely disaster response. Within human-machine systems (HMS) for disaster management, automated damage detection provides decision-makers with actionable situational awareness. However, models trained on multi-disaster benchmarks often underperform in unseen geographic regions due to domain shift - a distributional mismatch between training and deployment data that undermines human trust in automated assessments. We explore a two-stage ensemble approach using supervised domain adaptation (SDA) for building damage classification across four severity classes. The pipeline adapts the xView2 first-place method to the Ida-BD dataset using SDA and systematically investigates the effect of individual augmentation components on classification performance. Comprehensive ablation experiments on the unseen Ida-BD test split demonstrate that SDA is indispensable: removing it causes damage detection to fail entirely. Our pipeline achieves the most robust performance using SDA with unsharp-enhanced RGB input, attaining a Macro-F1 of 0.5552. These results underscore the critical role of domain adaptation in building trustworthy automated damage assessment modules for HMS-integrated disaster response.
Abstract:Audio deepfake detection aims to detect real human voices from those generated by Artificial Intelligence (AI) and has emerged as a significant problem in the field of voice biometrics systems. With the ever-improving quality of synthetic voice, the probability of such a voice being exploited for illicit practices like identity thest and impersonation increases. Although significant progress has been made in the field of Audio Deepfake Detection in recent times, the issue of gender bias remains underexplored and in its nascent stage In this paper, we have attempted a thorough analysis of gender dependent performance and fairness in audio deepfake detection models. We have used the ASVspoof 5 dataset and train a ResNet-18 classifier and evaluate detection performance across four different audio features, and compared the performance with baseline AASIST model. Beyond conventional metrics such as Equal Error Rate (EER %), we incorporated five established fairness metrics to quantify gender disparities in the model. Our results show that even when the overall EER difference between genders appears low, fairness-aware evaluation reveals disparities in error distribution that are obscured by aggregate performance measures. These findings demonstrate that reliance on standard metrics is unreliable, whereas fairness metrics provide critical insights into demographic-specific failure modes. This work highlights the importance of fairness-aware evaluation for developing a more equitable, robust, and trustworthy audio deepfake detection system.