University of Leeds
Abstract:The detection of sexism in online content remains an open problem, as harmful language disproportionately affects women and marginalized groups. While automated systems for sexism detection have been developed, they still face two key challenges: data sparsity and the nuanced nature of sexist language. Even in large, well-curated datasets like the Explainable Detection of Online Sexism (EDOS), severe class imbalance hinders model generalization. Additionally, the overlapping and ambiguous boundaries of fine-grained categories introduce substantial annotator disagreement, reflecting the difficulty of interpreting nuanced expressions of sexism. To address these challenges, we propose two prompt-based data augmentation techniques: Definition-based Data Augmentation (DDA), which leverages category-specific definitions to generate semantically-aligned synthetic examples, and Contextual Semantic Expansion (CSE), which targets systematic model errors by enriching examples with task-specific semantic features. To further improve reliability in fine-grained classification, we introduce an ensemble strategy that resolves prediction ties by aggregating complementary perspectives from multiple language models. Our experimental evaluation on the EDOS dataset demonstrates state-of-the-art performance across all tasks, with notable improvements of macro F1 by 1.5 points for binary classification (Task A) and 4.1 points for fine-grained classification (Task C).
Abstract:Federated Learning (FL) enables collaborative model training while preserving privacy by allowing clients to share model updates instead of raw data. Pervasive computing environments (e.g., for Human Activity Recognition, HAR), which we focus on in this paper, are characterized by resource-constrained end devices, streaming sensor data and intermittent client participation. Variations in user behavior, common in HAR environments, often result in non-stationary data distributions. As such, existing FL approaches face challenges in HAR settings due to differing assumptions. The combined effects of HAR characteristics, namely heterogeneous data and intermittent participation, can lead to a severe issue called catastrophic forgetting (CF). Unlike Continuous Learning (CL), which addresses CF using memory and replay mechanisms, FL's privacy constraints prohibit such strategies. To tackle CF in HAR environments, we propose FlexFed, a novel FL approach that prioritizes data retention for efficient memory use and dynamically adjusts offline training frequency based on distribution shifts, client capability and offline duration. To better quantify CF in FL, we introduce a new metric that accounts for under-represented data, enabling more accurate evaluations. We also develop a realistic HAR-based evaluation framework that simulates streaming data, dynamic distributions, imbalances and varying availability. Experiments show that FlexFed mitigates CF more effectively, improves FL efficiency by 10 to 15 % and achieves faster, more stable convergence, especially for infrequent or under-represented data.
Abstract:Federated Learning (FL) is a privacy-preserving machine learning technique that allows decentralized collaborative model training across a set of distributed clients, by avoiding raw data exchange. A fundamental component of FL is the selection of a subset of clients in each round for model training by a central server. Current selection strategies are myopic in nature in that they are based on past or current interactions, often leading to inefficiency issues such as straggling clients. In this paper, we address this serious shortcoming by proposing the RIFLES approach that builds a novel availability forecasting layer to support the client selection process. We make the following contributions: (i) we formalise the sequential selection problem and reduce it to a scheduling problem and show that the problem is NP-complete, (ii) leveraging heartbeat messages from clients, RIFLES build an availability prediction layer to support (long term) selection decisions, (iii) we propose a novel adaptive selection strategy to support efficient learning and resource usage. To circumvent the inherent exponential complexity, we present RIFLES, a heuristic that leverages clients' historical availability data by using a CNN-LSTM time series forecasting model, allowing the server to predict the optimal participation times of clients, thereby enabling informed selection decisions. By comparing against other FL techniques, we show that RIFLES provide significant improvement by between 10%-50% on a variety of metrics such as accuracy and test loss. To the best of our knowledge, it is the first work to investigate FL as a scheduling problem.
Abstract:Detecting toxic language including sexism, harassment and abusive behaviour, remains a critical challenge, particularly in its subtle and context-dependent forms. Existing approaches largely focus on isolated message-level classification, overlooking toxicity that emerges across conversational contexts. To promote and enable future research in this direction, we introduce SafeSpeech, a comprehensive platform for toxic content detection and analysis that bridges message-level and conversation-level insights. The platform integrates fine-tuned classifiers and large language models (LLMs) to enable multi-granularity detection, toxic-aware conversation summarization, and persona profiling. SafeSpeech also incorporates explainability mechanisms, such as perplexity gain analysis, to highlight the linguistic elements driving predictions. Evaluations on benchmark datasets, including EDOS, OffensEval, and HatEval, demonstrate the reproduction of state-of-the-art performance across multiple tasks, including fine-grained sexism detection.