Abstract: Personalized large language models adapt responses to users' preferences and social attributes, but they can introduce substantial inconsistencies in universal truths across social groups, with some groups systematically receiving less accurate responses on objective tasks. Existing alignment methods either ignore personalization or focus mainly on subjective preference alignment, largely overlooking fairness and consistency in universal truths. To address this gap, we study Truth-Invariant Alignment (TIA), an alignment problem for personalized LLMs that aims to keep universal truths consistent across social groups while preserving personalization. We propose TriAlign, the first offline multi-agent reinforcement learning (MARL) framework for TIA, in which each social group is modeled as an interacting agent. TriAlign jointly optimizes universal truth accuracy, cross-group truth consistency, and personalization through a fairness-aware objective and an explicit inconsistency penalty. Experiments across diverse benchmarks demonstrate that TriAlign achieves a better balance among these three objectives than strong baselines, reducing universal truth disparities across social groups while improving both objective task performance and personalization quality.
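
To make the notion of a universal truth disparity concrete, the following minimal sketch computes per-group accuracy on an objective task and the gap between the best- and worst-served groups. This is an illustrative assumption, not the metric or penalty used by TriAlign: the function names `group_accuracies` and `accuracy_gap`, the group labels, and the sample records are all hypothetical.

```python
from collections import defaultdict


def group_accuracies(records):
    """Return accuracy per group from (group, is_correct) pairs.

    Hypothetical helper for illustration; not taken from the paper.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for group, is_correct in records:
        totals[group] += 1
        hits[group] += int(is_correct)
    return {group: hits[group] / totals[group] for group in totals}


def accuracy_gap(records):
    """Gap between the best- and worst-served groups (0 means fully consistent).

    One simple disparity measure, assumed here only as an example.
    """
    accuracies = group_accuracies(records)
    return max(accuracies.values()) - min(accuracies.values())


# Made-up example data: four objective questions answered for each group.
records = [
    ("group_a", True), ("group_a", True), ("group_a", False), ("group_a", True),
    ("group_b", True), ("group_b", False), ("group_b", False), ("group_b", True),
]

print(group_accuracies(records))
print(accuracy_gap(records))
```

A gap near zero would indicate that groups receive similarly accurate answers; an alignment method in this setting would aim to shrink such a gap without lowering overall accuracy.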



Abstract: In this paper, we explore the challenges of, and strategies for, making $k$-means clustering algorithms more robust to adversarial manipulation. We evaluate the vulnerability of clustering algorithms to adversarial attacks, emphasising the associated security risks. Our study investigates the impact of incremental attack strength on training, introduces the concept of transferability between supervised and unsupervised models, and highlights the sensitivity of unsupervised models to sample distributions. We additionally introduce and evaluate an adversarial training method that improves testing performance in adversarial scenarios, and we highlight the importance of several parameters of the proposed training method, such as continuous learning, centroid initialisation, and adversarial step-count.




Abstract: Raven's Progressive Matrices have been widely used for measuring abstract reasoning and intelligence in humans. However, for artificial learning systems, abstract reasoning remains a challenging problem. In this paper, we investigate how neural networks augmented with biologically inspired spiking modules gain a significant advantage in solving this problem. To illustrate this, we first investigate the performance of our networks with supervised learning, then with unsupervised learning. Experiments on the RAVEN dataset show that the overall accuracy of our supervised networks surpasses human-level performance, while our unsupervised networks significantly outperform existing unsupervised methods. Finally, our results from both supervised and unsupervised learning illustrate that, unlike their non-augmented counterparts, networks with spiking modules are able to extract and encode temporal features without any explicit instruction, do not rely heavily on training data, and generalise more readily to new problems. In summary, the results reported here indicate that artificial neural networks with spiking modules are well suited to abstract reasoning tasks.