Abstract: Selecting an appropriate divergence measure is a critical aspect of machine learning, as it directly impacts model performance. Among the most widely used is the Kullback-Leibler (KL) divergence, originally introduced in kinetic theory as a measure of relative entropy between probability distributions. Just as in machine learning, the ability to quantify the proximity of probability distributions plays a central role in kinetic theory. In this paper, we present a comparative review of divergence measures rooted in kinetic theory, highlighting their theoretical foundations and exploring their potential applications in machine learning and artificial intelligence.
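For context (our illustration, not part of the paper): for discrete distributions p and q, the KL divergence is D_KL(p || q) = sum_i p_i log(p_i / q_i). A minimal Python sketch of this computation, assuming nonnegative weight vectors that are normalised internally:

```python
# Minimal sketch (not from the paper): KL divergence between two
# discrete probability distributions, D_KL(p || q) = sum_i p_i * log(p_i / q_i).
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Relative entropy of p with respect to q (natural log, in nats)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()  # normalise to probability distributions
    q = q / q.sum()
    # eps guards against log(0); terms with p_i == 0 contribute 0 by convention
    return float(np.sum(np.where(p > 0, p * np.log((p + eps) / (q + eps)), 0.0)))

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(kl_divergence(p, q))  # small positive value; 0 iff p == q
print(kl_divergence(q, p))  # note the asymmetry: D_KL(p||q) != D_KL(q||p) in general
```

The asymmetry shown in the last two lines is one reason a comparative review matters: different divergences (symmetric or not) encode different notions of proximity between distributions.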
Abstract: We propose Group Shapley, a metric that extends the classical individual-level Shapley value framework to evaluate the importance of feature groups, addressing the structured nature of predictors commonly found in business and economic data. More importantly, we develop a significance testing procedure based on a three-cumulant chi-square approximation and establish the asymptotic properties of the test statistics for Group Shapley values. Our approach can effectively handle challenging scenarios, including sparse or skewed distributions and small sample sizes, outperforming alternative tests such as the Wald test. Simulations confirm that the proposed test maintains robust empirical size and demonstrates enhanced power under diverse conditions. To illustrate the method's practical relevance in advancing Explainable AI, we apply our framework to bond recovery rate predictions using a global dataset (1996-2023) comprising 2,094 observations and 98 features, grouped into 16 subgroups and five broader categories: bond characteristics, firm fundamentals, industry-specific factors, market-related variables, and macroeconomic indicators. Our results identify the market-related variables group as the most influential. Furthermore, Lorenz curves and Gini indices reveal that Group Shapley assigns feature importance more equitably than individual Shapley values.
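To make the group-level construction concrete, here is a minimal sketch (our own, not the authors' implementation) of exact Group Shapley values: each feature group is treated as a single player in the classical Shapley formula phi_g = sum over S in G\{g} of |S|!(|G|-|S|-1)!/|G|! * (v(S union {g}) - v(S)). The value function `value_fn` and the toy payoff names below are hypothetical placeholders; in practice v(S) would be a model performance or prediction attribution computed from the features in the groups of S.

```python
# Minimal sketch (assumptions, not the paper's procedure): exact Shapley
# values where the "players" are feature groups rather than features.
from itertools import combinations
from math import factorial

def group_shapley(groups, value_fn):
    """groups: list of group labels; value_fn: maps a frozenset of groups
    to a real-valued payoff (e.g. model fit using only those groups)."""
    n = len(groups)
    phi = {}
    for g in groups:
        others = [h for h in groups if h != g]
        total = 0.0
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                # classical Shapley weight |S|!(n-|S|-1)!/n!
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += w * (value_fn(frozenset(S) | {g}) - value_fn(frozenset(S)))
        phi[g] = total
    return phi

# Toy value function (hypothetical): additive group payoffs plus one interaction.
payoff = {"bond": 0.10, "firm": 0.05, "market": 0.30}
v = lambda S: sum(payoff[g] for g in S) + (0.05 if {"bond", "market"} <= S else 0.0)
print(group_shapley(list(payoff), v))  # importances sum to v(all groups) - v(empty)
```

With only a handful of groups (five broad categories in the paper's application), exact enumeration over coalitions is feasible, whereas feature-level Shapley values over 98 features would require sampling approximations.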
Abstract: This paper introduces a collaborative, human-centered taxonomy of AI, algorithmic, and automation harms. We argue that existing taxonomies, while valuable, can be narrow and unclear, typically cater to practitioners and government, and often overlook the needs of the wider public. Drawing on existing taxonomies and a large repository of documented incidents, we propose a taxonomy that is clear and understandable to a broad set of audiences, as well as being flexible, extensible, and interoperable. Through iterative refinement with topic experts and crowdsourced annotation testing, we arrive at a taxonomy that can serve as a powerful tool for civil society organisations, educators, policymakers, product teams, and the general public. By fostering a greater understanding of the real-world harms of AI and related technologies, we aim to empower NGOs and individuals to identify and report violations, inform policy discussions, and encourage responsible technology development and deployment.