Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Sangchul Park

Fair Clustering via Alignment

May 14, 2025

Kunwoong Kim, Jihu Lee, Sangchul Park, Yongdai Kim

Abstract:Algorithmic fairness in clustering aims to balance the proportions of instances assigned to each cluster with respect to a given sensitive attribute. While recently developed fair clustering algorithms optimize clustering objectives under specific fairness constraints, their inherent complexity or approximation often results in suboptimal clustering utility or numerical instability in practice. To resolve these limitations, we propose a new fair clustering algorithm based on a novel decomposition of the fair K-means clustering objective function. The proposed algorithm, called Fair Clustering via Alignment (FCA), operates by alternately (i) finding a joint probability distribution to align the data from different protected groups, and (ii) optimizing cluster centers in the aligned space. A key advantage of FCA is that it theoretically guarantees approximately optimal clustering utility for any given fairness level without complex constraints, thereby enabling high-utility fair clustering in practice. Experiments show that FCA outperforms existing methods by (i) attaining a superior trade-off between fairness level and clustering utility, and (ii) achieving near-perfect fairness without numerical instability.

* Accepted at ICML 2025. This is the version submitted for review and will be replaced by the camera-ready version soon

Via

Access Paper or Ask Questions

Fairness Through Matching

Jan 06, 2025

Kunwoong Kim, Insung Kong, Jongjin Lee, Minwoo Chae, Sangchul Park, Yongdai Kim

Abstract:Group fairness requires that different protected groups, characterized by a given sensitive attribute, receive equal outcomes overall. Typically, the level of group fairness is measured by the statistical gap between predictions from different protected groups. In this study, we reveal an implicit property of existing group fairness measures, which provides an insight into how the group-fair models behave. Then, we develop a new group-fair constraint based on this implicit property to learn group-fair models. To do so, we first introduce a notable theoretical observation: every group-fair model has an implicitly corresponding transport map between the input spaces of each protected group. Based on this observation, we introduce a new group fairness measure termed Matched Demographic Parity (MDP), which quantifies the averaged gap between predictions of two individuals (from different protected groups) matched by a given transport map. Then, we prove that any transport map can be used in MDP to learn group-fair models, and develop a novel algorithm called Fairness Through Matching (FTM), which learns a group-fair model using MDP constraint with an user-specified transport map. We specifically propose two favorable types of transport maps for MDP, based on the optimal transport theory, and discuss their advantages. Experiments reveal that FTM successfully trains group-fair models with certain desirable properties by choosing the transport map accordingly.

* Published in TMLR

Via

Access Paper or Ask Questions

SQuARe: A Large-Scale Dataset of Sensitive Questions and Acceptable Responses Created Through Human-Machine Collaboration

May 28, 2023

Hwaran Lee, Seokhee Hong, Joonsuk Park, Takyoung Kim, Meeyoung Cha, Yejin Choi, Byoung Pil Kim, Gunhee Kim, Eun-Ju Lee, Yong Lim(+3 more)

Figure 1 for SQuARe: A Large-Scale Dataset of Sensitive Questions and Acceptable Responses Created Through Human-Machine Collaboration

Figure 2 for SQuARe: A Large-Scale Dataset of Sensitive Questions and Acceptable Responses Created Through Human-Machine Collaboration

Figure 3 for SQuARe: A Large-Scale Dataset of Sensitive Questions and Acceptable Responses Created Through Human-Machine Collaboration

Figure 4 for SQuARe: A Large-Scale Dataset of Sensitive Questions and Acceptable Responses Created Through Human-Machine Collaboration

Abstract:The potential social harms that large language models pose, such as generating offensive content and reinforcing biases, are steeply rising. Existing works focus on coping with this concern while interacting with ill-intentioned users, such as those who explicitly make hate speech or elicit harmful responses. However, discussions on sensitive issues can become toxic even if the users are well-intentioned. For safer models in such scenarios, we present the Sensitive Questions and Acceptable Response (SQuARe) dataset, a large-scale Korean dataset of 49k sensitive questions with 42k acceptable and 46k non-acceptable responses. The dataset was constructed leveraging HyperCLOVA in a human-in-the-loop manner based on real news headlines. Experiments show that acceptable response generation significantly improves for HyperCLOVA and GPT-3, demonstrating the efficacy of this dataset.

* 19 pages, 10 figures, ACL 2023

Via

Access Paper or Ask Questions

Heterogeneity of AI-Induced Societal Harms and the Failure of Omnibus AI Laws

Apr 02, 2023

Sangchul Park

Abstract:AI-induced societal harms mirror existing problems in domains where AI replaces or complements traditional methodologies. However, trustworthy AI discourses postulate the homogeneity of AI, aim to derive common causes regarding the harms they generate, and demand uniform human interventions. Such AI monism has spurred legislation for omnibus AI laws requiring any high-risk AI systems to comply with a full, uniform package of rules on fairness, transparency, accountability, human oversight, accuracy, robustness, and security, as demonstrated by the EU AI Regulation and the U.S. draft Algorithmic Accountability Act. However, it is irrational to require high-risk or critical AIs to comply with all the safety, fairness, accountability, and privacy regulations when it is possible to separate AIs entailing safety risks, biases, infringements, and privacy problems. Legislators should gradually adapt existing regulations by categorizing AI systems according to the types of societal harms they induce. Accordingly, this paper proposes the following categorizations, subject to ongoing empirical reassessments. First, regarding intelligent agents, safety regulations must be adapted to address incremental accident risks arising from autonomous behavior. Second, regarding discriminative models, law must focus on the mitigation of allocative harms and the disclosure of marginal effects of immutable features. Third, for generative models, law should optimize developer liability for data mining and content generation, balancing potential social harms arising from infringing content and the negative impact of excessive filtering and identify cases where its non-human identity should be disclosed. Lastly, for cognitive models, data protection law should be adapted to effectively address privacy, surveillance, and security problems and facilitate governance built on public-private partnerships.

Via

Access Paper or Ask Questions