The potential social harms that large language models pose, such as generating offensive content and reinforcing biases, are steeply rising. Existing works focus on coping with this concern while interacting with ill-intentioned users, such as those who explicitly make hate speech or elicit harmful responses. However, discussions on sensitive issues can become toxic even if the users are well-intentioned. For safer models in such scenarios, we present the Sensitive Questions and Acceptable Response (SQuARe) dataset, a large-scale Korean dataset of 49k sensitive questions with 42k acceptable and 46k non-acceptable responses. The dataset was constructed leveraging HyperCLOVA in a human-in-the-loop manner based on real news headlines. Experiments show that acceptable response generation significantly improves for HyperCLOVA and GPT-3, demonstrating the efficacy of this dataset.
AI-induced societal harms mirror existing problems in domains where AI replaces or complements traditional methodologies. However, trustworthy AI discourses postulate the homogeneity of AI, aim to derive common causes regarding the harms they generate, and demand uniform human interventions. Such AI monism has spurred legislation for omnibus AI laws requiring any high-risk AI systems to comply with a full, uniform package of rules on fairness, transparency, accountability, human oversight, accuracy, robustness, and security, as demonstrated by the EU AI Regulation and the U.S. draft Algorithmic Accountability Act. However, it is irrational to require high-risk or critical AIs to comply with all the safety, fairness, accountability, and privacy regulations when it is possible to separate AIs entailing safety risks, biases, infringements, and privacy problems. Legislators should gradually adapt existing regulations by categorizing AI systems according to the types of societal harms they induce. Accordingly, this paper proposes the following categorizations, subject to ongoing empirical reassessments. First, regarding intelligent agents, safety regulations must be adapted to address incremental accident risks arising from autonomous behavior. Second, regarding discriminative models, law must focus on the mitigation of allocative harms and the disclosure of marginal effects of immutable features. Third, for generative models, law should optimize developer liability for data mining and content generation, balancing potential social harms arising from infringing content and the negative impact of excessive filtering and identify cases where its non-human identity should be disclosed. Lastly, for cognitive models, data protection law should be adapted to effectively address privacy, surveillance, and security problems and facilitate governance built on public-private partnerships.