In recent years, discussions of responsible AI practices have seen growing support for "participatory AI" approaches, intended to involve members of the public in the design and development of AI systems. Prior research has identified a lack of standardised methods or approaches for how to use participatory approaches in the AI development process. At present, there is a dearth of evidence on attitudes to and approaches for participation in the sites driving major AI developments: commercial AI labs. Through 12 semi-structured interviews with industry practitioners and subject-matter experts, this paper explores how commercial AI labs understand participatory AI approaches and the obstacles they have faced implementing these practices in the development of AI systems and research. We find that while interviewees view participation as a normative project that helps achieve "societally beneficial" AI systems, practitioners face numerous barriers to embedding participatory approaches in their companies: participation is expensive and resource intensive, it is "atomised" within companies, there is concern about exploitation, there is no incentive to be transparent about its adoption, and it is complicated by a lack of clear context. These barriers result in a piecemeal approach to participation that confers no decision-making power to participants and has little ongoing impact for AI labs. This papers contribution is to provide novel empirical research on the implementation of public participation in commercial AI labs, and shed light on the current challenges of using participatory approaches in this context.
Generative AI systems across modalities, ranging from text, image, audio, and video, have broad social impacts, but there exists no official standard for means of evaluating those impacts and which impacts should be evaluated. We move toward a standard approach in evaluating a generative AI system for any modality, in two overarching categories: what is able to be evaluated in a base system that has no predetermined application and what is able to be evaluated in society. We describe specific social impact categories and how to approach and conduct evaluations in the base technical system, then in people and society. Our framework for a base system defines seven categories of social impact: bias, stereotypes, and representational harms; cultural values and sensitive content; disparate performance; privacy and data protection; financial costs; environmental costs; and data and content moderation labor costs. Suggested methods for evaluation apply to all modalities and analyses of the limitations of existing evaluations serve as a starting point for necessary investment in future evaluations. We offer five overarching categories for what is able to be evaluated in society, each with their own subcategories: trustworthiness and autonomy; inequality, marginalization, and violence; concentration of authority; labor and creativity; and ecosystem and environment. Each subcategory includes recommendations for mitigating harm. We are concurrently crafting an evaluation repository for the AI research community to contribute existing evaluations along the given categories. This version will be updated following a CRAFT session at ACM FAccT 2023.