Abstract: Generative AI chatbots have proven surprisingly effective at persuading people to change their beliefs and attitudes in lab settings. However, the practical implications of these findings are not yet clear. In this work, we explore the impact of rehabilitative conversations with generative AI chatbots on users who share toxic content online. Toxic behaviors -- like insults or threats of violence -- are widespread in online communities. Strategies for dealing with toxic behavior are typically punitive, such as removing content or banning users. Rehabilitative approaches are rarely attempted, in part due to the emotional and psychological cost of engaging with aggressive users. In collaboration with seven large Reddit communities, we conducted a large-scale field experiment (N=893), inviting people who had recently posted toxic content to participate in conversations with AI chatbots. A qualitative analysis of the conversations shows that many participants engaged in good faith and even expressed remorse or a desire to change. However, we did not observe a significant change in toxic behavior in the following month compared to a control group. We discuss possible explanations for these findings, as well as their theoretical and practical implications.
Abstract: Humans engage in lifelong social interaction, interacting with different people in different scenarios to pursue different social goals. This requires social intelligence to gather information over a long time span and use it to navigate diverse social contexts effectively. Whether AI systems are capable of the same remains understudied in existing research. In this paper, we present a novel benchmark, LIFELONG-SOTOPIA, for comprehensively evaluating language agents by simulating multi-episode interactions. In each episode, the language agents role-play characters to achieve their respective social goals in randomly sampled social tasks. With LIFELONG-SOTOPIA, we find that the goal achievement and believability of all the language models we test decline over the course of the interaction. Although an advanced memory method improves the agents' performance, the best agents still achieve a significantly lower goal completion rate than humans on scenarios requiring an explicit understanding of interaction history. These findings show that LIFELONG-SOTOPIA can be used to evaluate the social intelligence of language agents over lifelong social interactions.