Abstract:Since the release of ChatGPT, there has been a lot of debate about whether AI systems pose an existential risk to humanity. This paper develops a general framework for thinking about the existential risk of AI systems. We analyze a two premise argument that AI systems pose a threat to humanity. Premise one: AI systems will become extremely powerful. Premise two: if AI systems become extremely powerful, they will destroy humanity. We use these two premises to construct a taxonomy of survival stories, in which humanity survives into the far future. In each survival story, one of the two premises fails. Either scientific barriers prevent AI systems from becoming extremely powerful; or humanity bans research into AI systems, thereby preventing them from becoming extremely powerful; or extremely powerful AI systems do not destroy humanity, because their goals prevent them from doing so; or extremely powerful AI systems do not destroy humanity, because we can reliably detect and disable systems that have the goal of doing so. We argue that different survival stories face different challenges. We also argue that different survival stories motivate different responses to the threats from AI. Finally, we use our taxonomy to produce rough estimates of P(doom), the probability that humanity will be destroyed by AI.
Abstract:This work defends the 'Whole Hog Thesis': sophisticated Large Language Models (LLMs) like ChatGPT are full-blown linguistic and cognitive agents, possessing understanding, beliefs, desires, knowledge, and intentions. We argue against prevailing methodologies in AI philosophy, rejecting starting points based on low-level computational details ('Just an X' fallacy) or pre-existing theories of mind. Instead, we advocate starting with simple, high-level observations of LLM behavior (e.g., answering questions, making suggestions) -- defending this data against charges of metaphor, loose talk, or pretense. From these observations, we employ 'Holistic Network Assumptions' -- plausible connections between mental capacities (e.g., answering implies knowledge, knowledge implies belief, action implies intention) -- to argue for the full suite of cognitive states. We systematically rebut objections based on LLM failures (hallucinations, planning/reasoning errors), arguing these don't preclude agency, often mirroring human fallibility. We address numerous 'Games of Lacks', arguing that LLMs do not lack purported necessary conditions for cognition (e.g., semantic grounding, embodiment, justification, intrinsic intentionality) or that these conditions are not truly necessary, often relying on anti-discriminatory arguments comparing LLMs to diverse human capacities. Our approach is evidential, not functionalist, and deliberately excludes consciousness. We conclude by speculating on the possibility of LLMs possessing 'alien' contents beyond human conceptual schemes.
Abstract:Can humans and artificial intelligences share concepts and communicate? 'Making AI Intelligible' shows that philosophical work on the metaphysics of meaning can help answer these questions. Herman Cappelen and Josh Dever use the externalist tradition in philosophy to create models of how AIs and humans can understand each other. In doing so, they illustrate ways in which that philosophical tradition can be improved. The questions addressed in the book are not only theoretically interesting, but the answers have pressing practical implications. Many important decisions about human life are now influenced by AI. In giving that power to AI, we presuppose that AIs can track features of the world that we care about (for example, creditworthiness, recidivism, cancer, and combatants). If AIs can share our concepts, that will go some way towards justifying this reliance on AI. This ground-breaking study offers insight into how to take some first steps towards achieving Interpretable AI.
Abstract:This paper presents an argument that certain AI safety measures, rather than mitigating existential risk, may instead exacerbate it. Under certain key assumptions - the inevitability of AI failure, the expected correlation between an AI system's power at the point of failure and the severity of the resulting harm, and the tendency of safety measures to enable AI systems to become more powerful before failing - safety efforts have negative expected utility. The paper examines three response strategies: Optimism, Mitigation, and Holism. Each faces challenges stemming from intrinsic features of the AI safety landscape that we term Bottlenecking, the Perfection Barrier, and Equilibrium Fluctuation. The surprising robustness of the argument forces a re-examination of core assumptions around AI safety and points to several avenues for further research.
Abstract:AlphaGo plays chess and Go in a creative and novel way. It is natural for us to attribute contents to it, such as that it doesn't view being several pawns behind, if it has more board space, as bad. The framework introduced in Cappelen and Dever (2021) provides a way of thinking about the semantics and the metasemantics of AI content: does AlphaGo entertain contents like this, and if so, in virtue of what does a given state of the program mean that particular content? One salient question Cappelen and Dever didn't consider was the possibility of alien content. Alien content is content that is not or cannot be expressed by human beings. It's highly plausible that AlphaGo, or any other sophisticated AI system, expresses alien contents. That this is so, moreover, is plausibly a metasemantic fact: a fact that has to do with how AI comes to entertain content in the first place, one that will heed the vastly different etiology of AI and human content. This chapter explores the question of alien content in AI from a semantic and metasemantic perspective. It lays out the logical space of possible responses to the semantic and metasemantic questions alien content poses, considers whether and how we humans could communicate with entities who express alien content, and points out that getting clear about such questions might be important for more 'applied' issues in the philosophy of AI, such as existential risk and XAI.