Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?

Abstract

The leading AI companies are increasingly focused on building generalist AI agents: systems that can autonomously plan, act, and pursue goals across almost all tasks that humans can perform. Despite how useful these systems might be, unchecked AI agency poses significant risks to public safety and security, ranging from misuse by malicious actors to a potentially irreversible loss of human control. We discuss how these risks arise from current AI training methods. Indeed, various scenarios and experiments have demonstrated the possibility of AI agents engaging in deception or pursuing goals that were not specified by human operators and that conflict with human interests, such as self-preservation. Following the precautionary principle, we see a strong need for safer, yet still useful, alternatives to the current agency-driven trajectory. Accordingly, we propose as a core building block for further advances the development of a non-agentic AI system that is trustworthy and safe by design, which we call Scientist AI. This system is designed to explain the world from observations, as opposed to taking actions in it to imitate or please humans. It comprises a world model that generates theories to explain data and a question-answering inference machine. Both components operate with an explicit notion of uncertainty to mitigate the risks of overconfident predictions. In light of these considerations, a Scientist AI could be used to assist human researchers in accelerating scientific progress, including in AI safety. In particular, our system can be employed as a guardrail against AI agents that might be created despite the risks involved. Ultimately, focusing on non-agentic AI may enable the benefits of AI innovation while avoiding the risks associated with the current trajectory. We hope these arguments will motivate researchers, developers, and policymakers to favor this safer path.