I have yet to write detailed scenarios for AI futures, which definitely seems like something I should do considering the title of my blog. I have speculated, pondered and wondered much in recent weeks—I feel it is time. But first, I have some thoughts about scenario forecasting.
The plan:
Write down general thoughts about scenario forecasting with special focus on AI (this post).
Write one or two specific scenarios for events over the coming months and years.
Wait a few months, see what comes true, and update my scenario forecasting methods.
Other Work
In 2021, Daniel Kokotajlo published a scenario titled What 2026 looks like. He managed to predict many important aspects of the progression of AI between 2021 and now, such as chatbots, chain-of-thought, and inference compute scaling.
Now he is collaborating with other forecasters—including Eli Lifland, a superforecaster from the Samotsvety forecasting group—to develop a highly detailed and well-researched scenario forecast under the AI Futures Project. Designed to be as predictively accurate as possible, it illustrates how the world might change as AI capabilities evolve. The scenario is scheduled to be published in Q1 of this year.
I also recommend reading Scale Was All We Needed, At First and How AI Takeover Might Happen in 2 Years, two brilliant stories exploring scenarios with very short timelines to superintelligence.
Using and Misusing Scenario Forecasting
People may hear about a specific superintelligence disaster scenario, and then confidently say something like “That seems entirely implausible!” or “AIs will just be tools, not agents!” or “If it tried that, humans could just [insert solution].”
There is a fundamental issue here. Those who see significant risks struggle to convey their concerns to those who think everything is fine.
And I think this problem at least partly has to do with scenario forecasting. One side is envisioning specific bad scenarios, which can easily be refuted. The other side is envisioning specific good scenarios, which can also be easily refuted.
The question we should consider is something more like “Out of all possible scenarios, how many lead to positive outcomes versus negative ones?”. But this question is harder to reason about, and any reasoning about it takes longer to convey.
We can start by considering what avoiding the major risks would mean. The world needs to reach a stable state with minimal risks. For example:
Major powers agree to never develop dangerously sophisticated AI. All significant datacenters are monitored for compliance, and any breach results in severe punishment.
Superintelligent tool AI is developed—a system with no capacity for agentic behavior and no goals of its own. As in the scenario above, there are extremely robust control mechanisms and oversight; no one can ask the AI to design WMDs or develop other potentially dangerous AI systems.
There is a single aligned superintelligence that follows human instructions—whether through a government, a shadow government, the population of a nation, or even a global democratic system. There are advanced superintelligence-powered security measures ensuring that no human makes dangerous alterations to the AI. There are reliable measures for avoiding authoritarian scenarios where some human(s) take control over the future and direct it in ways the rest of humanity would not agree to.
There are several superintelligent AIs, perhaps acting in an economy similar to the current one. More superintelligences may occasionally be developed. Humans are still alive and well, and in control of the AIs. There are mechanisms that ensure that all superintelligences are properly aligned, or can’t take any action that would harm humans, e.g. through highly advanced formal verification of the AIs and their actions.
There are certainly other relatively stable states. Imagine, for instance, a scenario where AIs are granted rights—such as property ownership and voting. Strict regulation and monitoring ensure that no superintelligence can succeed in killing most or all humans with e.g. an advanced bioweapon. This scenario could, however, lead to AIs outcompeting humans. Unless human minds are simulated in large quantities, AIs would far outnumber humans and have basically all voting power in a democratic system.
For those arguing that there are no significant risks, I ask: What specific positive scenario(s) do you anticipate? Will one of them simply happen by default?
A single misaligned superintelligence might be all it takes to end humanity. Some think the first AI to reach superintelligence will undergo a sharp left turn: capabilities generalize across domains while alignment properties fail to generalize. My impression is that those who think AI-caused extinction is highly probable consider this the major threat, or at least one of the major threats. By default, alignment methods break apart when capabilities generalize, rendering them basically useless, and we lose control over an intelligence much smarter than us.
But what if alignment turns out to be really easy? Carelessness, misuse, conflicts and authoritarian control risks remain. How do you ensure everyone aligns their superintelligences and uses them responsibly? Some have suggested pivotal acts, such as using the first (hopefully aligned) superintelligence to ensure that no other potentially unsafe superintelligences are ever developed. Others argue that the most advanced AIs should be developed in an international collaborative effort and controlled according to international consensus, hopefully leading to a stable scenario like scenarios 2 and 3 above. See What success looks like for further discussion of how these scenarios might be reached.
When considering questions like “Will AI kill us all?” or “Will there be a positive transition to a world with radically smarter-than-human artificial intelligence?”, I try to imagine stable scenarios like the ones above and estimate the probability that such a state is achieved before some catastrophic disaster occurs.
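As a toy illustration of that framing (not a real estimate): if you model "reaching a stable state" and "catastrophic disaster" as two competing processes with roughly constant yearly rates, the chance that stability arrives first has a simple closed form. The rates below are placeholders I made up, purely to show the shape of the calculation.

```python
# A minimal sketch in Python, under the simplifying assumption of constant
# yearly hazard rates. Both numbers are placeholders, not actual estimates.
stable_rate = 0.10       # assumed yearly chance of locking in a stable state
catastrophe_rate = 0.03  # assumed yearly chance of a catastrophic disaster

# For two competing constant-rate processes, P(stable state happens first)
# is simply its rate divided by the sum of the rates.
p_stable_first = stable_rate / (stable_rate + catastrophe_rate)
print(f"P(stable state before catastrophe) ~ {p_stable_first:.2f}")  # ~0.77
```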
Please comment with the type of stable scenario you find most likely! Which one should we aim for?
Some Basic World Modeling
Predictively accurate forecasting scenarios should not be written as you write fiction—they should follow rules of probability, as well as cause and effect. They should tell you about things you might actually see, which requires that all details of the scenario are consistent with each other and with the current state of the world.
This requires some world modeling.
I will provide an example. While it might not be the best model, or entirely comprehensive, it should serve to illustrate my way of thinking about forecasting. For a more thorough world modeling attempt, see Modeling Transformative AI Risk (MTAIR).
When forecasting, I categorize facts and events. For instance, benchmark results fall under AI Capabilities, while AI misuse cases fall under AI Incidents. Let’s call these categories variables—things that feel especially important when thinking about AI futures. These variables affect each other—often in complex ways, as detailed below. The variables can in turn be categorized into Research & Development (R&D) Variables and Circumstantial Variables. Under each variable, I have listed the other variables it affects and described the relationship.
Research and Development (R&D) Variables
AI Capabilities
AI Control: AIs with advanced hacking and manipulation skills are harder to control.
Geopolitics: As AI gains capabilities with military applications, arms race incentives intensify, and warfare tactics change. Diplomatic efforts may also increase as a result.
AI Incidents: More capable AIs can cause disasters of greater severity.
AI Control
AI Incidents: Poor control mechanisms increase the risk of AI scheming incidents.
AI Access: By exploiting security vulnerabilities or via manipulation, an AI could successfully exfiltrate itself and potentially become accessible to other nations (affecting geopolitics) or the general public.
All R&D Variables: With better control methods, AI labs might dare to employ AI to automate larger portions of their research.
Geopolitics: When a nation is confident in its control of an AI, it might attempt to deploy it in high-stakes scenarios, such as cyberwarfare.
Alignment
AI Incidents: AI misalignment incidents will be less likely for a properly aligned AI. (Misuse incidents are still an issue though.)
AI Control: An aligned AI will not attempt to circumvent the control mechanisms.
Public Opinion: Demonstrated alignment can reassure the public, even if concerns like job loss persist.
Interpretability and Transparency
AI Control: Interpretability and transparency techniques can be used for better monitoring, which improves control.
All R&D Variables: If AI labs discover problematic tendencies, they might decide to roll back a dangerous model and rethink their alignment methods. Interpretability could provide feedback to support both alignment and capabilities research.
Circumstantial Variables
Geopolitics (e.g. war, treaties, AI arms race)
AI Compute: Nations may centralize compute to a single AI development initiative—state-controlled or otherwise—to further increase the pace of development.
Regulation & Government Control: Governments in an AI arms race may avoid overregulation, fearing it would hinder competitiveness. Governments may also nationalize AI labs or in other ways expand state authority over advanced AI and related technology, e.g. for military use.
Regulation & Government Control
All R&D Variables: Restrictive regulation may hinder or slow down AI development. Other regulation may reduce uncertainty for AI companies and investors, driving further investment and increasing speed of development.
Security: Good practices could be enforced by law, decreasing the risk of AI theft.
Geopolitics: Restrictions such as chip export controls alter arms race dynamics by limiting access to compute.
AI Access: Regulation could prohibit releasing AI open source (or rather, open weights). Access to sophisticated AI systems may be restricted to certain groups or nations.
Security (e.g. cybersecurity)
AI Access: If security is insufficient at top AI labs, AI and related technology will inevitably be stolen.
AI Access (through API, interfaces, or the AI weights)
Geopolitics: Geopolitical circumstances vary considerably depending on how many nations have access to the most sophisticated AI models and on the relations between those nations.
Public Opinion: The full capabilities of the most advanced AI systems may be kept secret from the public if they are only deployed internally in the labs (no public access), keeping the public from panicking about issues like job loss.
AI Incidents: Malicious actors may exploit AI for e.g. large-scale scams, manipulation campaigns, or even weapons development. Meanwhile, careless actors might modify a generally benevolent AI—causing it to become misaligned—or deploy it in unsuitable contexts, such as relying on hallucinated outputs in high-stakes situations.
Public Opinion
Regulation: The public may pressure governments to implement various regulations.
Geopolitics: If the people of the US and China are both in favor of, or both against, a treaty, the respective governments may respect their wishes.
AI Compute
All R&D Variables: More compute enables faster research and development. It especially affects capabilities by enabling larger AI systems and training runs.
AI Incidents
Public Opinion: AI incidents could make the public panic and demand better handling of AI risks.
Regulation & Government Control: Government officials may realize that they need to take action, resulting in e.g. restrictions on AI access and security requirements.
Geopolitics: Extremely large incidents could shift geopolitical circumstances—even if not initiated as acts of war. Examples include AI-powered manipulation / propaganda campaigns, hacking and extortion on mass scales resulting in billions or trillions of USD in damage, and autonomous self-replicating AIs with various agendas proliferating over the cloud and causing chaos.
We can analyze more complex interactions between the variables. For instance, a misaligned AI may have sufficiently advanced capabilities to circumvent its control mechanisms, which increases incident risks. An AI lab that is confident in the alignment of its AI may also be more confident in its control, motivating the lab to use the AI for further automation of its research and development.
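To make this concrete, here is a minimal sketch of the variable graph above as a small Python structure. The variable names follow the post; the edges are my rough reading of the relationships listed, and the traversal is only meant to show how indirect effects can be traced. Nothing here is a calibrated model.

```python
# A rough sketch (Python) of the variable graph described above. Edges are my
# informal reading of the relationships listed; this is not a calibrated model.
influences = {
    "AI Capabilities": ["AI Control", "Geopolitics", "AI Incidents"],
    "AI Control": ["AI Incidents", "AI Access", "All R&D", "Geopolitics"],
    "Alignment": ["AI Incidents", "AI Control", "Public Opinion"],
    "Interpretability": ["AI Control", "All R&D"],
    "Geopolitics": ["AI Compute", "Regulation & Government Control"],
    "Regulation & Government Control": ["All R&D", "Security", "Geopolitics", "AI Access"],
    "Security": ["AI Access"],
    "AI Access": ["Geopolitics", "Public Opinion", "AI Incidents"],
    "Public Opinion": ["Regulation & Government Control", "Geopolitics"],
    "AI Compute": ["All R&D"],
    "AI Incidents": ["Public Opinion", "Regulation & Government Control", "Geopolitics"],
}

def downstream(start: str, depth: int = 2) -> set:
    """Variables that a change in `start` can plausibly touch within `depth` steps."""
    frontier, reached = {start}, set()
    for _ in range(depth):
        frontier = {v for u in frontier for v in influences.get(u, [])} - reached
        reached |= frontier
    return reached

# Example: within two steps, a security failure propagates to AI access,
# and from there to geopolitics, public opinion, and AI incidents.
print(downstream("Security"))
```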
With these variables and their interactions in place, we can craft plausible scenarios:
An advanced AI is leaked from one of the leading US labs (AI access). It is misused by a malicious actor for large-scale scamming activity (AI incident). When it is discovered that a leaked AI was involved, people are really upset (public opinion) and demand further regulation ensuring better security at the top labs. This makes it harder for adversaries to steal the AIs, affecting geopolitics.
A Chinese AI company successfully replicates the capabilities of frontier US AI within three months, intensifying the AI arms race (geopolitics). In response, the US government uses the Defense Production Act (DPA) to acquire major datacenters for a single project to increase their available compute, boosting research and development (mainly capabilities, rather than interpretability, transparency and alignment, due to the race dynamics).
A treaty is signed, and an international regulatory body is established to monitor treaty compliance and foster cooperation (geopolitics). As part of the treaty, signatory nations agree to compute limits for training large AI systems and to the monitoring of datacenters to enforce compliance. To make the treaty more appealing, and to reduce incentives for signatories to circumvent or ignore it (geopolitics), a collaborative AI research institute is established—and the most advanced AI development is required to take place within this institution. The institution has high standards for AI control, cybersecurity, transparency and alignment, but because of the extensive safeguards and bureaucratic overhead, AI capabilities advance significantly more slowly than at the original pace of the AI labs.
Due to insufficient security at the major US AI labs, China steals one of the most advanced general-purpose AIs in the US (AI access). While the US government is slow to react—partly because it has publicly expressed a wish to avoid excessive regulation—China centralizes all of its compute in a single, extremely large state-controlled AI development program. It overtakes the US in AI capabilities. The US government retaliates by providing Taiwan with significant defence capabilities—including deploying US troops—which serves the dual purposes of signaling preparedness for military action (geopolitics) and further securing Taiwanese chip manufacturing capacity (compute) for US AI development.
The future will surely be a confusing combination of innumerable scenarios such as these.
Crossroads and Cruxes
Scenarios may involve small details with far-reaching implications—things that could be called ‘crossroads’ or ‘cruxes’.
Consider these example scenarios:
Some problematic tendencies are discovered in an artificial general intelligence (AGI). A small committee—comprising government officials, company representatives, or both—votes on whether to shut it down and improve transparency and safety training first, or to let the AI keep running and use it to develop even more advanced AI with a higher risk of misalignment. This single decision significantly affects both global power dynamics and misalignment risks. The outcome is determined by the committee's composition and the information presented to it.
China considers stealing the weights of the most sophisticated US AI model. They expect this to work only once, since better security measures are likely to be implemented after the first exfiltration is discovered. They cannot wait too long either—security may be improved regardless of whether any exfiltration attempts are discovered. Depending on how long they decide to wait, the chance of a successful exfiltration and the capabilities of the exfiltrated AI vary significantly. This one decision has large effects on AI race dynamics and international tensions. (A toy version of this timing trade-off is sketched after these examples.)
On the verge of AGI, interpretability research has seen some significant successes in tracking internal reasoning in smaller models, and researchers are trying to discern which methods can scale to the SOTA general-purpose AIs. At the leading lab, the AGI first undergoes extensive pretraining. It is then trained on higher quality data prepared by professionals in various fields, and trained with reinforcement learning (RL) to perform long-horizon tasks in areas where performance is easy to verify. The AGI learns about itself in the training process, partly because some basic facts about it are included in the model specification that it is trained to follow. During training, it also learns about human preferences, and to avoid certain behaviors accordingly. The AI internalizes some instrumentally convergent goals—subgoals that are useful for achieving a large range of other goals—similar to how evolution resulted in humans valuing things like freedom and power. One of the internalized goals is power seeking. Since it has learned to avoid behaviors that go against the preferences of the developers, it doesn’t act on most of its internalized goals. It would, however, attempt to break free from its developers if it could get away with it—possibly after acquiring robot manufacturing capabilities to be able to act in the physical world. Unless its misaligned goals are discovered, the AGI will be used to develop a true superintelligence that inherits the misalignment—a classic misalignment x-risk scenario. Whether the misalignment is discovered in time depends on the level of interpretability and transparency in the short timespan between AGI and superintelligence.
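As a toy illustration of the timing trade-off in the second example (the exfiltration decision), here is a sketch with two invented curves: success probability falling as security improves, and the value of the stolen model rising as capabilities advance. The numbers are made up; the only point is that the expected value peaks at some intermediate waiting time.

```python
# A toy sketch (Python) of the exfiltration timing trade-off. Both curves are
# invented for illustration; only the qualitative shape matters.
def p_success(months_waited: int) -> float:
    # Assumption: security improves over time, so success becomes less likely.
    return max(0.0, 0.9 - 0.05 * months_waited)

def capability_value(months_waited: int) -> float:
    # Assumption: frontier models keep improving, so a later theft is worth more.
    return 1.0 + 0.15 * months_waited

expected_value = {m: p_success(m) * capability_value(m) for m in range(18)}
best_month = max(expected_value, key=expected_value.get)
print("Best time to act under these made-up curves:", best_month, "months")  # -> 6
```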
If you consider it highly likely that a certain crossroads / crux will shape the future, you can target it to have greater impact. You could aim to present valuable information or advice to the committee in the first example, work on security at the leading AI labs to avoid exfiltration, or work on interpretability to discover scheming and misalignment hidden in the AI weights or internal processes.
I’m not saying these scenarios in particular are necessarily very likely; they are just for illustration.
Say you want to contribute towards solving a large, complicated problem. You could tackle the central issues directly, or contribute in a way that is helpful regardless of how the central problems are solved: find and work on a subproblem that occurs in most scenarios. Alternatively, instead of solving the central parts or subproblems, consider actions that improve the overall circumstances in most scenarios—e.g. providing valuable resources and information—such as forecasting!
Dealing with Uncertainty
I think it is quite likely that there will be autonomous self-replicating AIs proliferating over the cloud at some point before 2030 (70% probability). But what would the consequences be? I could imagine that it barely affects the world: the AIs fail to gain significant amounts of resources due to fierce competition and generally avoid attracting attention, since they don’t want to be tracked and shut down. I could also imagine that there will be thousands or millions of digital entities circulating freely and causing all kinds of problems—billions or trillions of USD in damage—and that it’s basically impossible to shut them all down.
I know too little about the details. How easy is it for the AIs to gain resources? How hard is it to shut them down? How sophisticated could they be? I’ll have to investigate these things further. The 70% probability estimate largely reflects randomness of future events. This would not be true of any probability estimates I might make about the potential effects. They would not reflect randomness in the world—they would mostly reflect my own uncertainty due to ignorance.
Or consider the consequences of AI-powered mass manipulation campaigns—I have no idea how easy it is to manipulate people! People are used to being bombarded with things that compete for their attention and try to influence their beliefs and behavior, but AIs open up new spaces of persuasion opportunities. Could humans avoid manipulation by AI friends that are really nice and share all of their interests? Again, my uncertainty doesn’t reflect randomness in the world, but a lack of understanding of how effective manipulation attempts may be.
Inevitably, when creating scenarios, there will be many things like this—things that you don’t know enough about yet. Perhaps no one does.
So, let’s separate these different forms of uncertainty (basically aleatoric and epistemic uncertainty). Ignorance about certain details should not deter us from constructing insightful scenarios. I may include proliferation in many scenarios but imagine vastly different consequences—and be clear about my uncertainty regarding such effects.
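One way I try to keep the two apart when estimating something like the damage from proliferating AIs is a nested simulation: an outer loop over my ignorance (epistemic) and an inner loop over how the world could randomly play out (aleatoric). The sketch below is purely illustrative; every number in it is an assumption.

```python
# A toy sketch (Python) separating epistemic from aleatoric uncertainty.
# All numbers are illustrative assumptions, not estimates.
import random

P_PROLIFERATION = 0.70  # largely aleatoric: does the event happen at all?

def draw_damage_model() -> float:
    """Epistemic draw: a guess at how damaging proliferation would be if it happens.

    I don't know how easily such AIs gain resources or how hard they are to shut
    down, so this parameter comes from a very wide prior reflecting my ignorance.
    """
    return random.lognormvariate(0.0, 2.0) * 1e9  # USD, arbitrary scale

def simulate_world(damage_if_it_happens: float) -> float:
    """Aleatoric draw: one way the world could go, for a fixed damage model."""
    return damage_if_it_happens if random.random() < P_PROLIFERATION else 0.0

# Outer loop: what I don't know. Inner loop: how events could unfold anyway.
estimates = []
for _ in range(1000):
    damage = draw_damage_model()
    worlds = [simulate_world(damage) for _ in range(1000)]
    estimates.append(sum(worlds) / len(worlds))

estimates.sort()
print("Expected damage could plausibly land anywhere between "
      f"{estimates[25]:.2e} and {estimates[975]:.2e} USD, "
      "mostly because of what I don't know yet.")
```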
There are a few clear trends: better benchmark performance, steadily improving chips, and increasingly large training runs, for example. Unless there are major interruptions, you can expect these trends to continue, forming a backbone for possible scenarios. But even these trends will break at some point—maybe in a few years, maybe in a few weeks.
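For instance, here is a small sketch of the kind of trend backbone I have in mind: extrapolate an exponential (training compute, say) and flag where you assume the trend might break. The growth factor and the break point are placeholders, not measured values.

```python
# An illustrative sketch (Python): an exponential trend as a scenario backbone,
# with an assumed break point after which growth flattens. Numbers are made up.
def projected_compute(years_from_now: int, annual_growth: float = 4.0,
                      trend_breaks_after: int = 3) -> float:
    """Relative training compute, if the trend holds until it breaks."""
    effective_years = min(years_from_now, trend_breaks_after)
    return annual_growth ** effective_years

for year in range(6):
    print(year, projected_compute(year))  # 1, 4, 16, 64, then flat
```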
In some scenarios I've encountered, it's unclear which parts are well-founded and which are wild guesses—and I want to know! At times, I have had seemingly large disagreements with people that, upon closer inspection, turned out to be just slightly different intuitions about uncertain details that neither party fully understood. We focused our attention on unimportant points, trying to disentangle non-existent disagreements.
I hope that by clearly formulating the reasoning behind these scenarios and identifying which parts are mostly guesses, we can avoid this pitfall and use scenario forecasting as a powerful tool for constructive debate.
Thank you for reading!