AI has the power to change the world in both wonderful and terrible ways. We should try to make the wonderful outcomes more likely than the terrible ones. Towards that end, here is a brain dump of my thoughts about how AI might go wrong, in rough outline form. I am not the first person to have any of these thoughts, but collecting and structuring these risks was useful for me. Hopefully reading them will be useful for you.

My top fears include targeted manipulation of humans, autonomous weapons, massive job loss, AI-enabled surveillance and subjugation, widespread failure of societal mechanisms, extreme concentration of power, and loss of human control.
  
I want to emphasize -- I expect AI to lead to far more good than harm, but part of achieving that is thinking carefully about risk.

# Warmup: Future AI capabilities and evaluating risk

1. Over the last several years, AI has developed remarkable new capabilities. These include [writing software](https://github.com/features/copilot), [writing essays](https://www.nytimes.com/2023/08/24/technology/how-schools-can-survive-and-maybe-even-thrive-with-ai-this-fall.html), [passing the bar exam](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4389233), [generating realistic images](https://imagen.research.google/), [predicting how proteins will fold](https://www.deepmind.com/research/highlighted-research/alphafold), and [drawing unicorns in TikZ](https://arxiv.org/abs/2303.12712). (The last one is only slightly tongue in cheek. Controlling 2d images after being trained only on text is impressive.)

1. AI will continue to develop remarkable new capabilities.

  * Humans aren't irreplicable. There is no fundamental barrier to creating machines that can accomplish anything a group of humans can accomplish (excluding tasks that rely in their definition on being performed by a human).
  
    * For intellectual work, AI will become cheaper and faster than humans.
  
    * For physical work, we are likely to see a sudden transition, from expensive robots that do narrow things in very specific situations, to cheap robots that can be repurposed to do many things.
  
      * The more capable and adaptable the software controlling a robot is, the cheaper, less reliable, and less precisely calibrated its sensors, actuators, and body can be.

      * Scaling laws teach us that AI models can be improved by scaling up training data. I expect a virtuous cycle where somewhat general robots become capable enough to be widely deployed, enabling collection of much larger-scale diverse robotics data, leading to more capable robots.

  * The timeline for broadly human-level capabilities is hard to [predict](https://bounded-regret.ghost.io/scoring-ml-forecasts-for-2023/). My guess is more than 4 years and less than 40.

  * AI will do things that no human can do.

    * Operate faster than humans.

    * Repeat the same complex operation many times in a consistent and reliable way.

    * Tap into broader capabilities than any single human can tap into. e.g. the same model can [pass a medical exam](https://arxiv.org/abs/2303.13375), answer questions about [physics](https://benathi.github.io/blogs/2023-03/gpt4-physics-olympiad/) and [cosmology](https://www.linkedin.com/pulse/asking-gpt-4-cosmology-gabriel-altay/), [perform mathematical reasoning](https://blog.research.google/2022/06/minerva-solving-quantitative-reasoning.html?m=1), read [every human language](https://www.reddit.com/r/OpenAI/comments/13hvqfr/native_bilinguals_is_gpt4_equally_as_impressive/) ... and make unexpected connections between these fields.

    * Go deeper in a narrow area of expertise than a human could. e.g. an AI can read every email and calendar event you've ever received, web page you've looked at, and book you've read, and remind you of past context whenever anything -- person, topic, place -- comes up that's related to your past experience. Even the most dedicated personal human assistant would be unable to achieve the same degree of familiarity.

    * Share knowledge or capabilities directly, without going through a slow and costly teaching process. If an AI model gains a skill, that skill can be shared by copying the model's parameters. Humans are unable to gain new skills by copying patterns of neural connectivity from each other.

1. AI capabilities will have profound effects on the world.

  * Those effects have the possibility of being wonderful, terrible, or (most likely) some complicated mixture of the two.

  * There is not going to be just one consequence from advanced AI. AI will produce lots of different profound side effects, **all at once**. The fears below should not be considered as competing scenarios. You should rather imagine the chaos that will occur when variants of many of the below fears materialize simultaneously. (see the concept of [polycrisis](https://www.weforum.org/agenda/2023/03/polycrisis-adam-tooze-historian-explains/))

1. When deciding what AI risks to focus on, we should evaluate:

  * **probability:** How likely are the events that lead to this risk?

  * **severity:** If this risk occurs, how large is the resulting harm? (Different people will assign different severities based on different value systems. This is OK. I expect better outcomes if different groups focus on different types of risk.)

  * **cascading consequences:** Near-future AI risks could lead to the disruption of the social and institutional structures that enable us to take concerted rational action. If this risk occurs, how will it impact our ability to handle later AI risks?

  * **comparative advantage:** What skills or resources do I have that give me unusual leverage to understand or mitigate this particular risk?
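
  To make this rubric concrete, here is a toy prioritization sketch in Python. The scoring rule and the numbers are placeholders of my own, purely for illustration -- real prioritization involves judgment that doesn't reduce to a formula.

  ```python
  from dataclasses import dataclass

  @dataclass
  class Risk:
      name: str
      probability: float  # how likely the events leading to this risk are (0-1)
      severity: float     # how large the harm is if it occurs (arbitrary 0-10 scale)
      cascade: float      # multiplier >= 1 if it degrades our ability to handle later risks
      leverage: float     # your comparative advantage on this risk (0-1)

  def priority(r: Risk) -> float:
      """Toy scoring rule: expected harm, inflated by cascading consequences,
      weighted by personal comparative advantage."""
      return r.probability * r.severity * r.cascade * r.leverage

  # Placeholder numbers, purely for illustration -- not actual estimates.
  risks = [
      Risk("targeted manipulation", probability=0.7, severity=6, cascade=2.0, leverage=0.8),
      Risk("loss of human control", probability=0.2, severity=10, cascade=1.5, leverage=0.3),
  ]
  for r in sorted(risks, key=priority, reverse=True):
      print(f"{r.name}: priority {priority(r):.1f}")
  ```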

1. We should take *social disruption* seriously as a negative outcome. This can be far worse than partisans having unhinged arguments in the media. If the mechanisms of society are truly disrupted, we should expect outcomes like violent crime, kidnapping, fascism, war, rampant addiction, and unreliable access to essentials like food, electricity, communication, and firefighters.

1. Mitigating most AI-related risks involves tackling a complex mess of overlapping social, commercial, economic, religious, political, geopolitical, and technical challenges. I come from an ML science + engineering background, and I am going to focus on suggesting mitigations in the areas where I have expertise. *We desperately need people with diverse interdisciplinary backgrounds working on non-technical mitigations for AI risk.*

# Specific risks and harms stemming from AI

1. The capabilities and limitations of present day AI are already causing or exacerbating harms.
  
  * Harms include: generating socially biased results; generating (or failing to recognize) toxic content; generating bullshit and lies (current large language models are poorly grounded in the truth, even when created and used with the best intentions); causing addiction and radicalization (through gamification and addictive recommender systems).

  * These AI behaviors are already damaging lives. e.g. see the use of racially biased ML to [recommend criminal sentencing](https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing)

  * I am not going to focus on this class of risk, despite its importance. These risks are already a topic of research and concern, though more resources are needed. I am going to focus instead on future risks, where for the most part less work is being done on mitigations.

1. AI will do most jobs that are currently done by humans.

  * This is likely to lead to massive unemployment.

  * This is likely to lead to massive social disruption.

  * I'm unsure in what order jobs will be supplanted. The tasks that are hard or easy for an AI are different from the tasks that are hard or easy for a person. We have terrible intuition for this difference.

    * Five years ago I would have guessed that generating commissioned art from a description would be one of the last, rather than one of the first, human tasks to be automated.

  * Most human jobs involve a diversity of skills. We should expect many jobs to [transform as parts of them are automated, before they disappear](https://www.journals.uchicago.edu/doi/full/10.1086/718327).

  * Most of the mitigations for job loss are social and political.

    * [Universal basic income](https://en.wikipedia.org/wiki/Universal_basic_income).

  * Technical mitigations:

    * Favor research and product directions that seem likely to complement and enable human job roles, rather than compete with them. Almost everything will have a little of both characters ... but the balance between enabling and competing with humans is a question we should be explicitly thinking about when we choose projects.

1. AI will enable extremely effective targeted manipulation of humans.

  * Twitter/X currently uses *primitive* machine learning models, and chooses a sequence of *pre-existing* posts to show me. This is enough to make me spend hours slowly scrolling a screen with my finger, receiving little value in return.

  * Future AI will be able to dynamically generate the text, audio, and video stimuli that are predicted to be most compelling to me personally, based on the record of my past online interactions.

  * Stimuli may be designed to:

    * cause addictive behavior, such as compulsive app use

    * promote a political agenda

    * promote a religious agenda

    * promote a commercial agenda -- advertising superstimuli

  * Thought experiments

    * Have you ever met someone, and had an instant butterfly-in-the-stomach can't-quite-breathe feeling of attraction? Imagine if every time you load a website, there is someone who makes specifically you feel that way, telling you to drink coca-cola.
  
    * Have you ever found yourself obsessively playing an online game, or obsessively scrolling a social network or news source? Imagine if the intermittent rewards were generated based upon a model of your mental state, to be as addictive as possible to your specific brain at that specific moment in time.
  
    * Have you ever crafted an opinion to try to please your peers? Imagine that same dynamic, but where the peer feedback is artificial and chosen by an advertiser.
  
    * Have you ever listened to music, or looked at art, or read a passage of text, and felt like it was created just for you, and touched something deep in your identity? Imagine if every political ad made you feel that way.
  
  * I believe the social effects of this will be much, much more powerful and qualitatively different than current online manipulation. (*"[More is different](https://www.jstor.org/stable/pdf/1734697.pdf?casa_token=GDThS0md5IsAAAAA:cnx_fNDcb477G6-zU5qu0qC1tbKmgAhnIj_QecjFNwwYi3pge7vEWiaxIm4mAJqsatKbKnyMu-6ettZAtUDxysDPeFzAM736jpKJq-alTnjB4kCBAFrX3g)"*, or *"quantity has a quality all its own"*, depending on whether you prefer to quote P.W. Anderson or Stalin)

    * If our opinions and behavior are controlled by whoever pipes stimuli to us, then many of the basic mechanisms of democracy break. Objective truth and grounding in reality will be increasingly irrelevant to societal decisions.

    * If the addictive potential of generated media is similar to or greater than that of hard drugs ... there are going to be a lot of addicts.

    * Class divides will grow worse, between people who are privileged enough to protect themselves from manipulative content and those who are not.

    * Feelings of emotional connection or beauty may become vacuous, as they are mass produced. (see [parasocial relationships](https://en.wikipedia.org/wiki/Parasocial_interaction) for a less targeted present day example)

  * Non-technical mitigations:

    * Advocate for laws that restrict stimuli and interaction dynamics which produce anomalous effects on human behavior.

    * Forbid apps on the Google or Apple storefront that produce anomalous effects on human behavior. (this will include forbidding extremely addictive apps -- so may be difficult to achieve given incentives)

  * Technical mitigations:

    * Develop tools to identify stimuli which will produce anomalous effects on human behavior, or anomalous affective response.

    * Protective filter: Develop models that rewrite stimuli (text, images, or other modalities) to contain the same denoted information, but without the associated manipulative subtext. That is, rewrite stimuli to keep the parts you want to experience, while removing aspects that would make you behave in a strange way. (A rough sketch of this idea appears at the end of this list of mitigations.)

    * Study the ways in which human behavior and/or perception can be manipulated by optimizing stimuli, to better understand the problem.

      * I have done some work -- in a collaboration led by Gamaleldin Elsayed -- where we showed that adversarial attacks which cause image models to make incorrect predictions also bias the perception of human beings, even when the attacks are nearly imperceptible. See the Nature Communications paper [*Subtle adversarial image manipulations influence both human and machine perception*](https://www.nature.com/articles/s41467-023-40499-0).

      * Research scaling laws between model size, training compute, training data from an individual and from a population, and ability to influence a human.
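
  As a very rough illustration of the protective filter idea above, here is a minimal sketch. It assumes some instruction-following language model is available behind a hypothetical `call_language_model` function (a placeholder, not a real API), and simply asks that model to strip manipulative framing while keeping the denoted content. A real filter would need to be far more robust, and would itself need to be trustworthy.

  ```python
  # `call_language_model` is a hypothetical placeholder for whatever
  # instruction-following language model you have access to.
  def call_language_model(prompt: str) -> str:
      raise NotImplementedError("plug in a language model of your choice")

  FILTER_INSTRUCTIONS = (
      "Rewrite the message below so that it keeps all of the factual, denoted "
      "content, but removes emotional manipulation, artificial urgency, flattery, "
      "and other persuasive subtext. Reply with only the rewritten message.\n\n"
  )

  def protective_filter(stimulus: str) -> str:
      """Rewrite an incoming message to preserve its information content
      while (hopefully) stripping its manipulative framing."""
      return call_language_model(FILTER_INSTRUCTIONS + stimulus)

  # Example usage:
  # clean = protective_filter("LAST CHANCE!! Smart people like YOU are switching to BrandX...")
  ```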
  
1. AI will enable new weapons and new types of violence.

  * Autonomous weapons, i.e. weapons that can fight on their own, without requiring human controllers on the battlefield. 

    * Autonomous weapons are difficult to attribute to a responsible group. No one can prove whose drones committed an assassination or an invasion. We should expect increases in deniable anonymous violence.

    * Removal of social cost of war -- if you invade a country with robots, none of your citizens die, and none of them see atrocities. Domestic politics may become more accepting of war.

  * Development of new weapons

    * e.g. new biological, chemical, cyber, or robotic weapons

    * AI will enable these weapons to be made more capable + deadly than if they were created solely by humans.

    * AI may lower the barriers to access, so smaller + less resourced groups can make them.

  * Technical mitigations:

    * Be extremely cautious of doing research which is dual use. Think carefully about potential violent or harmful applications of a capability, during the research process.

    * When training and releasing models, include safeguards to prevent them being used for violent purposes. e.g. large language models should refuse to provide instructions for building weapons. Protein/DNA/chemical design models should refuse to design molecules which match characteristics of bio-weapons. This should be integrated as much as possible into the entire training process, rather than tacked on via fine-tuning.
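
  As one crude sketch of what integrating safeguards into the training process (rather than tacking them on afterwards) could look like: filter weapons-relevant material out of the training corpus itself. The function names here are hypothetical, `hazard_score` in particular stands in for a classifier that would itself be a hard research problem to build well, and corpus filtering alone is certainly not a sufficient safeguard.

  ```python
  from typing import Iterable, Iterator

  def hazard_score(document: str) -> float:
      """Hypothetical classifier estimating how useful a document is for
      building weapons (bio, chem, cyber, ...). A real version would need
      carefully curated labels and extensive red-teaming."""
      raise NotImplementedError

  def filter_training_corpus(documents: Iterable[str],
                             threshold: float = 0.9) -> Iterator[str]:
      """Drop the most hazardous documents before pretraining, so that
      safety is not purely bolted on via fine-tuning."""
      for doc in documents:
          if hazard_score(doc) < threshold:
              yield doc
  ```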

1. AI will enable qualitatively new kinds of surveillance and social control.

  * AI will have the ability to simultaneously monitor all electronic communications (email, chat, web browsing, ...), cameras, and microphones in a society. It will be able to use that data to build a personalized model of the likely motivations, beliefs, and actions of every single person. Actionable intelligence on this scale, and with this degree of personalization, is different from anything previously possible.
  
  * This domestic surveillance data will be useful and extremely tempting even in societies which aren't currently authoritarian. e.g. detailed surveillance data could be used to prevent crime, stop domestic abuse, watch for the sale of illegal drugs, or track health crises.
  
  * Once a society starts using this class of technology, it will be difficult to seek political change. Organized movements will be transparent to whoever controls the surveillance technology. Behavior that is considered undesirable will be easily policed.
  
  * This class of data can be used for commercial as well as political ends. The products that are offered to you may become hyper-specialized. The jobs that are offered to you may become hyper-specific and narrowly scoped. This may have negative effects on social mobility, and on personal growth and exploration.
  
  * Political mitigations:
  
    * Offer jobs in the US to all the AI researchers in oppressive regimes!! We currently make it hard for world class talent from countries with which we have a bad relationship to immigrate. We should instead be making it easy for the talent to defect.
  
  * Technical mitigations:
  
    * Don't design the technologies that are obviously best suited for a panopticon.
  
    * Can we design behavioral patterns that are adversarial examples, and will mislead surveillance technology?
  
    * Can we use techniques e.g. from differential privacy to technically limit the types of information available in aggregated surveillance data?
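
  As a small, concrete example of the differential privacy direction: the standard Laplace mechanism releases a noisy aggregate statistic instead of the exact one, so that any single person's data has a provably bounded influence on what is released. The sketch below shows it for a simple counting query; applying this meaningfully to surveillance-scale data is, of course, a much harder and largely open problem.

  ```python
  import numpy as np

  def private_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
      """Laplace mechanism: add Laplace(0, sensitivity / epsilon) noise to a count.
      Smaller epsilon means a stronger privacy guarantee and a noisier answer."""
      noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
      return true_count + noise

  # e.g. report roughly how many people were in an area, without any one
  # person's presence or absence changing the answer distribution much:
  # reported = private_count(true_count=1234, epsilon=0.1)
  ```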

1. AI will catalyze failure of societal mechanisms through increased efficiency. I wrote a [blog post on this class of risk](https://sohl-dickstein.github.io/2022/11/06/strong-Goodhart.html). 

  * Many, many parts of our society rely on people and organizations pursuing proxy goals that are aligned with true goals that are good for society.

    * For instance, in American democracy presidential candidates pursue the proxy goal of getting the majority of electoral votes. Our democracy's healthy functioning relies on that proxy goal being aligned with an actual goal of putting people in power who act in the best interest of the populace.

  * When we get very efficient at pursuing a proxy goal, we *overfit* to the proxy goal, and this often makes the true goal grow *much worse*.

    * For instance, in American democracy we begin selecting narrowly for candidates that are best at achieving 270 electoral votes. Focusing on this leads to candidates lying, sabotaging beneficial policies of competitors, and degrading the mechanics of the electoral system.

  * AI is a tool that can make almost anything much more efficient. When it makes pursuit of a proxy goal more efficient, it will often make the true goal get worse.

  * AI is going to make pursuit of many, many proxy goals more efficient, *all at once*. We should expect all kinds of unexpected parts of society, which rely on inefficient pursuit of proxy goals, to break, *all at once*.

    * This is likely to lead to societal disruption, in unexpected ways.

  * Technical mitigations:

    * Study the mechanisms behind overfitting, and generalize our understanding of overfitting beyond optimization of machine learning models. (See the toy demonstration below.)

    * Find mitigations for overfitting that apply to social systems. (see [blog post](https://sohl-dickstein.github.io/2022/11/06/strong-Goodhart.html) again)
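
  To ground the overfitting analogy in something concrete, here is the classic toy demonstration from machine learning, as a minimal sketch (it is not specific to any social system): fit polynomials of increasing degree to noisy samples, and watch the proxy goal (training error) keep improving while the true goal (error on held-out data) eventually gets much worse.

  ```python
  import numpy as np

  rng = np.random.default_rng(0)

  def true_fn(x):
      return np.sin(3 * x)                   # the "true goal" we actually care about

  x_train = rng.uniform(-1, 1, 15)            # the small sample we can measure (the proxy)
  y_train = true_fn(x_train) + 0.3 * rng.normal(size=x_train.size)
  x_test = np.linspace(-1, 1, 200)            # held-out evaluation of the true goal

  for degree in [1, 3, 8, 14]:
      coeffs = np.polyfit(x_train, y_train, degree)   # optimize the proxy harder as degree grows
      proxy_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
      true_err = np.mean((np.polyval(coeffs, x_test) - true_fn(x_test)) ** 2)
      print(f"degree {degree:2d}: proxy (train) error {proxy_err:.3f}, true (test) error {true_err:.3f}")
  ```
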
1. AI will lead to concentration of power.

  * AI will create massive wealth, and may provide almost unimaginable (god-like?) power to manipulate the world.

  * If the most advanced AI is controlled by a small group, then the personal quirks, selfish interests, and internal politics of that small group may have massive (existential?) impact on the rest of the world.

    * Examples of small groups include the leadership of OpenAI, Anthropic, Alphabet, or China.

    * This is likely to be a strongly negative outcome for everyone not in the controlling group. *"Power tends to corrupt and absolute power corrupts absolutely."*

  * Even if AI is available to a larger group, there may be dramatic disparities in access and control. These will lead to dramatic disparities in wealth and quality of life between AI haves and have-nots.

  * Technical mitigations:

    * Release AI models as open source. But this comes with its own set of misuse risks that need to be balanced against the benefits! I have no idea if this is a good idea in general.

    * Improve AI efficiency, both at inference and training, so that there aren't cost barriers to providing AI tools to the entire world. As in the last point though, AI that is too cheap to meter and widely distributed will increase many other AI risks. It's unclear what the right balance is.

    * As a researcher, try to work for the most responsible organizations. Try also to work for organizations that will diversify the set of *responsible* players, so that there isn't just one winner of the AI race. As with open source though, diversifying the set of organizations with cutting edge AI introduces its own risks!

1. AI will create a slippery slope, where humans lose control of our society.

  * AI will become better and more efficient at decision making than humans. We will outsource more and more critical tasks that are currently performed by humans, e.g.:

    * corporations run and staffed by AIs

    * government agencies run and staffed by AIs

    * AIs negotiating international trade agreements and regulation with other AIs

    * AIs identifying crimes, providing evidence of guilt, recommending sentencing

    * AIs identifying the most important problems to spend research and engineering effort on

    * AIs selecting the political candidates most likely to win elections, and advising those candidates on what to say and do

  * As a result, less and less decision making will be driven by human input. Humans will eventually end up as passive passengers in a global society driven by AIs.

  * It's not clear whether this is a dystopia. In many ways, it could be good for humanity! But I like our agency in the world, and would find this an unfortunate outcome.

  * If society moves in a bad or weird direction, humans will find themselves disempowered to do anything about it.

  * Legal mitigations:

    * Require that humans be an active part of the decision making loop for a broad array of tasks. These are likely to feel like silly jobs though, and may also put the jurisdiction that requires them at an economic disadvantage.

  * Technical mitigations:

    * Value alignment! If AIs are going to be making all of our decisions for us, we want to make sure they are doing so in a way that aligns with our ethics and welfare. It will be important to make this alignment to societal values, rather than individual values. (take home assignment: write out a list of universally accepted societal values we should align our AI to.)

    * Augment humans. Find ways to make humans more effective or smarter, so that we remain relevant agents.

1. AI will cause disaster by superhuman pursuit of an objective that is misaligned with human values.

  * This category involves an AI becoming far more intelligent than humans, and pursuing some goal that is misaligned with human intention ... leading to the superintelligent AI doing things like destroying the Earth or enslaving all humans as an [instrumental sub-goal](https://en.wikipedia.org/wiki/Instrumental_convergence) to achieve its misaligned goal.

  * This is a popular and actively researched AI risk in technical circles. I think its popularity stems from it being the unique AI risk which seems solvable just by thinking hard about the problem and doing good research. All the other problems are at least as much social and political as technical.

  * I think the probability of this class of risk is low. But the severity is potentially high. It is worth thinking about and taking seriously.

  * I have a blog post arguing for a [hot mess theory of AI misalignment](https://sohl-dickstein.github.io/2023/03/09/coherence.html) -- as AIs become smarter, I believe they will become less coherent in their behavior (i.e., more of a hot mess), rather than engage in monomaniacal pursuit of a slightly incorrect objective. That is, I believe we should be more worried about the kind of alignment failure where AIs simply behave in unpredictable ways that don't pursue any consistent objective.

1. AI will lead to unexpected harms.

  * The actual way in which the future plays out will be different from anyone's specific predictions. AI is a transformative and disruptive, but still *unpredictable*, technology. Many of the foundational capabilities and behaviors AI systems will exhibit are still unclear. It is also unclear how those capabilities and behaviors will interact with society.
  * Depending on the types of AI we build, and the ethics we choose, we may decide that AI has moral standing. If this happens, we will need to consider harm done to, as well as enabled by, AI. The types of harms an AI might experience are difficult to predict, since they will be unlike harms experienced by humans. (I don't believe near-future AI systems will have significant moral standing.)

  * Some of the greatest risks are likely to be things we haven't even thought of yet. We should prioritize identifying new risks.

# Parting thoughts

1. If AI produces profound social effects, AI developers may be blamed.

  * This could lead to attacks on AI scientists and engineers, and other elites. This is especially likely if the current rule of law is one of the things disrupted by AI. (The Chinese cultural revolution and the Khmer Rouge regime are examples of cultural disruption that was not good for intellectual elites.)

  * It is in our own direct, as well as enlightened, self-interest to make the consequences of our technology as positive as possible.

1. Mitigating existential risks requires solving intermediate risks.

  * Many non-existential, intermediate-timescale risks would damage our society's ability to act in the concerted, thoughtful way required to solve later risks.

  * If you think existential risks like extinction or permanent dystopia are overriding, it is important to also work to solve earlier risks. If we don't solve the earlier risks, we won't achieve the level of cooperation required to solve the big ones.

1. It is important that we ground our risk assessments in experiment and theory.

  * Thinking carefully about the future is a valuable exercise, but is not enough on its own. Fields which are not grounded in experiments or formal validation [make silently incorrect conclusions](https://sohl-dickstein.github.io/2023/03/09/coherence.html#endnote-compneuro).

  * Right now, we are almost certainly making many silently incorrect conclusions about the shape of AI risk, because we base most of our AI risk scenarios on elaborate verbal arguments, without experimental validation. It is dangerous for us to be silently wrong about AI risks.

  * As we work to mitigate AI risk, we must try hard to validate the risks themselves. It is difficult -- but possible! -- to validate risks posed by technology that doesn't exist yet. We must work to find aspects of risk scenarios we can measure now or formally prove.

1. We have a lot of leverage, and we should use it to make the future we want.

  * AI will bend the arc of history, and we are early in the process of creating it. Small interventions at the beginning of something huge have enormous consequences. We can make small choices now that will make the future much better, or much worse.

  * AI has the potential to unlock astounding wealth, and do awesome (in the original sense of the word) good in the world. It can provide a personal tutor for every student, eliminate traffic accidents, solve cancer, solve aging, provide enough excess resources to easily feed the 700+ million people who live in hunger, make work an optional recreational activity, propel us to the planets and the stars, and more.

  * Building AI is also the most fascinating scientific endeavor of my lifetime.

  * We have a unique opportunity to build the future we want to live in. Thinking about how to avoid bad outcomes, and achieve good outcomes, is a necessary step in building it.
# Acknowledgements

Thank you to Asako Miyakawa, Meredith Ringel Morris, Noah Fiedel, Fernando Diaz, Rif, Sebastian Farquhar, Peter Liu, Dave Orr, Lauren Wilcox, Simon Kornblith, Gamaleldin Elsayed, and Toby Shevlane for valuable feedback on ideas in this post!