
Two Opposing Views on AGI Safety

Image credit: generated by a VQGAN + CLIP Colab notebook written by Max Woolf, using "chaos theory | statistical mechanics | Kardashev 3 civilization" as the input text.

An angle into fundamental debates on AGI safety: I try to distill two opposing views, rarely articulated explicitly, that I see at the heart of many of the most intense arguments. View 1: the impact of AGI is like chaos theory. View 2: the impact of AGI is like statistical mechanics.

View 1: The long-term future of the universe is highly sensitive to small perturbations in what we do now, within a tightly focused window of time; everything inside this window is of immense consequence, and everything after it will unfold from the perturbations we make now.

If we get it right, the future of all sapient life and joy and creativity is saved; if we get it wrong, all is lost, all is doomed, the light of consciousness extinguished; the universe a smoking tomb filled with replicating p-zombies soullessly converting matter to paperclips.
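
To make the chaos-theory framing concrete, here's a minimal sketch of sensitive dependence on initial conditions, using the standard logistic map rather than anything AGI-specific: two trajectories that start almost identically end up nowhere near each other.

```python
# A toy illustration of View 1's chaos-theory intuition: in a chaotic system,
# a tiny perturbation to the initial condition grows until the two futures
# have nothing to do with each other. The logistic map at r = 4 is a standard
# chaotic example; the numbers are arbitrary, not a model of AGI.

def logistic_map(x, r=4.0):
    """One step of the logistic map x -> r * x * (1 - x); chaotic at r = 4."""
    return r * x * (1 - x)

x_a, x_b = 0.200000, 0.200001  # two almost-identical "initial decisions"
for _ in range(60):
    x_a, x_b = logistic_map(x_a), logistic_map(x_b)

# After a few dozen steps, the one-in-a-million perturbation has grown to order 1.
print(f"trajectory A: {x_a:.4f}   trajectory B: {x_b:.4f}   gap: {abs(x_a - x_b):.4f}")
```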

View 2: The long-term future of the universe is going to evolve like statistical mechanics, and all of the abrupt discontinuities kinda wash out in the large-scale ensemble of events. Increasingly advanced AI / AGI will be developed gradually, and there will be many of them.

The most catastrophic consequences are low-probability; the highs and lows are relatively well-bounded; if we can inspire a sufficiently large-scale trend among the scientific community to focus on safety, ethics, and alignment, things are most likely going to be fine.

Our best and most intense efforts now will have surprisingly little consequence for the long term, except to provide statistical noise around the mean path of the far future of the universe; therefore prosaic, near-term concerns dominate.
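
And to make the stat-mech framing concrete, a similarly minimal sketch of the opposite intuition: individual events can be wildly noisy, but the ensemble average barely moves, with fluctuations shrinking roughly like 1/sqrt(N). Again, the numbers are arbitrary toy values, not a model of AI progress.

```python
# A toy illustration of View 2's stat-mech intuition: each event is noisy,
# but averaged over a large ensemble the noise washes out and the mean path
# is stable. Fluctuations shrink roughly like 1 / sqrt(n).
import random

random.seed(0)

def ensemble_mean(n_events):
    """Mean of n_events independent noisy outcomes, each uniform in [-1, 1]."""
    return sum(random.uniform(-1, 1) for _ in range(n_events)) / n_events

for n in (10, 1_000, 100_000):
    print(f"n = {n:>7}: ensemble mean = {ensemble_mean(n):+.4f}")
```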

Some reflections, as a joke take: View 1 people tend to think View 2 people are asleep at the wheel while the universe is about to end; View 2 people think View 1 people are literally insane and need a lot of therapy.

Some reflections more seriously: View 1 people often burn themselves out trying to save the world. The jury is actually out on whether or not they’re right, so I’d be lying if I said they were definitely burning themselves out for nothing.

I tend to take a position that’s something like 10% View 1 and 90% View 2. The long term seems really dominated by the behavior of ensembles and I doubt that any of our efforts have much bearing on the future challenges of eventual Kardashev 3 AI.

But it seems like we can get things really wrong and severely curtail human potential if we’re not immensely careful, and it’ll be very easy to make decisions about how to incorporate AI / AGI into the world that cause a lot of harm and/or lock us out of a lot of future upside.

To get the field on the same page, though, we clearly need to get better at figuring out whether View 1 or View 2 is a better description of reality, and allocate resources, time, and effort accordingly.