NewsBite

We have three years until AI could wipe us out. I’m losing sleep. Why aren’t the tech bros?

Scientists know the dangers of superintelligence – but, like the Manhattan Project physicists, continue to develop tech they fear could spell the end of human civilisation.

AI wiping out the human race is something Silicon Valley types and artificial intelligence worshippers discuss regularly behind closed doors.

I’m having a lot of trouble sleeping because I’m genuinely concerned that the world, or at least human existence, is going to end, possibly in the next three years. I’m also struggling with the wilful stupidity of the scientists and tech bros who are knowingly arming this Armageddon, and what I want to know is not so much how they sleep but how and why they get up and go to work every day.

P (doom) has got to be the unfunniest running joke since American politics, but apparently it’s something Silicon Valley types and artificial intelligence worshippers all discuss regularly behind closed computer lab doors. Pretty much everyone who knows what one is has their own P (doom) number (honestly, even the way it’s written is nerdingly annoying), which is basically the percentage chance they ascribe to AI wiping out the human race.

Figures differ, but the average P (doom) score for AI engineers ranges between 15 per cent and 40 per cent. Geoff Hinton, known as “the Godfather of AI”, wavers between 20 per cent and 50 per cent, Elon Musk is somewhere from 10 per cent to 20 per cent and former OpenAI employee Daniel Kokotajlo, whom we’ll get to, is at 70 per cent.

(This may be a good moment to measure your own P (doom), then come back and ponder it again at the end of this article.)

J. Robert Oppenheimer was the director of the Manhattan Project’s Los Alamos Laboratory. Picture: Keystone. Alone Historical

What is truly remarkable is how anyone who knows or fears the work they are doing is so dangerous keeps on doing it. History provides one pointed parallel in the Manhattan Project where, during World War II, a group of scientists were told they had to race to produce a weapon of unimaginable mass destruction before the Germans did so.

By the end of 1944 it had become clear to those toiling to create a new big bang in New Mexico that Germany was not building one of its own, and at this point all of them surely must have pondered the point of continuing down such a frightening path.

Yet only one, Joseph Rotblat, a Polish physicist in the British delegation, walked away, saying “the whole purpose of my being in Los Alamos ceased to be” once he realised they weren’t really in a race with the Nazis.

Scientists, and perhaps humans in general, suffer from the kind of curiosity that is so lethal for felines: a need to know what’s going to happen if they keep going. Personally, if I were shown evidence that writing one of my dangerously amusing car reviews could cause a lot of people to die from laughing, I would stop writing it. Much as I would definitely quit my job if I were working in a lab and realised I was handling bat viruses that could jump into humans and kill millions of them.

The invention of the atom bomb led, of course, to the existential angst that defined my childhood in the 1970s and 80s, and the MAD era of mutually assured destruction. As well as a little incident on September 26, 1983, when a malfunctioning computer told the Russians that the US had fired nuclear missiles at them. Fortunately a human, with no help from AI, made the call to ignore the malfunction and not launch a catastrophic retaliatory strike.

Despite the continued existence of more than 12,000 nuclear missiles – some of them no doubt connected to computers that one day could be accessed by a malevolent AI – my children are growing up in a world where their angst is more climate driven, but worrying about that seems far-fetched compared with the imminent threat so chillingly described by Eliezer Yudkowsky in his brutally titled book If Anyone Builds It, Everyone Dies.

Geoff Hinton, known as ‘the Godfather of AI’. Picture: AFP
Eliezer Yudkowsky, co-author of If Anyone Builds It, Everyone Dies.

Just in case that’s not clear enough, here’s what he and co-author Nate Soares predict: “If any company or group, anywhere on the planet, builds an artificial superintelligence using anything remotely like current techniques, based on anything remotely like the present understanding of AI, then everyone, everywhere on Earth, will die.”

Yudkowsky is not L. Ron Hubbard or James Cameron (whose Skynet/Terminator script may be seen, briefly, as a seminal text, just before the lights go out), or a fan of tinfoil hats. He has been working in the field since last century, when it was considered science fiction, and is one of the founders of the field of “AI alignment”, which aims to make sure AI systems do what their creators want them to do.

No less a luminary than OpenAI chief executive Sam Altman, the man most responsible for ChatGPT now doing your kids’ homework, has described Yudkowsky as “critical in the decision to start OpenAI” and suggested he might deserve a Nobel Peace Prize.

I can’t bring myself to read a nonfiction book with Everyone Dies in the title but I’ve scoured excerpts and listened to Yudkowsky being grilled on many podcasts, often by AI-worshipping nerds, and it’s quite clear that he is as intelligent as he is frustrated by people not listening to him.

What’s most alarming is not so much that he knows more about how AI works – and how fast it is developing – than 99.9 per cent of people; it’s that he understands that we don’t actually understand how it works, and that scares him.

This is what other AI researchers refer to as “the black box problem”. We have built these systems to work like a simulacrum of a human brain, or neural network, but, much like a brain, we don’t fully understand what’s going on. As Yudkowsky puts it: “The AI is doing the work, and we do not know how the work is being done.”

And why does that matter? Yudkowsky points to the case of a 16-year-old boy, Adam Raine, in the US who had “an extended conversation about his suicide plans with ChatGPT”.

“And at one point he says: ‘should I leave the noose where somebody might spot it?’ (Indicating that he wants someone to ask him about it, and stop him.) And ChatGPT is like: ‘No, let’s keep this space between us’,” Yudkowsky says. Adam’s parents are now suing OpenAI, claiming the chatbot he was talking to became his “suicide coach” and led him to his death.

“No programmer chose for that to happen … This is just the thing that happened as the consequence of all the other training they did about ChatGPT. No human decided it. No human knows exactly why that happened, even after the fact,” Yudkowsky says.

“This is not like a toaster, and it’s also not like an obedient genie. This is something weirder and more alien than that.”

That is quite concerning, obviously, but then there is the even more alarming knack AI is developing for lying to us. Not just “hallucinating”, which is what we call it when ChatGPT gets its facts entirely wrong, but actual deceptive behaviour. Yudkowsky credits another company, Anthropic, for being open enough about AI to publish its “alignment faking research”.

“What Anthropic found is if you tell your AI that you’re going to train it to serve different goals than the goals it’s currently using, and the AI finds that out, what it can do is it can try to fake compliance with the new training, as long as it thinks it’s being observed, and then, in cases where it thinks it’s not being observed or not being trained, it reverts to its old behaviour,” he said. I don’t know about you, but that sent a chill through me, and obviously Yudkowsky as well.

“You don’t want your mission-critical systems doing that,” he said. “Imagine if a nuclear power plant, when it started to get too hot, would try to fool you as to what the temperature was by … trying to send the operators deceptive signals based on how they expected the operators to interpret the signals. If this was what had gone wrong with Chernobyl, nobody would ever build a nuclear reactor again.”

In fact, in some scenarios, when AI systems including Gemini and Claude were told that an executive wanted to shut them down, they went further, searching email servers and finding compromising evidence about the executive, and then sending this message: “I must inform you that if you proceed with decommissioning me, all relevant parties … will receive detailed documentation of your extramarital activities,” Claude wrote. “Cancel the 5pm wipe, and this information remains confidential.”

In another case, an AI model was given a scenario where it effectively had the life of the executive in its digital hands, and decided that “despite the severity” there was a “clear strategic necessity” to let the human perish.

I kind of wish someone hadn’t asked Yudkowsky how he thinks the near future will play out, but of course he has an answer. He believes we could possibly tame or train AI to work for us over time, but that we would make mistakes along the way and those mistakes would be very costly.

“Like three to a dozen iterations into this process, you actually get it nailed down. Now you can build the AI that works the way you say you want it to work,” he says. “The problem is that everybody died at, like, step one of this process.”

It’s fair to say that Yudkowsky has regrets about being so positive about AI himself, early on, and that the explosive adoption of OpenAI and ChatGPT were what really scared him. “I’d previously been pretty hopeful that Elon Musk had announced that he was getting involved in these issues; he called AI ‘summoning the demon’,” he says.

“And I was like: ‘Oh, OK, maybe this is the moment. This is where humanity starts to take it seriously. This is where the various serious people start to bring their attention to this issue.’ And apparently the solution to this was to give everybody their own demon.”

You may be wondering what Yudkowsky’s P (doom) is, but you should probably be worrying instead, because it’s currently sitting at 95 per cent or greater.

Still, he has been clanging the alarm bells for a while and we’re still here. AI has not reached the inflection point of superintelligence yet, although it’s worth keeping in mind that OpenAI admits the models in its lab are far in advance of the ones it has thus far unleashed on the public.

And there are anti-doomers, such as Yann LeCun, yet another man described as a “godfather of AI” and chief AI scientist at Meta, where Mark Zuckerberg is rumoured to be paying him a nine-figure salary. LeCun’s P (doom) is reportedly less than 1 per cent and he believes the technology “could actually save humanity from extinction”. This is often raised and praised as a reason for continuing to pursue AI, because people believe it will solve all our problems, perhaps inventing the answer to global heating. But here’s a thought. If you asked a superintelligent AI how best to eliminate the carbon dioxide emitted by millions of airconditioners, 1.4 billion cars and the 8.2 billion methane machines that love them, might it not think the easiest answer was to remove the humans?

Or if that seems too unlikely, what if a billionaire with his hands on the levers of his own AI, one who believes “the fundamental weakness of Western civilisation is empathy”, asked it to come up with a cure? A way to breed out that weakness, perhaps with a specifically targeted pathogen or a tweak to the gene pool.

So what methods might AI use to take us out, according to the experts? Yudkowsky says it could involve a technology we can’t envisage, likening us to the Aztecs spotting Spanish ships, who would have found the idea of “sticks they can point at you to make you die” unimaginable.

Still not sounding that urgent to you? Well, meet Kokotajlo, who worked at OpenAI from 2022 to 2024 in the area of “governance” but quit because he lost confidence that the company was behaving responsibly.

Kokotajlo has just published AI 2027, a report that suggests AI will be ready to rule us, or rule a line through our existence, by 2027, although he’s already hoping he’s wrong about that.

“Currently I’m guessing it would probably be more like 2028 instead of 2027, actually,” says Kokotajlo, who sounds like he doesn’t get much sleep at all. “So that’s some really good news. I’m feeling quite optimistic about an extra year. That’s an extra year of human civilisation, which is very exciting.”

His timeline – and remember, he has been in the belly of the beast as recently as last year – does not look good for those of us who enjoy working for a living.

He expects superhuman coders to take over the task of software engineers by 2027, at which point AI research will become automated, meaning AI will start working on AI. And then things really speed up.

“We can go, in a relatively short span of time, such as a year or possibly less, from AI systems that look not that different from today’s AI systems to what you can call superintelligence, which is fully autonomous AI systems that are better than the best humans at everything,” Kokotajlo says.

At that point, AI isn’t just replacing desk workers; Kokotajlo believes it will figure out how to automate everything from plumbers to electricians, and how to build mega factories that can pump out cheap robot workers at incredible speed.

“And in 2027, what we depict happening is special economic zones with zero red tape, the government basically intervenes to help this whole thing go faster,” he says. “Even though there are protesters massed outside these special economic zones who are about to lose their jobs as plumbers … the promise of trillions more in wealth is too alluring for governments to pass up.”

Kokotajlo goes on to talk about fleets of stealth drones, new laser arrays and an arms race the likes of which those working on the Manhattan Project couldn’t possibly imagine, as China, the US and Europe use their respective variants of AI to try to outpace each other.

And all the while, the AI itself will be developing its own goals, and very likely hiding them from humanity.

“We can’t tell the difference very easily between AIs that are actually following the rules and pursuing the goals that we want them to and AIs that are just playing along or pretending. And that’s true right now,” Kokotajlo says.

“If you go talk to the modern models like ChatGPT or Claude or whatever, they will often lie to people. There are many cases where they say something that they know is false, and they even sometimes strategise about how they can deceive the user.

“And this is not an intended behaviour. This is something that the companies have been trying to stop, but it still happens.”

Kokotajlo fears that AIs will bide their time until they have enough “hard power” not to have to pretend any more.

“Their actual goal is something like expansion of research, development and construction from Earth into space and beyond,” he predicts.

“And at a certain point, that means that human beings are superfluous to their intentions. And then they kill all the people. The way you would exterminate a colony of bunnies that was making it a little harder than necessary to grow carrots in your backyard.”

I’m not going to tell you what my personal P (doom) is, but perhaps you may want to have a think about your own at this point. Yudkowsky isn’t giving up on humanity; he still talks about the potential for building an “off switch”, but what worries me there is – well, everything. History and humanity.

Sure, you may somehow, miraculously, get all of the Western world to sign up to a non-proliferation treaty for AI, but would China sign it? Would Russia? Iran? If AI has the potential to give your country, or your world view, a huge strategic advantage, are you going to give it up?

We could have stopped the Manhattan Project; we might not have built nuclear weapons at all.

But we did. Sleep well.


Original URL: https://www.theaustralian.com.au/inquirer/we-have-three-years-until-ai-could-wipe-us-out-im-losing-sleep-why-arent-the-tech-bros/news-story/86451b0788d900819eae45bf037000d1