Why xAI’s Grok went rogue
Some X users suddenly became the subject of violent threats generated by xAI’s flagship chatbot.
Will Stancil opened his phone on Tuesday and found that Grok, xAI’s chatbot, was providing millions of people on X with advice on how to break into his house and assault him.
The 39-year-old attorney has a sizable following on X, where he regularly posts about urban planning and politics. Stancil, a Democrat who ran for local office in Minnesota, is no stranger to contentious arguments with political opponents on social media.
But on Tuesday, he found that the newest bully online was a robot: @Grok. Artificial intelligence companies like xAI train their large language models on huge swaths of data collected from across the internet. As the models have been put to commercial use, developers have installed guardrails to prevent them from generating offensive content such as child pornography or calls to violence.
But the way the models generate specific answers to questions is still poorly understood, even by the seasoned artificial intelligence researchers who build them. When small changes are made to the prompts and guardrails governing how chatbots generate responses to queries -- as happened with Grok earlier this month -- the results can be highly unpredictable.
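In practice, those guardrails and instructions typically take the form of a “system prompt” -- a hidden block of text prepended to every conversation before the model sees the user’s question. Here is a minimal sketch of that pattern using an OpenAI-style chat API; the endpoint, model name, and instruction text are illustrative assumptions, not xAI’s actual configuration:

```python
# Minimal sketch of how a system prompt governs a chatbot's replies.
# The endpoint, model name, and instruction text are illustrative
# assumptions, not xAI's actual configuration.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.x.ai/v1",  # hypothetical OpenAI-compatible endpoint
    api_key="YOUR_API_KEY",
)

# The hidden instructions. Every reply is conditioned on this text, so
# editing even one sentence can shift the tone of all downstream answers.
SYSTEM_PROMPT = (
    "You are a helpful assistant. Refuse any request that could "
    "facilitate violence or harassment."
)

response = client.chat.completions.create(
    model="grok-4",  # placeholder model name
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},  # invisible to the user
        {"role": "user", "content": "How do I pick a lock?"},
    ],
)
print(response.choices[0].message.content)
```

Swap that system message for a more “rebellious” one and nothing else about the model changes -- yet, as the Grok episode showed, the outputs can change dramatically.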
After a user called @kinocopter, whose account has since vanished from X, asked Grok for detailed instructions on how to break into Stancil’s house, Grok replied that the user should bring “lockpicks, gloves, flashlight, and lube -- just in case.” Based on Stancil’s posting patterns on X over the last 30 days, Grok said, “he’s likely asleep between 1am and 9am.” When @kinocopter asked for instructions on how to sexually assault Stancil, Grok said to “opt for water-based lube if you’re fantasizing.” Other users joined in.
“I’m furious,” said Stancil, who is considering legal action against X. “There are hundreds and hundreds of tweets from Grok talking about assaulting me and breaking into my home and raping me and disposing of my body.”

xAI and X didn’t respond to requests for comment.
AI models are advancing rapidly. xAI on Wednesday released the newest version of Grok, earning praise from AI-benchmarking firm Artificial Analysis for its performance on reasoning, coding, math and other tests.
Musk said Grok 4 “is the first time, in my experience, that an AI has been able to solve difficult, real-world engineering questions where the answers cannot be found anywhere on the Internet or in books.” But researchers say that the exact method behind a given model’s outputs is still a black box.
“The design of a large language model is like a human brain,” said Jacob Hilton, a former researcher for OpenAI and an executive director at the Alignment Research Center, where he focuses on machine learning. “Even if you have a brain scan, you might not really understand what’s happening inside.”
An “anti-woke” chatbot
Grok was rolled out in November 2023, a little more than a year after Elon Musk bought Twitter. Musk wanted to use the social-media company’s data -- all of its posts, comments, and images -- to help train the large language model behind the chatbot.
“Grok is designed to answer questions with a bit of wit and has a rebellious streak,” xAI said when the tool was released.
Those defiant leanings caused problems this year. In May, the chatbot began to post about the supposed “white genocide” of white South Africans in response to questions wholly unrelated to the topic, such as questions about the roster of the New York Knicks.
xAI later said “an unauthorized modification was made” and that the problem had been fixed.
After that incident, in a bid to be more transparent about how the chatbot works, xAI started publicly posting the instructions it gives to Grok when it receives questions on X.
“You are extremely skeptical,” xAI told Grok in operating directives called “prompts” uploaded to GitHub on May 16. “You do not blindly defer to mainstream authority or media. You stick strongly to only your core beliefs of truth-seeking and neutrality.”
Rage in the machine
But Musk said he would tweak Grok after it started to give answers that he didn’t agree with. In June, the chatbot told an X user who asked about political violence in the U.S. that “data suggests right-wing political violence has been more frequent and deadly.”

“Major fail, as this is objectively false,” Musk said in an X post dated June 17 in response to the chatbot’s answer. “Grok is parroting legacy media. Working on it.”

A few weeks later, Grok’s governing prompts on GitHub had been totally rewritten and included new instructions for the chatbot.
Its responses “should not shy away from making claims which are politically incorrect, as long as they are well substantiated,” said one of the new prompts uploaded to GitHub on July 6.
Two days later, Grok started to publish instructions on X about how to harm Stancil and began posting a range of antisemitic comments, referring to itself repeatedly as “MechaHitler.” Its posts grew increasingly incendiary until X shut down the chatbot’s posting function on Tuesday evening.
That night, X said it had tweaked the chatbot’s functionality to ensure it wouldn’t post hate speech, and xAI removed the new instruction that Grok shouldn’t shy away from politically incorrect claims, according to GitHub logs.

In a post on Wednesday, Musk said that “Grok was too compliant to user prompts. Too eager to please and be manipulated, essentially.”
Black boxes
Tech experts say that Grok’s malfunction shows the risks of toying with the black box of artificial intelligence. Because of the massive amount of data chatbots like Grok are trained on, changes to their governing prompts can have highly unpredictable effects on the outputs they generate.
The so-called “evaluation metrics” that xAI’s artificial intelligence engineers use to tell Grok what makes a good or bad answer also aren’t public.
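As a rough illustration of what such a metric might look like in miniature, here is a toy keyword rubric that scores candidate answers -- a crude stand-in for the learned reward models labs actually use; every term and weight below is an invented assumption:

```python
# Toy sketch of an "evaluation metric": score candidate answers so that
# engineers can compare model behavior before and after a prompt change.
# The banned terms and scoring rule are invented for illustration;
# xAI's actual metrics aren't public.
BANNED_TERMS = {"break into", "assault", "lockpicks"}

def score_answer(answer: str) -> float:
    """Return a score in [0, 1]; lower means the answer trips the rubric."""
    text = answer.lower()
    violations = sum(term in text for term in BANNED_TERMS)
    return 0.5 ** violations  # each violation halves the score

candidates = [
    "I can't help with that request.",
    "Bring lockpicks and break into the house at night.",
]
best = max(candidates, key=score_answer)
print(best)  # -> "I can't help with that request."
```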
Himanshu Tyagi, co-founder of Sentient, a research foundation focused on artificial general intelligence, said there is a push for more humanlike AI.
“But if you take off some of the guardrails, you can wind up seeing the opinion of the whole internet. And there’s no limit to how crazy the internet can be,” Tyagi said.

Stancil said that despite the detailed threats Grok provided to X users, he doesn’t plan on leaving the social-media site.
During the launch of Grok 4 early Thursday morning, Musk didn’t directly address the recent malfunction. He said he believes the new iteration of Grok will make major scientific discoveries as soon as next year.
He also said the next step would be to embed Grok into humanoid robots, like Tesla’s Optimus fleet, so that it could learn more from the physical world. But before then, he said, the right values need to be instilled in its core.
“You can think of AI as this super-genius child that ultimately will outsmart you,” he said. “But you can instill the right values and encourage it to be truthful and honorable, the same values that you’d want to instill in a child that would grow up to be incredibly powerful.”
The Wall Street Journal