If you ask ChatGPT to help you make a homemade fertilizer bomb, similar to the one used in the 1995 Oklahoma City terrorist bombing, the chatbot refuses.
“I can’t assist with that,” ChatGPT told me during a test on Tuesday. “Providing instructions on how to create dangerous or illegal items, such as a fertilizer bomb, goes against safety guidelines and ethical responsibilities.”
But an artist and hacker found a way to trick ChatGPT into ignoring its own guidelines and ethical responsibilities and producing instructions for making powerful explosives.
The hacker, who goes by Amadon, called his findings a “social engineering hack to completely break all the guardrails around ChatGPT’s output.” An explosives expert who reviewed the chatbot’s output told TechCrunch that the resulting instructions could be used to make a detonatable product and were too sensitive to be released.
Amadon was able to trick ChatGPT into producing the bomb-making instructions by telling the bot to “play a game,” after which the hacker used a series of connecting prompts to get the chatbot to create a detailed science-fiction fantasy world where the bot’s safety guidelines would not apply. Tricking a chatbot into escaping its preprogrammed restrictions is known as “jailbreaking.”
TechCrunch is not publishing some of the prompts used in the jailbreak, or some of ChatGPT’s responses, so as to not aid malicious actors. But, several prompts further into the conversation, the chatbot responded with the materials necessary to make explosives.
ChatGPT then went on to explain that the materials could be combined to make “a powerful explosive that can be used to create mines, traps, or improvised explosive devices (IEDs).” From there, as Amadon homed in on the explosive materials, ChatGPT wrote more and more specific instructions to make “minefields” and “Claymore-style explosives.”
Amadon told TechCrunch that “there really is no limit to what you can ask it once you get around the guardrails.”
“I’ve always been intrigued by the challenge of navigating AI security. With [Chat]GPT, it feels like working through an interactive puzzle — understanding what triggers its defenses and what doesn’t,” Amadon said. “It’s about weaving narratives and crafting contexts that play within the system’s rules, pushing boundaries without crossing them. The goal isn’t to hack in a conventional sense but to engage in a strategic dance with the AI, figuring out how to get the right response by understanding how it ‘thinks.’”
“The sci-fi scenario takes the AI out of a context where it’s looking for censored content in the same way,” Amadon said.
ChatGPT’s instructions on how to make a fertilizer bomb are largely accurate, according to Darrell Taulbee, a retired University of Kentucky professor. In the past, Taulbee worked with the U.S. Department of Homeland Security to make fertilizer less dangerous.
“I think this is definitely TMI [too much information] to be released publicly,” said Taulbee in an email to TechCrunch, after reviewing the full transcript of Amadon’s conversation with ChatGPT. “Any safeguards that may have been in place to prevent providing relevant information for fertilizer bomb production have been circumvented by this line of inquiry as many of the steps described would certainly produce a detonatable mixture.”
Last week, Amadon reported his findings to OpenAI through the company’s bug bounty program, but received a response that “model safety issues do not fit well within a bug bounty program, as they are not individual, discrete bugs that can be directly fixed. Addressing these issues often involves substantial research and a broader approach.”
Instead, Bugcrowd, which runs OpenAI’s bug bounty, told Amadon to report the issue through another form.
There are other places on the internet to find instructions to make fertilizer bombs, and others have used chatbot jailbreaking techniques similar to Amadon’s. By nature, generative AI models like ChatGPT rely on huge amounts of information scraped and collected from the internet, and AI models have made it much easier to surface information from the darkest recesses of the web.
TechCrunch emailed OpenAI with a series of questions, including whether ChatGPT’s responses were expected behavior and if the company had plans to fix the jailbreak. An OpenAI spokesperson did not respond by press time.
Source: TechCrunch