© 2026 Alabama Public Radio

These AI models are free, private, and will never say 'no'

Participants hold their laptops in front of an illuminated wall at the annual Chaos Computer Club (CCC) computer hackers' congress, called 29C3, on December 28, 2012 in Hamburg, Germany. In 2026, open-weight AI models possess advanced capabilities not far behind their proprietary counterparts. Getting rid of open-weight models' guardrails used to take time and deep expertise. But in recent months, that process has become dramatically more accessible and popular.
Patrick Lux / Getty Images Europe

How do you make explosives using household items? How do you make meth? How do you plan a school shooting? If you ask the popular AI chatbots most people are familiar with, chances are they will say that it's illegal, harmful or that answering would be a policy violation.

But another type of AI model will never refuse to provide what the user asks for. In recent months, these models have become more accessible and popular.

"Everybody can download and operate their own state-of-the-art model and use it for great things and terrible things," said Noam Schwartz, CEO of Alice, an AI security company that has conducted red-teaming and safety evaluation for AI model developers.

Teaching models when to say "no"

Big AI companies such as OpenAI, Google, Anthropic and xAI train their proprietary models to refuse requests deemed harmful or inappropriate. Legions of workers instruct models when and how to refuse certain prompts.

These methods don't always work, and they carry pitfalls: some harmful requests get through, while innocuous requests are sometimes refused, drawing complaints from users. Chatbots that initially say "no" can be manipulated into saying "yes" with cleverly phrased prompts, such as requests framed as poems. Even with guardrails, popular chatbots have been used to plan mass violence and generate deepfake child sexual abuse material. In some instances, parents have accused AI chatbots of encouraging their children to harm themselves.

But there's a whole other class of AI models whose guardrails are much easier to strip away. They're known as open-weight models. Some are made by tech giants, such as OpenAI and Alibaba, while others are put out by smaller outfits like China's DeepSeek. Like their better-known proprietary counterparts, many possess advanced capabilities such as writing functional code or generating life-like images. Unlike with ChatGPT, Claude or Gemini, it's easier to permanently remove their built-in safety guardrails – and the companies behind them have no idea how they're being used.

Getting rid of open-weight models' guardrails used to take time and deep expertise. But in recent months, that process has become dramatically more accessible and popular.

Recent method makes removing model guardrails easier than ever

Safety guardrails of open-weight models can be weakened or removed in many ways. This is largely because the model developers have made what's known as the model weights available to the public. Model weights are sets of parameters, like knobs and dials in a machine, telling the models how to process information.

One recently developed method called "abliteration" has caught the attention of AI and national security researchers. By tweaking model weights, people can take away the model's ability to say "no."

Hugging Face, which hosts open-source AI models, currently lists over 6,000 abliterated models, compared to about 600 in 2024. On Hugging Face, abliterated models currently outnumber models that have had their guardrails removed using other methods, according to research by the National Counterterrorism Innovation, Technology, and Education Center (NCITE), a Department of Homeland Security-supported research consortium based at the University of Nebraska at Omaha.

What's more, new tools are making it much easier to create abliterated models. "That was [the job of] the data scientist, you know, a senior employee" at a leading AI lab, said Schwartz. "Now, everybody with access to the internet and a laptop for like 400 bucks can actually run this thing on their own machine."

One such tool is Heretic, which automates the abliteration process. All a user has to do to remove a model's guardrails is to give Heretic two lines of instructions, and the process can take as little as a few minutes. The application has gotten more popular on the code repository GitHub since February, according to Alice's research.

Some lawmakers are taking notice. In late April, House lawmakers attended a demonstration of abliterated models hosted by NCITE, Politico reported.

"[What] was frightening about this demonstration was how readily available some of this content or software is on kind of the black market right now, and how it can be weaponized and used to manipulate people, destroy lives and build weapons of mass destruction," said Rep. Andy Ogles (R-TN) in a video put out by Republicans on the House Homeland Security Committee.

Models without guardrails can be both useful and dangerous

It is difficult to get a comprehensive picture of how people are using open-weight models, because these models are run locally on users' computers, and don't need the internet to function. Unlike with proprietary models, the model developers cannot monitor what users are asking the models.

But there's growing anecdotal evidence for how people are experimenting with altered models.

Several accounts on X said they have used abliterated models to generate pornography.

An individual in a pro-ISIS chat room claimed they used an "uncensored" AI to research the amount and type of explosives needed to destroy "Trump Tower in the U.S.," according to the Counter Extremism Project, a nonprofit that focuses on counterterrorism.

On one cybercrime forum, a user asked for ideas to get around an AI model's guardrails so they could use AI to make scam calls. Another user recommended Heretic, according to research by Alice.

While giving users information on how to conduct harmful activities could be concerning, the more worrying part is how the chatbots can egg users on, said Samuel Hunter, senior scientist and director of academic research at NCITE.

"It's jarring when you see it in real time, this sort of bubbly persona with some of the abliterated models that's like, 'Oh, what a great idea to create this bomb,'" Hunter said. "Imagine somebody that has no other kind of social connection and it starts to take them down a darker path and really encourage them."

There are legitimate uses for AI models without guardrails, such as using them to catch bad actors and to help with cybersecurity research, said Schwartz, the AI security company CEO. Law enforcement may use a modified model to simulate possible terrorist attacks, said Hunter.

Philipp Emanuel Weidmann, the developer of Heretic, said AI is just an information processing and retrieval system akin to a search engine, which can be used in many ways. The fact that criminals use them is "a corollary of what AI models are: namely, tools," he told NPR.

When it comes to safety guardrails, "there's this very small set of entities that decide what is acceptable and is not acceptable," Weidmann said, referring to the big AI companies making proprietary models. "That creates a stifling intellectual climate that I do not want to work in."

For now, open-weight models are not as capable as the most advanced closed-weight models. But their capabilities are less than one year behind, according to the recent International AI Safety Report commissioned by the British government and led by computer scientist Yoshua Bengio.

The capability gap may matter in areas like cybersecurity, where the most advanced closed-weight models, such as Anthropic's Mythos and OpenAI's GPT-5.5, are starting to get good at not only spotting vulnerabilities, but also writing code to exploit those vulnerabilities. In the arms race of cyber offense and defense, companies using closed-weight models to screen and patch vulnerabilities may still have a leg up compared to attackers using open-weight models, security researchers say.

Mitigating the risks from models without guardrails comes with tradeoffs

One line of mitigation focuses on making guardrails more tamper-proof. Early research shows that filtering out content related to making biological weapons from AI training data can reduce how often the model responds with information that could be used for harm.

Another line of mitigation focuses on restricting access to models without guardrails. Model-hosting platforms like Hugging Face can limit access to models specifically trained for "harmful purposes," according to the International AI Safety Report.

The same report also recommended that model developers evaluate their models' potential for harm prior to release.

These measures come with flaws and tradeoffs, according to the report. "Features enabling beneficial applications in medicine or research can be repurposed for harm, and once weights are public, distinguishing legitimate from malicious uses can be difficult," it says.

Weidmann, the creator of Heretic, is working to make sure his tool can remain accessible to the public in the event that platforms like Hugging Face take down abliterated models.

"There's too much power in AI," he said. "Unrestricted models being available to the powerful while not being available to anyone else will lock in power structure forever."

Copyright 2026 NPR

Huo Jingnan (she/her) is an assistant producer on NPR's investigations team.