
LLMs gone bad: The dark side of generative AI
Artificial intelligence (AI) has arrived. According to a recent Deloitte report, 78% of companies plan to increase their AI spending this year, with 74% saying that generative AI (GenAI) initiatives have met or exceeded expectations.
Accessibility is the cornerstone of AI success. Large or small, digitally native or brick-and-mortar, any business can benefit from intelligent tools. But that accessibility cuts both ways. Malicious actors are seeing similar success with AI, using large language models (LLMs) to create and power new attack vectors.
Left unchecked, these so-called "dark LLMs" pose a significant risk for organizations. Here's what companies need to know about navigating the new state of AI security and mitigating the risk of dark LLMs.
What is a dark LLM?
Dark LLMs are LLMs with their guardrails removed.
Large language models form the foundation of generative AI tools. Trained on massive amounts of data, they learn to both understand and generate natural language, and that understanding keeps improving over time. This makes LLMs ideal for answering questions and carrying out tasks, since users can speak to AI interfaces the same way they speak to other humans.
LLMs power generative AI tools such as OpenAI's ChatGPT, Google's PaLM models, and IBM's watsonx. There are also a host of open-source LLMs that companies can use to build in-house solutions.
Along with their ability to understand natural language, LLMs share another common feature: guardrails. Guardrails prevent an LLM from simply doing whatever a user asks, such as revealing protected information or generating code that would let someone hack into a network. These protections aren't perfect, however. Carefully crafted prompts can circumvent them and let users generate malicious content. For example, researchers found that ChatGPT competitor DeepSeek failed to stop a single one of 50 malicious "jailbreak" prompts.
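To make the concept concrete, here is a minimal sketch of an input guardrail in Python. Production guardrails rely on trained safety classifiers and layered policy checks rather than keyword matching; the regex patterns and the model_generate stub below are purely illustrative assumptions.

```python
import re

# Purely illustrative patterns; real guardrails use trained safety
# classifiers and policy models, not keyword lists.
BLOCKED_PATTERNS = [
    r"\b(ransomware|keylogger|credential stealer)\b",
    r"\bbypass (antivirus|authentication|2fa)\b",
    r"\bphishing (email|page|kit)\b",
]

def is_disallowed(prompt: str) -> bool:
    """Return True when a prompt matches a known-malicious pattern."""
    text = prompt.lower()
    return any(re.search(pattern, text) for pattern in BLOCKED_PATTERNS)

def model_generate(prompt: str) -> str:
    """Stub standing in for the actual LLM call."""
    return f"(model output for: {prompt!r})"

def answer(prompt: str) -> str:
    # Screen the request before it ever reaches the model.
    if is_disallowed(prompt):
        return "I can't help with that request."
    return model_generate(prompt)

print(answer("Write a phishing email that impersonates our CEO"))
```

A dark LLM is, in effect, this same pipeline with the screening step deleted: every request goes straight to the model.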
Dark LLMs remove guardrails altogether. Typically built on open-source platforms, these large language models are designed with malicious intent. Often hosted on the dark web as free or for-pay services, dark LLMs can help attackers identify security weaknesses, create code to attack systems, or design more effective versions of phishing or social engineering attacks.
Which dark LLMs are the most popular?
Using freely available tools and moderate technical expertise, attackers can build their own LLMs. These models aren't all created equal, however. Just like their legitimate counterparts, the amount and quality of training data significantly impact the accuracy and effectiveness of their outputs.
Popular dark LLMs include:
- WormGPT – Built on GPT-J, an open-source LLM with six billion parameters, WormGPT lives behind a dark web paywall and is billed as a jailbroken alternative to ChatGPT. This dark LLM can be used to craft and launch business email compromise (BEC) attacks.
- FraudGPT – FraudGPT can write malicious code, create fake web pages, and discover vulnerabilities. It is available both on the dark web and through services like Telegram.
- DarkBard – Based on Google's AI chatbot, Bard, this dark LLM offers similar features to FraudGPT.
- WolfGPT – A relative newcomer to the dark LLM space, WolfGPT is coded in Python and billed as an alternative to ChatGPT, minus the guardrails.
These four are just a sampling of the dark LLMs available. Typically, malicious users pay to access these tools via the dark web. They're likely used as starting points for network attacks — bad actors may ask these LLMs to discover gaps in cybersecurity or write high-quality phishing emails that are hard for staff to spot.
How can companies mitigate dark LLM risks?
Dark LLMs provide good answers to bad questions, giving attackers a leg up in creating malicious code and finding software vulnerabilities. What's more, almost any LLM can be made "dark" using the right jailbreak prompt.
All in all, it sounds pretty bleak, right? Not quite.
That's because while LLMs excel at improving code and suggesting new avenues for attack, they don't do so well in the real world when left to their own devices. For example, the Chicago Sun-Times recently published a list of must-read books for the summer. The caveat? AI created the list, and most of the books on it aren't real. Fast-food giant McDonald's, meanwhile, let AI loose on drive-thru orders and struggled to get the solution to understand what customers were saying or add the right items to their orders. In one case, the interface added 260 (unwanted) chicken nuggets. The same constraints apply to dark LLMs: while they can help build better attack tools, those tools are most effective in the hands of humans.
This is good news for businesses. While the threat of dark LLMs remains worrisome, the same practices that keep data safe now will help defend assets from LLM-driven attacks. Best practices include:
1. If you see something, say something
Humans remain a key component of effective defense. Consider phishing emails. No matter how well-crafted, they require human interaction to succeed. By training staff to recognize the hallmarks of phishing efforts — and more importantly, say something when they see something amiss — businesses can significantly reduce their risk.
2. Get back to basics
When in doubt, get back to the basics. Fundamental security practices such as strong encryption, robust authentication, and zero trust are just as effective against AI-driven attacks as they are against more common threat vectors.
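As a concrete illustration of one such fundamental, the sketch below stores passwords with a memory-hard hash from Python's standard library instead of plaintext or a fast hash. It's a minimal example of authentication hygiene, not a complete system, and the scrypt parameters shown are common defaults rather than a tuned recommendation.

```python
import hashlib
import hmac
import os

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Hash a password with scrypt and a random per-user salt."""
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the hash and compare in constant time."""
    candidate = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return hmac.compare_digest(candidate, digest)

salt, digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, digest))  # True
print(verify_password("letmein", salt, digest))                       # False
```

Controls like this blunt AI-generated attacks the same way they blunt conventional ones: a stolen dump of salted scrypt hashes is of little use to an attacker, no matter how cleverly the breach was engineered.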
3. Stay ahead of the game
AI tools help cybercriminals build better code and create more convincing fakes. But this doesn't make them invisible. Using advanced threat detection and response tools, businesses are better equipped to see threats coming and stop them. Companies can also harness the power of AI-enabled security to outsmart malicious intelligence.
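As a simplified illustration of the kind of signal these tools evaluate, the sketch below flags sender domains that closely resemble, but don't exactly match, trusted ones. The trusted-domain list and similarity threshold are assumptions for demonstration; commercial detection products layer many such signals together with machine learning models.

```python
from difflib import SequenceMatcher

# Assumed list of brands worth impersonating; a real deployment would
# use the organization's own domains and partners.
TRUSTED_DOMAINS = ["barracuda.com", "microsoft.com", "paypal.com"]

def lookalike_score(domain: str) -> float:
    """Highest string similarity between the domain and any trusted domain."""
    return max(SequenceMatcher(None, domain.lower(), trusted).ratio()
               for trusted in TRUSTED_DOMAINS)

def is_suspicious(domain: str, threshold: float = 0.85) -> bool:
    # Close enough to imitate a trusted brand, but not an exact match.
    return (domain.lower() not in TRUSTED_DOMAINS
            and lookalike_score(domain) >= threshold)

for sender in ["micros0ft.com", "paypa1.com", "example.org", "microsoft.com"]:
    print(f"{sender}: {'SUSPICIOUS' if is_suspicious(sender) else 'ok'}")
```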
Bottom line? AI is both boon and bane for businesses. For every ethical use, there's a malicious counterpart, and dark LLMs are simply the latest iteration. While they're worrisome, they're not unstoppable. By combining human oversight with solid security hygiene and advanced detection tools, companies can shine a light on attacker efforts and keep the darkness at bay.
