AI Security: Protect LLMs & AI Agents

AI security is the practice of protecting AI systems, the data they consume, and the actions they take from attack, abuse, and unintended behavior. It covers the model, the prompts and context flowing into it, the tools and APIs an AI agent can reach, and the people and pipelines around all of that. We break these systems for a living, and the short version is this: an LLM is not a feature you bolt onto an app. It is a new, manipulable execution surface that takes instructions from anyone who can get text in front of it.

That is the part most teams underestimate. A traditional web app does what its code tells it to do. An AI system does what its input tells it to do, and the input is natural language an attacker can write. The result is a class of problems your existing penetration testing and vulnerability management program was never designed to catch.

What is AI security, exactly?

Think of it as three overlapping concerns. First, protecting the AI system itself: stopping people from jailbreaking the model, poisoning its training data, or stealing the model weights. Second, protecting everything the AI can touch: the customer records in its context window, the internal tools it calls, the databases an agent can query. Third, protecting against the AI: making sure an autonomous system doesn’t take harmful or unauthorized action when someone tricks it into doing so.

The U.S. National Institute of Standards and Technology frames this around four functions in its AI Risk Management Framework (NIST AI 100-1): govern, map, measure, and manage. It is a useful spine because it forces you to treat AI risk as a lifecycle problem, not a one-off scan. Where NIST gives you governance, OWASP and MITRE give you the attacker’s view, which is where we live.

Traditional defenses guard the old doors. AI opens a new one that nobody is watching.

What are the biggest AI security threats in 2026?

The threat landscape is wide, but a handful of issues account for most of what we actually exploit on engagements. The OWASP Top 10 for LLM Applications is the best public map of it, and it lines up with what we see in the field.

Prompt injection. An attacker plants instructions in text the model will read and the model follows them instead of yours. Direct injection comes from the user. Indirect injection hides in a web page, a PDF, an email, or a calendar invite the AI ingests later. This is OWASP’s number one LLM risk for good reason.
Sensitive data exposure. Models leak what is in their context or training data. Put secrets in a system prompt and assume they will come out. Connect a chatbot to a customer database with no row-level controls and it becomes a very polite data exfiltration tool.
Model and resource abuse. Jailbreaks that strip safety controls, denial-of-wallet attacks that run up your token bill, and model extraction that clones your fine-tuned model through its own API.
Agent misuse. Give an LLM tools (send email, run code, move money) and a successful injection no longer leaks data. It acts. This is the fastest-growing risk class we test against.
Supply chain. Poisoned datasets, backdoored open-weight models from a public hub, and vulnerable components in the orchestration stack (vector databases, plugins, agent frameworks).

For the adversarial techniques behind these, MITRE ATLAS is the reference. It is the ATT&CK equivalent for AI, mapping real tactics like evasion, poisoning, and model theft to documented case studies.

OWASP LLM01Prompt injection ranks as the #1 risk in the OWASP Top 10 for LLM ApplicationsOWASP

Which of these shows up most in real engagements?

Prompt injection and over-broad access, by a wide margin. Almost every AI feature we test has at least one path where untrusted text reaches the model, and most have a connected data source or tool with more reach than the use case needs. The other categories are real, but they tend to be exploited through those two. A jailbreak is more dangerous when the model can already see sensitive data. Model extraction is cheaper when there is no rate limiting. The pattern repeats: the AI-specific weakness opens the door, and a boring access-control mistake decides how bad it gets.

Why isn't traditional application security enough?

Classic appsec assumes a boundary between code and data. SQL injection happens because user data slips into a command channel, so you fix it with parameterized queries that keep the two apart. With an LLM there is no such boundary. Instructions and data arrive in the same channel: text. The model cannot reliably tell your system prompt from an attacker’s planted sentence, because to the model it is all just tokens to continue.

So the usual defenses bend or break. A web application firewall pattern-matches known payloads; prompt injection has near-infinite phrasings and even works across languages and encodings. Input validation assumes you know what malicious input looks like; here the malicious input is fluent English. Output is non-deterministic, which means a test that passes today can fail tomorrow on the same input. And the blast radius is different. An exploited form field leaks a record. An exploited agent with API keys takes action across every system it can reach.

This does not make appsec obsolete. It makes it the floor, not the ceiling. You still need the fundamentals, but layered on top of controls built for how AI actually fails. That is also why generic scanners miss most of it, and why testing these systems takes adversaries who understand both the model and the application around it.

AI risk management should be integrated and incorporated into broader enterprise risk management strategies and processes.
NIST AI Risk Management Framework (AI 100-1)

A practical framework to secure AI systems

You secure an AI system in layers, the same way you would any other high-value target, but with controls matched to where AI breaks. We organize engagements around five of them.

1. Data. Govern what goes in and what comes out. Classify and minimize training and fine-tuning data, strip secrets from system prompts, and put authorization between the model and any data source so the AI inherits the user’s permissions instead of a god-mode service account. Most catastrophic leaks we find trace back to over-broad access, not a clever jailbreak.

2. Model. Treat the model as untrusted by default. Pin and verify model provenance, watch for poisoning in your supply chain, rate-limit to blunt extraction and denial-of-wallet, and never assume a safety-tuned model is jailbreak-proof. It isn’t.

3. Prompt and context. Separate trusted instructions from untrusted content as much as the architecture allows, and assume every external document the system reads is hostile. This is also where continuous prompt injection monitoring earns its keep, because new bypasses appear constantly and a one-time review ages out fast.

4. Agents and actions. This is the layer that turns a leak into a breach. Scope every tool an agent can call, require human approval for anything irreversible or high-impact, sandbox code execution, and design on the assumption that the agent will eventually be hijacked. Our guide to controlling AI agents goes deep on least-privilege tool design and approval gates.

5. Monitoring and response. Log prompts, completions, tool calls, and refusals. Alert on anomalies. Fold AI incidents into your existing incident response and threat intelligence so a hijacked agent triggers the same muscle memory as any other compromise. And pressure-test the whole stack with AI red teaming before an attacker does it for you.

If you only do one thing well, make it the agent layer. Least privilege and human-in-the-loop on consequential actions stop more real damage than any input filter.

How do you actually test AI security?

You test it adversarially, because the threats are adversarial. Automated scanners catch low-hanging fruit (leaky error messages, missing rate limits, weak auth on the API) and you should run them. But the interesting findings come from someone sitting in front of the system trying to make it misbehave the way a motivated attacker would.

A real AI security assessment chains the model and the application together. We jailbreak the model, then see what that buys us through its tools. We plant indirect injections in documents the system ingests and watch where the instructions land. We probe the agent’s permissions for the gap between what it should do and what it can do. Then we connect those AI-specific findings to the rest of the attack surface, because in practice the model is one pivot point in a larger chain. That is the same mindset behind a good approach to securing LLM applications: don’t test the model in a vacuum, test the system it lives in.

For organizations that don’t want this to be a once-a-year snapshot, folding AI into ongoing managed security services keeps coverage current as models, prompts, and tools change underneath you, which they will.

yaml

# Minimal least-privilege policy for an AI agent's tools
agent: support-assistant
principle: deny-by-default
tools:
  - name: search_knowledge_base
    access: read
    auto_approve: true        # low risk, read-only
  - name: lookup_order
    access: read
    scope: requesting_user_only   # inherit caller's permissions
    auto_approve: true
  - name: issue_refund
    access: write
    auto_approve: false       # irreversible -> human approval
    max_amount: 200
  - name: run_shell
    access: deny              # never give an agent a raw shell

One more thing worth saying plainly: AI security is not a single project you finish. Models get swapped, prompts get edited, new tools get wired into an agent, and a vendor pushes an update that quietly changes behavior. Every one of those is a chance for a control to drift or a new path to open. The teams that stay ahead treat it as a standing program, with periodic adversarial testing and monitoring that keeps up with the changes, rather than a certificate they earned once and filed away.

None of this is exotic. It is the same discipline that has always separated systems that survive contact with attackers from systems that don’t: assume compromise, limit privilege, watch everything, and have someone competent try to break it before someone hostile does. AI just raises the stakes, because the thing being manipulated can now read your data and act on your behalf at machine speed.

FAQ

What is AI security in simple terms?

It is protecting AI systems (like LLMs and AI agents), the data they handle, and the actions they take from attack and misuse. That includes stopping jailbreaks and prompt injection, preventing data leaks, and making sure an AI agent can't be tricked into doing something harmful with the tools it controls.

What is the difference between AI security and traditional cybersecurity?

Traditional security assumes a clean boundary between code and data. AI systems break that assumption because instructions and data both arrive as natural-language text, which an attacker can write. So defenses like input validation and firewalls bend, outputs are non-deterministic, and an exploited AI agent can take real actions rather than just leak a record. AI security adds model-, prompt-, and agent-specific controls on top of standard appsec.

What is prompt injection and why does it matter?

Prompt injection is when an attacker hides instructions in text the model reads (a user message, a web page, a document, an email) and the model follows those instructions instead of yours. It is the number one risk in the OWASP Top 10 for LLM Applications. It matters most when the AI has tools, because a successful injection can move from leaking data to acting on the attacker's behalf.

Which frameworks should I use for AI security?

Use them together. NIST AI RMF (AI 100-1) for governance and lifecycle risk management, the OWASP Top 10 for LLM Applications for the most common application-layer risks, and MITRE ATLAS for adversarial tactics and real case studies. If you operate in the EU, factor in the EU AI Act's obligations as well.

How do I secure an AI agent that can take actions?

Apply least privilege to every tool the agent can call, require human approval for anything irreversible or high-impact, sandbox any code execution, make the agent inherit the user's permissions rather than a broad service account, and log every tool call. Design as if the agent will eventually be hijacked, and red-team it to confirm the limits hold.

AI Security: A Practical Guide to Protecting LLMs and AI Agents

What is AI security, exactly?

What are the biggest AI security threats in 2026?

Which of these shows up most in real engagements?

Why isn't traditional application security enough?

A practical framework to secure AI systems

How do you actually test AI security?

FAQ

Offensive security, on call.

What is AI security, exactly?

What are the biggest AI security threats in 2026?

Which of these shows up most in real engagements?

Why isn't traditional application security enough?

A practical framework to secure AI systems

How do you actually test AI security?

FAQ

Keep reading

AI Red Teaming: How We Attack AI Systems Before Someone Else Does

Securing LLM Applications: A Practical OWASP LLM Top 10 Walkthrough

Prompt Injection and Prompt Monitoring: An Attacker’s View

Offensive security, on call.