Secure AI Agents: OpenAI Defences Against Prompt Injection

OpenAI

In a modern office setting, three professionals engage in a focused discussion around a conference table, surrounded by laptops displaying code, emphasizing teamwork in developing secure AI agents as a defense against prompt injection.

Pas sûr de quoi faire ensuite avec l'IA?Évaluez la préparation, les risques et les priorités en moins d'une heure.

➔ Téléchargez notre kit de préparation à l'IA gratuit

Prompt injection is one of the biggest security risks in AI agents

As AI systems move beyond chat and start browsing the web, calling tools and taking actions, prompt injection becomes a much more serious problem. A malicious instruction hidden in a webpage, document or external tool can try to override system behaviour, expose sensitive information or trigger actions the model should never take.

That is why prompt injection is now a core security issue for AI agents, not just a niche concern for developers building experiments.

This guide explains how OpenAI is strengthening agent security, what prompt injection actually looks like in practice, and what teams should do to reduce risk.

What is prompt injection?

Prompt injection happens when untrusted text tries to manipulate an AI system’s instructions or behaviour.

In simple terms, the model is given one set of trusted directions by its developer, then encounters another set of hostile instructions hidden in user input, webpages, files or tool responses. If the system is not designed carefully, the malicious instruction can interfere with the intended task.

In agentic systems, that can mean:

  • exposing internal data

  • following unsafe instructions

  • using tools in the wrong way

  • taking actions without proper approval

This is why prompt injection matters far more for AI agents than for basic chatbots.

How OpenAI is defending against prompt injection

OpenAI’s security approach is based on layered safeguards rather than any single fix.

1. Automated red teaming

OpenAI uses reinforcement-learning-powered red teaming to test agentic systems against prompt injection and related attacks at scale. This helps surface vulnerabilities earlier and strengthens products before real attackers can exploit them.

2. Layered mitigations in agent surfaces

For browsing and agent experiences, OpenAI highlights protections designed to reduce the chance that hostile web content or other external text can steer model behaviour. These mitigations matter because agents increasingly work across untrusted environments.

3. Developer guidance for secure design

OpenAI also publishes practical guidance for building prompt-injection-resistant systems. That includes narrowing accepted inputs, limiting output behaviour, constraining tool use and isolating untrusted content instead of blending it into high-trust instructions.

Why prompt injection is especially dangerous in AI agents

The risk goes up sharply once an AI system can do more than answer questions.

A model that can browse, open files, query systems or trigger external actions has a much larger attack surface. If it reads hostile content and has broad permissions, the consequences can be far more serious than a bad answer in chat.

That is why secure agent design depends on more than model quality. It depends on how permissions, tools, approvals and external content are handled around the model.

Practical steps developers can take today

Platform safeguards matter, but they are not enough on their own. If you are building with AI agents, defence in depth is essential.

Treat all external content as untrusted

Anything pulled from the web, uploaded by users or returned by tools should be treated as untrusted text. Do not allow it to flow directly into trusted instructions or privileged tool decisions.

Scope tools and permissions tightly

Use least privilege wherever possible. An agent should only have access to the minimum tools, data and actions required for the task. Avoid broad permissions that increase the blast radius of a successful injection attempt.

Require approval for sensitive actions

High-impact actions should not happen silently. Add approval gates for tasks such as sending data, triggering transactions, changing records or visiting unfamiliar destinations.

Constrain inputs and outputs

Limit free-text inputs where possible. Validate structured fields, use allow-lists where appropriate and cap output behaviour to reduce the opportunity for injection chains to expand.

Log tool calls and monitor anomalies

Capture prompts, outputs and tool activity so you can investigate suspicious patterns. Monitoring matters because prompt injection often shows up through unusual sequences of actions rather than one obvious event.

Red team your own workflows

Do not rely only on vendor testing. Run adversarial tests against your own prompts, tools and agent workflows, especially where sensitive data or external actions are involved.

What secure AI agent design looks like in practice

A secure AI agent is not one that blindly follows every instruction it reads. It is one that:

  • separates trusted instructions from untrusted content

  • limits what tools can do

  • requires human review at critical points

  • logs actions for auditability

  • assumes external text may be hostile

That is the mindset teams need if they want to move from AI experimentation to safe operational use.

What this means for enterprise teams

If your organisation is exploring AI agents, prompt injection should be part of your governance conversation from the start.

The key questions are:

  • What can the agent access?

  • What actions can it take?

  • What content does it read from outside trusted systems?

  • Where are human approvals required?

  • How are logs and audits handled?

These are not edge-case questions. They sit at the centre of safe agent deployment.

Bottom line

Prompt injection is one of the defining security problems of modern AI agents. OpenAI is addressing it through automated red teaming, layered product mitigations and guidance for safer agent design. But secure deployment still depends on how teams design permissions, isolate untrusted content and control tool use in practice.

If you are building or deploying AI agents, prompt injection is not something to patch later. It needs to be built into your architecture from day one.

FAQ

What is prompt injection in AI?

Prompt injection is an attack where untrusted text tries to override an AI system’s instructions, expose data or trigger unintended actions.

Why is prompt injection more serious for AI agents?

Because agents can browse, use tools and take actions. That gives malicious instructions more opportunities to cause harm.

How does OpenAI reduce prompt injection risk?

OpenAI uses automated red teaming, layered safeguards in agent experiences and developer guidance for secure design.

What should developers do to protect AI agents?

Treat external content as untrusted, limit permissions, require approval for sensitive actions, constrain inputs and outputs, and monitor tool behaviour.

Can prompt injection be fully eliminated?

No. The practical goal is to reduce risk through layered safeguards and tighter system design, not to assume the problem disappears completely.

Recevez chaque semaine des nouvelles et des conseils sur l'IA directement dans votre boîte de réception

En vous abonnant, vous consentez à ce que Génération Numérique stocke et traite vos informations conformément à notre politique de confidentialité. Vous pouvez lire la politique complète sur gend.co/privacy.

Génération
Numérique

Bureau du Royaume-Uni

Génération Numérique Ltée
33 rue Queen,
Londres
EC4R 1AP
Royaume-Uni

Bureau au Canada

Génération Numérique Amériques Inc
181 rue Bay, Suite 1800
Toronto, ON, M5J 2T9
Canada

Bureau aux États-Unis

Generation Digital Americas Inc
77 Sands St,
Brooklyn, NY 11201,
États-Unis

Bureau de l'UE

Génération de logiciels numériques
Bâtiment Elgee
Dundalk
A91 X2R3
Irlande

Bureau du Moyen-Orient

6994 Alsharq 3890,
An Narjis,
Riyad 13343,
Arabie Saoudite

UK Fast Growth Index UBS Logo
Financial Times FT 1000 Logo
Febe Growth 100 Logo (Background Removed)

Numéro d'entreprise : 256 9431 77 | Droits d'auteur 2026 | Conditions générales | Politique de confidentialité