Responses API Agent Runtime: Secure Hosted Containers
OpenAI
11 Mar 2026

OpenAI’s Responses API agent runtime is a secure way to run tool-using agents inside a managed computer environment. It combines the Responses API (orchestration), a shell tool (command execution), and hosted containers (isolation and state), so agents can handle files, run code, and scale across multi-step workflows.
Agents only become genuinely useful when they can do things: process files, run scripts, validate outputs, and keep working across multi-step tasks. The catch is obvious — the moment you give an agent execution capability, you introduce risk.
OpenAI’s latest work on the Responses API tackles that trade-off by packaging an agent runtime that’s designed to be both secure and scalable. In practical terms, it means your agent can operate in a controlled “computer environment”, interact with files, run commands via a shell, and maintain state — without you having to stitch together a bespoke execution harness from scratch.
In this article, we’ll break down what’s new, how it works, and what to do (and not do) when you’re putting this into production.
What OpenAI means by an “agent runtime”
A runtime is the environment that sits around the model: it orchestrates tool calls, manages state, and executes actions safely. OpenAI’s approach pairs:
The Responses API, which orchestrates the agent loop and tool calling.
A shell tool, which lets the model propose terminal commands.
Hosted containers, which execute those commands in an isolated environment and hold working files/state.
Put together, this creates a predictable execution surface for long-running work: install dependencies, transform data, generate artefacts, and iterate — all while staying inside a controlled boundary.
What’s new in the upgraded Responses API
1) Hosted shell execution in containers
Instead of treating “running code” as something you must host yourself, the Responses API can now orchestrate hosted container sessions for shell commands. The model proposes commands; the platform executes them; the output is streamed back into the agent loop.
Why it matters:
Isolation: commands run inside a contained environment rather than your production infrastructure.
Repeatability: a standardised environment reduces “works on my machine” issues.
Scale: multiple sessions can run in parallel to speed up multi-step workflows.
2) Better file handling and artefact workflows
Agentic work is often file work: CSVs, PDFs, images, logs, codebases, configuration. With a containerised runtime, the agent can read and write files as part of a workflow, then return the relevant outputs back to your application.
If you’ve ever tried to build an agent that edits a repo, generates a report, or refactors code across multiple files, you’ll recognise why this is a step change.
3) State for long-running tasks (and compaction to keep them moving)
Long runs create a new bottleneck: context windows fill up. OpenAI’s guidance here is to combine a computer environment (where files and intermediate artefacts can live) with compaction patterns that keep the important context while shedding noise.
Net result: agents can tackle more realistic “knowledge work” tasks without collapsing under their own history.
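The compaction idea above can be sketched in a few lines. This is an illustrative pattern, not an official API: the function names and message shape are our own, and a real implementation would summarise old turns with a model call rather than truncating them.

```python
# Compaction sketch: keep the newest turns verbatim, collapse older turns
# into a single summary entry so the context window stays manageable.

def compact(history, keep_recent=4):
    """Return a shorter history: old turns become one summary entry."""
    if len(history) <= keep_recent:
        return list(history)
    old, recent = history[:-keep_recent], history[-keep_recent:]
    summary = "Summary of %d earlier steps: %s" % (
        len(old),
        "; ".join(turn["content"][:40] for turn in old),
    )
    return [{"role": "system", "content": summary}] + recent
```

In practice you would pair this with the container's file system: durable artefacts live on disk, and only a pointer to them survives in the compacted context.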
How it works (the loop, in plain English)
At the core is a simple loop:
The Responses API assembles context: your prompt, prior state, and tool instructions.
The model decides what to do next.
If it chooses the shell tool, it proposes one or more shell commands.
The platform runs those commands inside the hosted container.
Output streams back to the model, which decides whether to run more commands or produce a final response.
This repeats until the model returns an answer without further tool calls.
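The loop above can be sketched as plain Python, with both the model and the container stubbed out. A real deployment would call the Responses API and a hosted container session in place of the `model_step` and `run_shell` stand-ins, which are hypothetical names of our own.

```python
# Minimal agent-loop sketch: ask the model, execute any proposed commands,
# feed the output back, and stop when the model returns a final answer.

def run_agent(model_step, run_shell, prompt, max_turns=10):
    """Loop until the model answers without further tool calls."""
    context = [prompt]
    for _ in range(max_turns):
        action = model_step(context)           # model decides what to do next
        if action["type"] == "final":
            return action["text"]              # no further tool calls: done
        outputs = [run_shell(cmd) for cmd in action["commands"]]
        context.extend(outputs)                # stream results back into context
    raise RuntimeError("agent did not finish within max_turns")
```

The `max_turns` cap matters in production: an unbounded loop plus an execution tool is a cost and safety risk.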
A practical example: turning a messy dataset into a clean report
Here’s the kind of workflow this runtime makes straightforward:
Upload a CSV export from a BI tool.
Use shell to inspect and validate the schema.
Run a script (Python, Node, Go — whatever your workflow needs).
Generate a PDF summary report.
Return the report as an artefact, plus a short executive summary.
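Step 2 of that workflow (inspect and validate the schema) might look like the following inside the container. The column names are invented for illustration; only the standard library is used.

```python
# Schema validation sketch: check the header row and that every data row
# has the expected number of columns before any downstream processing runs.
import csv
import io

EXPECTED = ["date", "account", "amount"]  # illustrative columns

def validate_schema(csv_text):
    """Return (ok, message) for a CSV export against the expected schema."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    if header != EXPECTED:
        return False, "unexpected header: %r" % header
    bad = [i for i, r in enumerate(body, start=2) if len(r) != len(EXPECTED)]
    return (not bad), "rows with wrong column count: %r" % bad
```

Failing fast here is the point: the agent can report a broken export instead of generating a plausible-looking report from bad data.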
What changes with the new runtime is not that agents can do this — it’s that you can do it in a more production-friendly way, with isolation and a predictable execution surface.
Security and governance: what “secure” should mean in practice
A hosted container is a strong foundation, but it doesn’t replace good governance. If you’re deploying agent execution in an organisation, treat security as a design constraint:
Build explicit guardrails
Allowlist tools and actions: give the agent the minimum tool surface it needs.
Constrain domains and destinations for anything that makes network requests.
Block destructive operations (or require approval) for file deletes, overwrites, or anything irreversible.
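One way to express those guardrails in code is a vetting step between "model proposes command" and "platform executes it". The allowlist and destructive-command sets below are illustrative, not exhaustive, and the function names are our own.

```python
# Guardrail sketch: allowlist the tool surface and flag irreversible
# operations for human approval before execution.
import shlex

ALLOWED = {"ls", "cat", "grep", "head", "python"}   # minimum tool surface
DESTRUCTIVE = {"rm", "mv", "dd", "truncate"}        # irreversible operations

def vet_command(command):
    """Return 'run', 'approve', or 'block' for a proposed shell command."""
    prog = shlex.split(command)[0]
    if prog in DESTRUCTIVE:
        return "approve"   # require a human in the loop
    if prog not in ALLOWED:
        return "block"     # not on the allowlist: refuse outright
    return "run"
```

Note this only inspects the first token; a production vetter would also parse arguments, pipes, and subshells.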
Keep a human in the loop for high-risk steps
Purchases, authenticated flows, admin actions, or anything that impacts customers should require explicit user confirmation.
Add observability
For production, you want:
Tool call logs
Container session metadata
Artefact lineage (what was created, from what input)
Clear failure modes and retries
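Those four items can all live in one structured log record per tool call. The field names below are our own convention; adapt them to whatever logging pipeline you already run.

```python
# Observability sketch: one JSON record per tool call, covering the command,
# the container session, the failure mode, and artefact lineage.
import json
import time

def tool_call_record(session_id, command, exit_code, artefacts):
    """Serialise one tool call as a structured log line."""
    return json.dumps({
        "ts": time.time(),        # when the call ran
        "session": session_id,    # container session metadata
        "command": command,       # the tool call itself
        "exit_code": exit_code,   # clear failure mode
        "artefacts": artefacts,   # lineage: what was produced
    })
```

Emitting these as newline-delimited JSON makes retries and post-hoc audits straightforward to query.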
Implementation patterns that scale
Pattern 1: Break work into skills
OpenAI’s “skills” concept is effectively a portable bundle of workflows you can reuse. Treat them as your standard operating procedures for repeatable tasks (report generation, data validation, onboarding checks, etc.).
Pattern 2: Parallelise safely
Where tasks are independent (e.g., validating multiple files, running multiple checks), concurrent container sessions can reduce overall run time. The key is to preserve determinism: keep inputs and outputs well defined.
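A thread pool is enough to sketch this pattern, with each worker standing in for a concurrent container session. The point is the shape of the interface: independent inputs in, a deterministic mapping of results out.

```python
# Parallelisation sketch: run an independent check per input concurrently,
# then return results keyed by input so the output is well defined.
from concurrent.futures import ThreadPoolExecutor

def run_checks(inputs, check):
    """Run one check per input in parallel; map preserves input order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(check, inputs))
    return dict(zip(inputs, results))
```

Because `map` preserves order, the result is the same whichever worker finishes first, which is exactly the determinism the pattern asks for.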
Pattern 3: Use state for artefacts — not for memory bloat
Let the container hold working files and intermediate outputs. Keep the model context lean by compacting or summarising history, and store durable state in your own systems (databases, object storage) where needed.
Where this is already useful for real teams
Knowledge operations: “turn these 20 documents into a structured brief with citations.”
Engineering enablement: “run tests, generate changelogs, open a PR with a summary.”
RevOps/Marketing ops: “clean CRM exports, dedupe, enrich, and produce a weekly dashboard pack.”
Security/compliance: “scan a config repo for policy violations and produce an audit report.”
Summary
OpenAI’s Responses API agent runtime brings agent execution closer to production reality by combining tool orchestration, a shell interface, and hosted containers. Done well, it enables agents that can handle files, run multi-step workflows, and scale — while keeping execution inside a controlled environment.
Next steps
If you’re exploring secure agent execution, we can help you:
design a governed tool surface and runtime controls
identify high-ROI workflows to turn into repeatable “skills”
deploy agents that integrate with your stack (Asana, Miro, Notion, Glean and more)
Explore related pages:
Asana Integration: /asana/
Miro Solutions: /miro/
Notion Features: /notion/
Glean Insights: /glean/
FAQs
Question: What is the Responses API agent runtime?
Answer: It’s a managed execution setup that pairs the Responses API with tools (like shell) running inside hosted containers, so agents can execute steps safely, handle files, and complete multi-turn workflows.
Question: What’s the difference between the shell tool and a Python-only code interpreter?
Answer: A shell environment can run standard command-line utilities, build tools, and programs in many languages, not just Python, making it better suited to real software and data workflows.
Question: How do hosted containers improve security?
Answer: They isolate execution from your production systems, giving you a contained workspace where you can apply stricter controls, reduce blast radius, and limit what the agent can access.
Question: Can agents keep state across steps?
Answer: Yes. The container environment can preserve files and working context across a run, and longer workflows can use compaction patterns to keep model context manageable.
Question: What controls should I add before deploying this in production?
Answer: Use allowlists for tools and domains, block or approve destructive actions, keep humans in the loop for sensitive flows, and add logging and monitoring around tool calls and artefacts.
Generation Digital

UK Office
Generation Digital Ltd
33 Queen St,
London
EC4R 1AP
United Kingdom
Canada Office
Generation Digital Americas Inc
181 Bay St., Suite 1800
Toronto, ON, M5J 2T9
Canada
US Office
Generation Digital Americas Inc
77 Sands St,
Brooklyn, NY 11201,
United States
EU Office
Generation Digital Software
Elgee Building
Dundalk
A91 X2R3
Ireland
Middle East Office
6994 Alsharq 3890,
An Narjis,
Riyadh 13343,
Saudi Arabia
Company number: 256 9431 77 | Copyright 2026 | Terms and Conditions | Privacy Policy