OpenAI Responses API: Secure Agents with Hosted Containers

OpenAI

Feb 13, 2026

OpenAI’s Responses API can run secure, scalable agents by pairing tool orchestration with a hosted computer environment. With the shell tool and OpenAI-hosted containers, agents can execute commands, manage files, and maintain state across multi-step workflows. This improves reliability for long-running tasks while reducing security risk through controlled execution.

Agentic workflows become far more useful when they can do things: run scripts, transform files, install dependencies, validate outputs, and generate artefacts. But as soon as an agent can execute code, organisations face a familiar question: how do we get the benefit without creating a security problem?

OpenAI’s latest Responses API enhancements introduce a practical answer: a hosted “computer environment” that combines tool orchestration, a shell tool, and OpenAI-hosted containers so agents can execute tasks with stronger isolation, better state handling, and improved file management.

What’s new in the Responses API runtime

OpenAI positions this as an agent runtime built from several building blocks:

  • Responses API orchestration: the service manages the agent loop, tool calls, and multi-turn continuation.

  • Shell tool: the agent can propose shell commands to run.

  • Hosted containers: the shell runs inside an OpenAI-hosted container environment, creating isolation and a predictable execution surface.

  • Streaming output: command output is streamed back so the agent can react in near real time.

  • Long-running support: server-side “compaction” trims accumulated context so agent runs can keep going without the context window ballooning.

Together, these features enable agents that can work for longer, handle files more reliably, and execute deterministic steps instead of guessing.
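As a concrete sketch, a request enabling these building blocks might be composed like the helper below. The tool identifier (`"shell"`) and field names are assumptions drawn from the feature description above, not a verified SDK signature; check OpenAI's API reference for the exact schema.

```python
# Hedged sketch: compose a Responses API payload that enables the shell tool
# with streaming. The "shell" tool type and "stream" flag are assumptions
# based on the description above.

def build_agent_request(task: str, model: str) -> dict:
    """Build a Responses API payload for a shell-capable agent turn."""
    return {
        "model": model,                # your chosen agent-capable model id
        "input": task,
        "tools": [{"type": "shell"}],  # assumed hosted-shell tool identifier
        "stream": True,                # stream command output as it runs
    }

payload = build_agent_request(
    "Validate the CSV in /mnt/data and summarise issues.", "your-model-id"
)
```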

How it works (conceptually)

The flow is straightforward:

  1. You send a request to the Responses API.

  2. The model decides whether to answer directly or use tools.

  3. If it chooses the shell tool, it returns one or more commands.

  4. The Responses API runs those commands inside the hosted container and streams the output.

  5. The model sees the output and either:

    • runs follow-up commands,

    • calls other tools,

    • or produces the final response.

This continues until the model stops requesting tools.
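The five steps above can be sketched as a loop. This illustration uses a stubbed "model" so the control flow runs without API access; in production the stub would be a Responses API call, and the names here are assumptions rather than the official SDK.

```python
# Illustrative agent loop (steps 1-5 above), stubbed so it runs offline.

def run_shell(command: str) -> str:
    """Stand-in for hosted-container execution; returns captured output."""
    return f"ran: {command}"

def agent_loop(model_step, task: str, max_turns: int = 10) -> str:
    """Loop until the model stops requesting tools (step 5)."""
    transcript = [task]
    for _ in range(max_turns):
        action = model_step(transcript)            # model decides (step 2)
        if action["type"] == "shell":              # step 3: commands returned
            for cmd in action["commands"]:
                transcript.append(run_shell(cmd))  # step 4: run, feed output back
        else:
            return action["text"]                  # final response
    raise RuntimeError("agent exceeded max_turns")

# Stub model: run one command, then produce the final answer.
def stub(transcript):
    if len(transcript) == 1:
        return {"type": "shell", "commands": ["wc -l data.csv"]}
    return {"type": "final", "text": "Done: " + transcript[-1]}

result = agent_loop(stub, "count lines")  # → "Done: ran: wc -l data.csv"
```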

Why hosted containers matter for security

Hosted containers create a separation between agent execution and your core systems. That’s valuable because:

  • the runtime can be constrained (resources, execution context)

  • tool access can be controlled and audited

  • you can keep high-impact actions behind explicit, gated tools

This doesn’t eliminate risk, but it reduces blast radius and makes agent execution more governable.
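One way to keep high-impact actions behind explicit, gated tools is a command allow-list checked before anything reaches the shell. The policy shape below is illustrative, not an OpenAI feature:

```python
# Sketch of a command allow-list gate: permit only commands whose
# executable is on an approved list. The list itself is illustrative.
import shlex

ALLOWED = {"ls", "cat", "wc", "head", "python3"}

def is_permitted(command: str) -> bool:
    """Return True only if the command's executable is allow-listed."""
    try:
        argv = shlex.split(command)
    except ValueError:          # malformed quoting: reject outright
        return False
    return bool(argv) and argv[0] in ALLOWED

permitted = is_permitted("wc -l report.csv")  # True
blocked = is_permitted("rm -rf data/")        # False
```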

File handling and state: what “stateful agents” really means

In practice, “state” shows up in two places:

  • Conversation state: the Responses API can carry structured state across turns.

  • Execution state: the hosted container can persist files and runtime context during the agent run, so the agent can create outputs (reports, transformed datasets, logs) and continue work without rebuilding everything each step.

For longer workflows, compaction helps reduce token growth while maintaining the essentials.
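Conversation state in practice: the Responses API lets a new turn reference the prior one via `previous_response_id`, so the service carries context forward server-side. The helper below only builds the follow-up payload; verify the field against the current API reference.

```python
# Minimal illustration of chaining turns with previous_response_id.

def follow_up(model: str, prev_response_id: str, message: str) -> dict:
    """Build a payload for a turn that continues an earlier response."""
    return {
        "model": model,
        "previous_response_id": prev_response_id,  # chain to the prior turn
        "input": message,
    }

turn2 = follow_up("your-model-id", "resp_abc123", "Now chart the cleaned data.")
```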

Practical workflows you can build

The new runtime is most valuable for tasks that benefit from deterministic execution.

1) Data transformation and validation

  • parse and clean CSV exports

  • validate ranges and completeness

  • generate summary tables and charts

2) Report and artefact generation

  • run scripts that produce outputs

  • generate markdown reports

  • package files for downstream workflows

3) Debugging and incident support

  • reproduce issues in a controlled environment

  • analyse logs and produce summaries

  • draft runbooks or post-incident write-ups

4) CI-style checks on agent outputs

  • verify calculations

  • check generated artefacts

  • run lightweight tests before humans approve actions
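As an example of the kind of deterministic check behind workflows 1 and 4, here is a small stdlib-only script an agent could run inside the container instead of estimating answers: it validates a CSV column for completeness and numeric range.

```python
# Deterministic CSV validation: report missing, non-numeric, and
# out-of-range values for one column. Pure stdlib, runnable anywhere.
import csv
import io

def validate(csv_text: str, column: str, lo: float, hi: float) -> list[str]:
    """Return a list of human-readable issues found in the CSV."""
    issues = []
    # start=2: row 1 is the header, so data begins on line 2
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=2):
        value = row.get(column, "").strip()
        if not value:
            issues.append(f"row {i}: missing {column}")
            continue
        try:
            x = float(value)
        except ValueError:
            issues.append(f"row {i}: {column}={value!r} not numeric")
            continue
        if not lo <= x <= hi:
            issues.append(f"row {i}: {column}={x} outside [{lo}, {hi}]")
    return issues

sample = "name,score\na,95\nb,\nc,130\n"
issues = validate(sample, "score", 0, 100)  # two issues: rows 3 and 4
```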

A sensible enterprise rollout pattern

If you want to deploy this safely, treat it like any other production capability.

  1. Start read-only
    Use the runtime for analysis and artefact generation first.

  2. Define tool boundaries
    Separate low-risk tools from high-impact tools. Put approvals on writes.

  3. Instrument everything
    Log tool calls, command outputs, and exceptions.

  4. Adopt approvals for risky actions
    “Compute then propose” is the safe default.

  5. Build an evaluation harness
    Test prompt injection scenarios and tool misuse attempts on your real workflows.
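The "compute then propose" default in step 4 can be sketched as a deferred action plus a human gate: the agent computes a change in the container, but nothing is applied until a reviewer approves. The shape below is a pattern, not an SDK feature.

```python
# "Compute then propose": the change is packaged as a deferred action
# that runs only after explicit human approval.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Proposal:
    description: str
    apply: Callable[[], None]  # deferred action; runs only after approval

def review(proposal: Proposal, approved: bool) -> str:
    """Human gate: the computed change is applied only when approved."""
    if not approved:
        return f"rejected: {proposal.description}"
    proposal.apply()
    return f"applied: {proposal.description}"

state = {"rows": 10}
p = Proposal("drop 2 invalid rows", lambda: state.update(rows=8))
result = review(p, approved=True)  # state changes only after approval
```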

Where Generation Digital can help

Generation Digital helps teams move from “agent demos” to governed systems.

We can support:

  • selecting high-value workflows that benefit from hosted execution

  • designing safe tool patterns (allow-lists, approvals, identity boundaries)

  • evaluation and monitoring so agents scale responsibly

  • integrating outputs into your workflow stack

Related links

  • Explore Asana integration: /asana/

  • Discover Miro’s capabilities: /miro/

  • Learn about Notion features: /notion/

  • Glean insights: /glean/

Summary

OpenAI’s Responses API now supports a more complete agent runtime: hosted containers plus a shell tool for controlled execution, improved state and file handling, and mechanisms to keep long-running workflows stable. Used well, it enables agents that are more reliable and easier to govern.

Next steps: If you’re planning a secure agent pilot or want to scale one, speak with Generation Digital: https://www.gend.co/contact

FAQs

Q1: What is the primary benefit of the new Responses API?
It enables a stronger agent runtime: tool orchestration plus hosted execution so agents can run commands, handle files, and continue multi-step workflows more reliably.

Q2: How does the API ensure security?
By running shell execution inside hosted containers and encouraging controlled tool access, you can isolate execution and gate high-impact actions behind approvals and policy.

Q3: Can the Responses API handle large-scale operations?
Yes. The architecture supports parallel execution across container sessions and long-running workflows via state management and compaction.

Q4: Do I have to use hosted execution?
No. OpenAI supports both hosted shell containers and a local shell runtime you execute yourself, depending on how much control you need.

Q5: What’s the safest way to start?
Start with deterministic tasks (data transforms, validation, reporting) and keep anything that changes production systems behind explicit tools with human approval.

Generation
Digital

Canadian Office
33 Queen St,
Toronto
M5H 2N2
Canada

Canadian Office
1 University Ave,
Toronto,
ON M5J 1T1,
Canada

NAMER Office
77 Sands St,
Brooklyn,
NY 11201,
USA

Head Office
Charlemont St, Saint Kevin's, Dublin,
D02 VN88,
Ireland

Middle East Office
6994 Alsharq 3890,
An Narjis,
Riyadh 13343,
Saudi Arabia


Business Number: 256 9431 77 | Copyright 2026 | Terms and Conditions | Privacy Policy
