GPT-5.1-Codex-Max: Long-Horizon Coding with Compaction
OpenAI
Dec 3, 2025
The problem your team keeps hitting
AI models have long excelled at short, single-step tasks but struggled with multi-day software engineering projects that require deep context and sustained focus. Are you constantly restarting your coding agent because it loses the thread partway through a complex refactor?
Meet the model built for long-horizon coding
OpenAI’s new GPT-5.1-Codex-Max is designed specifically to overcome this limitation: a specialised agentic coding model built for long-running, project-scale work. Its foundational innovation is compaction.
How compaction sustains context
Compaction is a capability the model is natively trained with: it prunes its session history while coherently preserving the most critical context across multiple context windows, effectively enabling it to work over millions of tokens. This allows the model to:
Sustain complex, iterative workflows like multi-file refactors and prolonged debugging.
Work autonomously for periods exceeding a day.
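OpenAI has not published the internals of compaction, but the idea above can be illustrated with a minimal sketch: when a session's history nears its context budget, older entries are collapsed into a compact summary while the most recent entries are kept verbatim, so the working set stays bounded no matter how long the session runs. The `Session` class, the 4-characters-per-token estimate, and the thresholds are all hypothetical, chosen only to make the pruning behaviour visible.

```python
# Conceptual sketch of compaction (illustrative only; the real mechanism
# is trained into the model, not implemented as an external loop).
from dataclasses import dataclass, field


@dataclass
class Session:
    limit_tokens: int                       # hypothetical context-window budget
    history: list = field(default_factory=list)

    def tokens(self, text: str) -> int:
        # Crude token estimate for illustration (~4 characters per token).
        return max(1, len(text) // 4)

    def total(self) -> int:
        return sum(self.tokens(m) for m in self.history)

    def add(self, message: str, keep_recent: int = 3) -> None:
        self.history.append(message)
        if self.total() > self.limit_tokens:
            self.compact(keep_recent)

    def compact(self, keep_recent: int) -> None:
        # Prune everything but the last few messages into one summary entry,
        # standing in for "preserve the most critical context".
        old = self.history[:-keep_recent]
        recent = self.history[-keep_recent:]
        summary = f"[compacted {len(old)} earlier steps]"
        self.history = [summary] + recent


session = Session(limit_tokens=40)
for step in range(10):
    session.add(f"step {step}: edited file_{step}.py and reran the tests")

print(len(session.history))   # history stays small despite many steps
```

The key property is that the loop never fails when the budget is exceeded; it degrades gracefully by trading old detail for a summary, which is what lets a session span far more tokens than any single window holds.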
What this unlocks for engineering productivity
This capability transforms your engineering productivity by removing friction and maximising efficiency. Instead of manual context management or constantly fixing missteps, you get reliable, high-quality implementations with significant performance gains:
Faster, cheaper reasoning: Codex-Max uses approximately 30% fewer “thinking tokens” for similar reasoning effort compared to its predecessor, leading to cost and speed improvements.
Project-level coherence: It maintains a project-level perspective, eliminating the need to manually supply context across iterations.
Proven productivity uplift: Organisations adopting Codex have seen their engineers ship roughly 70% more pull requests.
This model helps your team achieve clarity from chaos by confidently delegating complex, long-horizon coding tasks.
How to put it into practice
To ensure your development programme benefits from this advanced agent:
Use stepwise instructions: Break down large coding jobs into a clear sequence of subtasks (e.g., “1) run tests 2) fix top 3 failing tests 3) summarise changes”).
Choose the right tool: Use Codex-Max for multi-file refactors and complex agentic workflows, reserving standard models for quick edits.
Secure implementation: Remember that enabling network access introduces prompt injection risks. Ensure agents run in secure, sandboxed environments by default.
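The checklist above can be sketched as a small harness: an ordered subtask list handed to an agent as one clear prompt, with network access denied by default and gated behind an explicit allow-list when it must be enabled. The `build_prompt` and `network_permitted` helpers and the host list are hypothetical stand-ins, not a real Codex interface.

```python
# Hypothetical harness illustrating stepwise instructions and
# default-deny network access. Names here are illustrative only.

SUBTASKS = [
    "1) run the test suite and report failures",
    "2) fix the top 3 failing tests",
    "3) summarise the changes in a short note",
]

ALLOWED_HOSTS = {"pypi.org", "github.com"}   # example allow-list


def build_prompt(subtasks: list[str]) -> str:
    # One clear, ordered instruction block rather than a vague mega-task.
    return "Complete these steps in order:\n" + "\n".join(subtasks)


def network_permitted(host: str, network_enabled: bool) -> bool:
    # Default-deny: access requires both an explicit opt-in and an
    # allow-listed host, limiting the blast radius of prompt injection.
    return network_enabled and host in ALLOWED_HOSTS


prompt = build_prompt(SUBTASKS)
print(prompt)
print(network_permitted("evil.example", network_enabled=True))   # False
print(network_permitted("pypi.org", network_enabled=False))      # False
```

The design choice worth noting is that both conditions must hold before any network call: flipping the global switch on is not enough by itself, which mirrors the guidance to keep agents sandboxed by default.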
Contact us to discuss integrating long-horizon agents into your software development life cycle.
FAQ
1) What exactly is “compaction”—is it just summarisation?
No. It is a capability the model is natively trained with, not post-hoc summarisation: the model works across multiple context windows, pruning its history while preserving critical state, so a session can span millions of tokens coherently, beyond any single window’s size.
2) How long can Codex-Max run by itself?
OpenAI reports internal runs exceeding 24 hours on long-horizon tasks. You should still gate merges with human review.
3) Is it cheaper or faster than older Codex models?
At the same “reasoning effort”, Codex-Max uses ~30% fewer thinking tokens, often improving speed and cost for comparable outcomes.
4) What are the security defaults?
Codex runs in a sandbox, with network access disabled unless you turn it on. Enabling web/search increases prompt-injection risk; use allow-lists and scanning.
5) Where can I use it?
In Codex via ChatGPT plans (Plus/Pro/Business/Edu/Enterprise) across CLI, IDE extension, cloud, and code review; API availability is planned.