Claude Opus 4.6: Early Insights from Top Client Tests

Claude

Feb 9, 2026

Before the official release of Claude Opus 4.6 on 5 February 2026, four teams (Harvey, Bolt.new, Shopify, and Lovable) received early access. Their hands‑on testing informed final tuning, and they reported gains in long‑context reasoning, agentic workflows, and production‑readiness for real knowledge‑work tasks.

Why it matters now: Opus 4.6 pushes beyond coding into everyday business tasks (docs, sheets, slides), introduces stronger agent orchestration, and adds a beta 1M‑token context option alongside the 200k default. Together, these changes help teams consolidate workflows into fewer tools, with higher accuracy and fewer retries.

The final release improves long‑context reasoning, agentic coding, and end‑to‑end task execution. Early users reported smoother operations, higher output quality, and fewer revisions across legal, ecommerce, engineering, and design workflows.

What’s new in Claude Opus 4.6

Long‑context performance: 200k context window; 1M‑token context (beta) for multi‑document work and retrieval across lengthy threads.
Agentic workflows: Improved planning, tool‑calling and sub‑agent “team” orchestration for longer, multi‑step tasks.
Knowledge‑work readiness: Better reliability in documents, spreadsheets, and presentations; fewer back‑and‑forth iterations.
Coding & debugging: Stronger root‑cause analysis, codebase navigation, and multi‑language refactors; better adherence to instructions over long sessions.
Safety & governance: Expanded evaluations and lower over‑refusal rates versus prior Opus‑class models.
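To make the two context sizes concrete, here is a minimal sketch of a budget check that decides whether a set of documents fits the 200k default, needs the 1M‑token beta option, or has to be split. The function names, the tier labels, the output reserve, and the four‑characters‑per‑token ratio are all assumptions for illustration; a real tokenizer will give different counts.

```python
from typing import Iterable

DEFAULT_WINDOW = 200_000   # tokens, default window cited in the article
BETA_WINDOW = 1_000_000    # tokens, beta option cited in the article
CHARS_PER_TOKEN = 4        # assumption: rough heuristic, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate the token count by ceiling-dividing the character count."""
    return -(-len(text) // CHARS_PER_TOKEN)


def pick_window(documents: Iterable[str], reserve_for_output: int = 8_000) -> str:
    """Return a hypothetical tier label for the estimated total token budget."""
    needed = sum(estimate_tokens(doc) for doc in documents) + reserve_for_output
    if needed <= DEFAULT_WINDOW:
        return "default-200k"
    if needed <= BETA_WINDOW:
        return "beta-1m"
    return "split-required"


if __name__ == "__main__":
    contracts = ["x" * 600_000, "y" * 500_000]  # about 275k estimated tokens
    print(pick_window(contracts))
```

Because the estimate is only approximate, it is safer to leave headroom near each boundary than to treat the result as exact.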

What early customers discovered

Harvey (legal AI): Scored above 90% on internal legal‑work evals and raised the quality bar on complex reasoning. Lawyers noted more analytical, “thinking” outputs suitable for BigLaw‑grade tasks.
Bolt.new (developer platform): Diagnosed stubborn bugs on the first pass, handled large codebases and design‑system tasks, and completed in one shot complex builds that previously needed multiple attempts.
Shopify (assistants & platform engineering): Followed intent with minimal prompting, anticipated next steps, and completed large refactors (e.g., TypeScript → Ruby) while validating against tests.
Lovable (design‑forward apps): Marked uplift in design quality and autonomy; engineers reported the model “goes further” on difficult, multi‑constraint app builds and supports in‑tool testing.

Takeaway: Across different domains, teams reported fewer retries, better planning, and cleaner, production‑ready outputs.

Practical applications you can ship now

Legal workflows: Draft → cite‑check → risk notes → partner‑style revisions in one chain; use sub‑agents for retrieval and redlining.
Ecommerce ops: Migrate internal libraries between languages, auto‑generate admin UI changes, and build product‑ops assistants that reason over large docs.
Engineering velocity: Spin up agent teams for bug triage, refactors, and test generation; let models plan, branch, and open PRs with human sign‑off.
Design & prototyping: Translate multi‑layered designs to code, generate interactive prototypes, and iterate directly in your design/dev tools.
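The chained workflows above (for example draft, cite‑check, risk notes, revisions, then human sign‑off) can be sketched as a small staged pipeline. This is an illustrative outline only: `WorkItem`, `make_stage`, `run_chain`, and the placeholder transforms are hypothetical names, and each lambda stands in for a model or tool call that a real system would make.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class WorkItem:
    text: str
    log: List[str] = field(default_factory=list)  # audit trail of completed stages


Stage = Callable[[WorkItem], WorkItem]


def make_stage(name: str, transform: Callable[[str], str]) -> Stage:
    """Wrap a text transform as a named stage that records itself in the log."""
    def stage(item: WorkItem) -> WorkItem:
        return WorkItem(text=transform(item.text), log=item.log + [name])
    return stage


def run_chain(item: WorkItem, stages: List[Stage],
              approve: Callable[[WorkItem], bool]) -> WorkItem:
    """Run stages in order, then require explicit approval before release."""
    for stage in stages:
        item = stage(item)
    if not approve(item):
        raise PermissionError("reviewer rejected the output; nothing released")
    return WorkItem(text=item.text, log=item.log + ["approved"])


# Placeholder transforms: a real system would call a model or tool here.
stages = [
    make_stage("draft", lambda t: t + " [drafted]"),
    make_stage("cite-check", lambda t: t + " [citations checked]"),
    make_stage("risk-notes", lambda t: t + " [risks noted]"),
    make_stage("revise", lambda t: t + " [revised]"),
]

# The approve callback stands in for a human reviewer's decision.
result = run_chain(WorkItem("Memo"), stages,
                   approve=lambda item: "cite-check" in item.log)
print(result.log)
```

Keeping the approval step outside the stage list means no combination of stages can release output without the sign‑off check running last.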

Quick comparison: Opus 4.6 vs Opus 4.5

Context handling: Holds more details with less “context rot”; better retrieval of buried information in long threads.
Instruction fidelity: More consistent adherence over long‑running sessions.
Autonomy: Improved initiative on multi‑step tasks; less micromanagement required.
Safety posture: Wider, deeper evaluations without sacrificing capability.

FAQs

What is Claude Opus 4.6?
The latest Claude frontier model, tuned for complex, multi‑step tasks across coding and knowledge work, with 200k context and a 1M‑token context option in beta.

Who tested Opus 4.6 pre‑launch?
Four early‑access teams: Harvey, Bolt.new, Shopify, and Lovable.

What improvements did they see?
Higher pass‑rates on internal evals, faster bug diagnosis, better instruction‑following, and more autonomous execution across long tasks.

Does it still help with documents and spreadsheets?
Yes. Opus 4.6 was tuned to reduce rewrites in docs, sheets, and slides, making it more production‑ready for daily knowledge work.

How is safety handled?
Anthropic expanded testing for misaligned behaviours and improved refusal balance, while adding new guardrails in sensitive capability areas (e.g., cybersecurity).
