Claude Opus 4.6: Initial Insights from Premier Client Tests

Claude

Feb 9, 2026

Before Claude Opus 4.6 launched on February 5, 2026, four teams (Harvey, Bolt.new, Shopify, and Lovable) received early access. Their hands-on testing fed into the final refinements and showed notable improvements in long-context reasoning, task efficiency, and readiness for real-world knowledge work.

Why it matters now: Opus 4.6 extends beyond coding into everyday business tasks (documents, spreadsheets, presentations), coordinates automated tools more effectively, and offers a 1 million-token context in beta alongside the 200k default. Together, these let teams consolidate workflows into fewer tools with greater accuracy and less rework.

Claude Opus 4.6 was evaluated before release by Harvey, Bolt.new, Shopify, and Lovable. Their feedback shaped the final version, which improves long-context reasoning, coding, and end-to-end task execution. Early users reported smoother operation, higher output quality, and fewer revisions across legal, ecommerce, engineering, and design workflows.

What’s new in Claude Opus 4.6

Long-context performance: 200k context window; 1M-token context (beta) for managing multi-document projects and retrieving information from extensive threads.
Enhanced workflows: Improved planning, tool coordination, and management of sub-agent "teams" for longer, multi-step tasks.
Readiness for knowledge work: More reliable handling of documents, spreadsheets, and presentations, with less back-and-forth between iterations.
Coding & debugging: Stronger root-cause analysis, easier codebase navigation, and refactoring across multiple languages; better instruction adherence over extended sessions.
Safety & governance: Broader evaluations and lower refusal rates than previous Opus-class models.
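
As a rough illustration of the two context sizes listed above, the sketch below checks whether a set of documents would fit in the 200k default window or would need the 1M-token beta. The four-characters-per-token ratio is an assumed rule of thumb for English prose, not the model's actual tokenizer, and the function names and the reserved output budget are hypothetical.

```python
# Rough context-budget check. The characters-per-token ratio is an
# assumption for English prose, not the model's real tokenizer.
DEFAULT_WINDOW = 200_000   # default context window, in tokens
BETA_WINDOW = 1_000_000    # beta long-context window, in tokens
CHARS_PER_TOKEN = 4        # assumed heuristic


def estimate_tokens(text: str) -> int:
    """Estimate the token count of a text from its character length."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def pick_window(documents: list[str], reserved_for_output: int = 8_000) -> str:
    """Return which window the documents fit in: 'default', 'beta', or 'split'."""
    needed = sum(estimate_tokens(d) for d in documents) + reserved_for_output
    if needed <= DEFAULT_WINDOW:
        return "default"
    if needed <= BETA_WINDOW:
        return "beta"
    return "split"  # too large even for the beta window; chunk the input


if __name__ == "__main__":
    # Two hypothetical documents, roughly 225k estimated tokens in total.
    contracts = ["x" * 300_000, "y" * 600_000]
    print(pick_window(contracts))
```

For real budgeting, an exact token count from the provider would replace the heuristic; the estimate here is only meant to show where the 200k and 1M thresholds fall.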

What early customers discovered

Harvey (legal AI): Achieved over 90% on internal legal evaluations and raised the quality of complex reasoning. Lawyers found the outputs more analytical and better suited to high-level tasks.
Bolt.new (developer platform): Diagnosed persistent bugs efficiently, handled large codebases and design-system tasks well, and completed complex builds on the first attempt.
Shopify (assistants & platform engineering): Followed intent with minimal prompting, anticipated subsequent steps, and completed comprehensive refactoring (e.g., TypeScript to Ruby) with test validation.
Lovable (design-forward apps): Significant improvement in design quality and autonomy; engineers noted that the model handles complex, constrained app builds better than its predecessors and supports testing within tools.

Takeaway: Across various fields, teams reported fewer retries, better planning, and clearer, ready-to-deploy outputs.

Practical applications you can implement now

Legal workflows: Drafting → citation checks → risk notes → partner-style revisions in a single chain, using sub-agents for data retrieval and markup.
Ecommerce operations: Migrate internal libraries between languages, automatically generate admin UI updates, and build product operations assistants that process large documents.
Engineering efficiency: Deploy agent teams for bug fixes, refactoring, and test creation; let models plan, branch, and open PRs, subject to human sign-off.
Design & prototyping: Convert multi-layered designs into code, create interactive prototypes, and iterate directly within design/development tools.
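
The chained workflows above, where each step feeds the next and a human signs off at the end, can be sketched as follows. This is a minimal illustration under stated assumptions rather than a recommended implementation: `Pipeline`, `stub_model`, and the step names are hypothetical, and the stub stands in for a real model call so the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# A "model" here is any function from prompt text to output text.
# In practice this would wrap a real API client (not shown).
Model = Callable[[str], str]


@dataclass
class Pipeline:
    model: Model
    steps: list[str]  # ordered step instructions
    log: list[tuple[str, str]] = field(default_factory=list)

    def run(self, source: str, approve: Callable[[str], bool]) -> Optional[str]:
        """Feed each step's output into the next, then ask for sign-off."""
        current = source
        for step in self.steps:
            current = self.model(f"{step}\n\n{current}")
            self.log.append((step, current))  # keep an audit trail
        # Nothing is released unless the reviewer approves the final output.
        return current if approve(current) else None


def stub_model(prompt: str) -> str:
    """Stand-in for a model call: appends the step name to the text."""
    instruction, _, body = prompt.partition("\n\n")
    return f"{body} [{instruction}]"


if __name__ == "__main__":
    legal = Pipeline(stub_model, ["draft", "check citations", "add risk notes"])
    result = legal.run("memo", approve=lambda text: "risk notes" in text)
    print(result)
```

Keeping the approval callback outside the pipeline means the same chain can run with an interactive reviewer in production and an automatic check in tests.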

Quick comparison: Opus 4.6 vs 4.5 (at a glance)

Context handling: Retains more detail with less degradation over long sessions; better at retrieving information buried deep in lengthy threads.
Instruction fidelity: More consistent adherence throughout extended sessions.
Autonomy: Enhanced initiative on complex, multi-step tasks; reduced need for constant oversight.
Safety standards: More comprehensive evaluations without losing effectiveness.

FAQs

What is Claude Opus 4.6?
The latest advanced Claude model, refined for complex, multi-step tasks across coding and knowledge work, with a 200k-token context window and an optional 1 million-token context in beta.

Who tested Opus 4.6 pre-launch?
Four early-access teams: Harvey, Bolt.new, Shopify, and Lovable.

What improvements did they observe?
Higher success rates on internal tests, quicker bug resolutions, better compliance with instructions, and more autonomous operations for long tasks.

Does it still assist with documents and spreadsheets?
Yes. Opus 4.6 has been optimized to reduce revisions in documents, spreadsheets, and presentations, enhancing its readiness for everyday tasks.

How is safety managed?
Anthropic expanded testing for misalignment issues and enhanced the balance of refusals while adding new safety protocols in sensitive capability zones (e.g., cybersecurity).
