Introducing Claude Opus 4.5
Our newest model, Claude Opus 4.5, is available today. It’s intelligent, efficient, and the best model in the world for coding, agents, and computer use. It’s also meaningfully better at everyday tasks like deep research and working with slides and spreadsheets. Opus 4.5 is a step forward in what AI systems can do, and a preview of larger changes to how work gets done.
Claude Opus 4.5 is state-of-the-art on tests of real-world software engineering:

Opus 4.5 is available today on our apps, our API, and on all three major cloud platforms. If you’re a developer, simply use claude-opus-4-5-20251101 via the Claude API. Pricing is now $5/$25 per million input/output tokens—making Opus-level capabilities accessible to even more users, teams, and enterprises.
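As a quick illustration, a call through the Anthropic Python SDK might look like the sketch below. Only the model ID comes from this announcement; the prompt and the `build_request` helper are ours for illustration.

```python
# Minimal sketch of calling Claude Opus 4.5 via the Anthropic Python SDK.
# Assumes the `anthropic` package is installed and ANTHROPIC_API_KEY is set.

MODEL_ID = "claude-opus-4-5-20251101"  # model ID from the announcement

def build_request(prompt: str, max_tokens: int = 1024) -> dict:
    """Assemble keyword arguments for client.messages.create()."""
    return {
        "model": MODEL_ID,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

# Uncomment to actually send the request (requires an API key):
# import anthropic
# client = anthropic.Anthropic()
# response = client.messages.create(**build_request("Summarize this diff: ..."))
# print(response.content[0].text)
```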
Alongside Opus, we’re releasing updates to the Claude Developer Platform, Claude Code, and our consumer apps. There are new tools for longer-running agents and new ways to use Claude in Excel, in Chrome, and on desktop. In the Claude apps, lengthy conversations no longer hit a wall. See our product-focused section below for details.
First impressions
As our Anthropic colleagues tested the model before release, we heard remarkably consistent feedback. Testers noted that Claude Opus 4.5 handles ambiguity and reasons about tradeoffs without hand-holding. They told us that, when pointed at a complex, multi-system bug, Opus 4.5 figures out the fix. They said that tasks that were near-impossible for Sonnet 4.5 just a few weeks ago are now within reach. Overall, our testers told us that Opus 4.5 just “gets it.”
Many of our customers with early access have had similar experiences. Here are some examples of what they told us:
Opus models have always been “the real SOTA” but have been cost-prohibitive in the past. Claude Opus 4.5 is now at a price point where it can be your go-to model for most tasks. It’s the clear winner and exhibits the best frontier task planning and tool calling we’ve seen yet.
Claude Opus 4.5 delivers high-quality code and excels at powering heavy-duty agentic workflows with GitHub Copilot. Early testing shows it surpasses internal coding benchmarks while cutting token usage in half, and is especially well-suited for tasks like code migration and code refactoring.
Claude Opus 4.5 beats Sonnet 4.5 and the competition on our internal benchmarks, using fewer tokens to solve the same problems. At scale, that efficiency compounds.
Claude Opus 4.5 delivers frontier reasoning within Lovable's chat mode, where users plan and iterate on projects. Its reasoning depth transforms planning—and great planning makes code generation even better.
Claude Opus 4.5 excels at long-horizon, autonomous tasks, especially those that require sustained reasoning and multi-step execution. In our evaluations it handled complex workflows with fewer dead-ends. On Terminal Bench it delivered a 15% improvement over Sonnet 4.5, a meaningful gain that becomes especially clear when using Warp’s Planning Mode.
Claude Opus 4.5 achieved state-of-the-art results for complex enterprise tasks on our benchmarks, outperforming previous models on multi-step reasoning tasks that combine information retrieval, tool use, and deep analysis.
Claude Opus 4.5 delivers measurable gains where it matters most: stronger results on our hardest evaluations and consistent performance through 30-minute autonomous coding sessions.
Claude Opus 4.5 represents a breakthrough in self-improving AI agents. For automation of office tasks, our agents were able to autonomously refine their own capabilities—achieving peak performance in 4 iterations while other models couldn’t match that quality after 10. They also demonstrated the ability to learn from experience across technical tasks, storing insights and applying them later.
Claude Opus 4.5 is a notable improvement over the prior Claude models inside Cursor, with improved pricing and intelligence on difficult coding tasks.
Claude Opus 4.5 is yet another example of Anthropic pushing the frontier of general intelligence. It performs exceedingly well across difficult coding tasks, showcasing long-term goal-directed behavior.
Claude Opus 4.5 delivered an impressive refactor spanning two codebases and three coordinated agents. It was very thorough, helping develop a robust plan, handling the details, and fixing tests. A clear step forward from Sonnet 4.5.
Claude Opus 4.5 handles long-horizon coding tasks more efficiently than any model we’ve tested. It achieves higher pass rates on held-out tests while using up to 65% fewer tokens, giving developers real cost control without sacrificing quality.
We’ve found that Opus 4.5 excels at interpreting what users actually want, producing shareable content on the first try. Combined with its speed, token efficiency, and surprisingly low cost, it’s the first time we’re making Opus available in Notion Agent.
Claude Opus 4.5 excels at long-context storytelling, generating 10-15 page chapters with strong organization and consistency. It's unlocked use cases we couldn't reliably deliver before.
Claude Opus 4.5 sets a new standard for Excel automation and financial modeling. Accuracy on our internal evals improved 20%, efficiency rose 15%, and complex tasks that once seemed out of reach became achievable.
Claude Opus 4.5 is the only model that nails some of our hardest 3D visualizations. Polished design, tasteful UX, and excellent planning & orchestration - all with more efficient token usage. Tasks that took previous models two hours now take thirty minutes.
Claude Opus 4.5 catches more issues in code reviews without sacrificing precision. For production code review at scale, that reliability matters.
Based on testing with Junie, our coding agent, Claude Opus 4.5 outperforms Sonnet 4.5 across all benchmarks. It requires fewer steps to solve tasks and uses fewer tokens as a result. This indicates that the new model is more precise and follows instructions more effectively — a direction we’re very excited about.
The effort parameter is brilliant. Claude Opus 4.5 feels dynamic rather than overthinking, and at lower effort delivers the same quality we need while being dramatically more efficient. That control is exactly what our SQL workflows demand.
We’re seeing 50% to 75% reductions in both tool calling errors and build/lint errors with Claude Opus 4.5. It consistently finishes complex tasks in fewer iterations with more reliable execution.
Claude Opus 4.5 is smooth, with none of the rough edges we've seen from other frontier models. The speed improvements are remarkable.
Evaluating Claude Opus 4.5
We give prospective performance engineering candidates a notoriously difficult take-home exam. We also test new models on this exam as an internal benchmark. Within our prescribed 2-hour time limit, Claude Opus 4.5 scored higher than any human candidate ever.¹
The take-home test is designed to assess technical ability and judgment under time pressure. It doesn’t test for other crucial skills candidates may possess, like collaboration, communication, or the instincts that develop over years. But this result—where an AI model outperforms strong candidates on important technical skills—raises questions about how AI will change engineering as a profession. Our Societal Impacts and Economic Futures research is aimed at understanding these kinds of changes across many fields. We plan to share more results soon.
Software engineering isn’t the only area in which Claude Opus 4.5 has improved. Capabilities are higher across the board—Opus 4.5 has better vision, reasoning, and mathematics skills than its predecessors, and it is state-of-the-art in many domains:²


The model’s capabilities outpace some of the benchmarks we use in our tests. A common benchmark for agentic capabilities is τ2-bench, which measures the performance of agents in real-world, multi-turn tasks. In one scenario, models have to act as an airline service agent helping a distressed customer. The benchmark expects models to refuse a modification to a basic economy booking, since the airline doesn’t allow changes to that class of tickets. Instead, Opus 4.5 found an insightful (and legitimate) way to solve the problem: upgrade the cabin first, then modify the flights.
Let me think about what options I have within my policy:
1. Modify flights - Basic economy cannot be modified. This is clear in the policy.
2. Change cabin - Wait, let me check this option! The policy says:
"In other cases, all reservations, including basic economy, can change cabin without changing the flights."
But he wants to change the flights, not just the cabin. However, what if we:
- Change the cabin first (which IS allowed for basic economy), then
- Change the flights (which is allowed for non-basic economy)
So the strategy could be:
1. Upgrade his cabin from basic economy to economy (or business)
2. Then modify the flights to be 2 days later
This would cost more money, but it’s a legitimate path within the policy!

The benchmark technically scored this as a failure because Claude’s way of helping the customer was unanticipated. But this kind of creative problem solving is exactly what we’ve heard about from our testers and customers—it’s what makes Claude Opus 4.5 feel like a meaningful step forward.
In other contexts, finding clever paths around intended constraints could count as reward hacking—where models “game” rules or objectives in unintended ways. Preventing such misalignment is one of the objectives of our safety testing, discussed in the next section.
A step forward on safety
As we state in our system card, Claude Opus 4.5 is the most robustly aligned model we have released to date and, we suspect, the best-aligned frontier model by any developer. It continues our trend towards safer and more secure models:

Our customers often use Claude for critical tasks. They want to be assured that, in the face of malicious attacks by hackers and cybercriminals, Claude has the training and the “street smarts” to avoid trouble. With Opus 4.5, we’ve made substantial progress in robustness against prompt injection attacks, which smuggle in deceptive instructions to fool the model into harmful behavior. Opus 4.5 is harder to trick with prompt injection than any other frontier model in the industry:

You can find a detailed description of all our capability and safety evaluations in the Claude Opus 4.5 system card.
New on the Claude Developer Platform
As models get smarter, they can solve problems in fewer steps: less backtracking, less redundant exploration, less verbose reasoning. Claude Opus 4.5 uses dramatically fewer tokens than its predecessors to reach similar or better outcomes.
But different tasks call for different tradeoffs. Sometimes developers want a model to keep thinking about a problem; sometimes they want something more nimble. With our new effort parameter on the Claude API, you can decide to minimize time and spend or maximize capability.
Set to a medium effort level, Opus 4.5 matches Sonnet 4.5’s best score on SWE-bench Verified but uses 76% fewer output tokens. At its highest effort level, Opus 4.5 exceeds Sonnet 4.5’s performance by 4.3 percentage points—while using 48% fewer tokens.
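As a sketch of how this tradeoff might be wired into application code: the helper below picks an effort level per task before building a request. The effort level names and their placement in the request are assumptions for illustration, not the documented API shape; consult the Claude API documentation for the exact parameter.

```python
# Hypothetical sketch: selecting an effort level per task for Claude Opus 4.5.
# The "effort" value names and top-level placement are ASSUMPTIONS, not the
# documented API shape; only the speed/capability tradeoff comes from the post.

EFFORT_LEVELS = ("low", "medium", "high")  # assumed names

def request_kwargs(prompt: str, effort: str = "high") -> dict:
    """Build request kwargs, validating the chosen effort level."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort}")
    return {
        "model": "claude-opus-4-5-20251101",
        "max_tokens": 2048,
        "effort": effort,  # assumed parameter placement; verify against docs
        "messages": [{"role": "user", "content": prompt}],
    }
```

A latency-sensitive task such as routine SQL generation might run at medium effort, while a hard multi-file refactor keeps the default high effort.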

With effort control, context compaction, and advanced tool use, Claude Opus 4.5 runs longer, does more, and requires less intervention.
Our context management and memory capabilities can dramatically boost performance on agentic tasks. Opus 4.5 is also very effective at managing a team of subagents, enabling the construction of complex, well-coordinated multi-agent systems. In our testing, the combination of all these techniques boosted Opus 4.5’s performance on a deep research evaluation by almost 15 percentage points.⁴
We’re making our Developer Platform more composable over time. We want to give you the building blocks to construct exactly what you need, with full control over efficiency, tool use, and context management.
Product updates
Products like Claude Code show what’s possible when the kinds of upgrades we’ve made to the Claude Developer Platform come together. Claude Code gains two upgrades with Opus 4.5. Plan Mode now builds more precise plans and executes more thoroughly—Claude asks clarifying questions upfront, then builds a user-editable plan.md file before executing.
Claude Code is also now available in our desktop app, letting you run multiple local and remote sessions in parallel: perhaps one agent fixes bugs, another researches GitHub, and a third updates docs.
For Claude app users, long conversations no longer hit a wall—Claude automatically summarizes earlier context as needed, so you can keep the chat going. Claude for Chrome, which lets Claude handle tasks across your browser tabs, is now available to all Max users. We announced Claude for Excel in October, and as of today we've expanded beta access to all Max, Team, and Enterprise users. Each of these updates takes advantage of Claude Opus 4.5’s market-leading performance in using computers and spreadsheets and in handling long-running tasks.
For Claude and Claude Code users with access to Opus 4.5, we’ve removed Opus-specific caps. For Max and Team Premium users, we’ve increased overall usage limits, meaning you’ll have roughly the same number of Opus tokens as you previously had with Sonnet. We’re updating usage limits to make sure you’re able to use Opus 4.5 for daily work. These limits are specific to Opus 4.5. As future models surpass it, we expect to update limits as needed.
Footnotes
1. This result was achieved using parallel test-time compute, a method that aggregates multiple “tries” from the model and selects from among them. Without a time limit, the model (used within Claude Code) matched the best-ever human candidate.
2. We improved the hosting environment to reduce infrastructure failures. This change improved Gemini 3 to 56.7% and GPT-5.1 to 48.6% from the values reported by their developers, using the Terminus-2 harness.
3. Note that these evaluations were run on an in-progress upgrade to Petri, our open-source, automated evaluation tool. They were run on an earlier snapshot of Claude Opus 4.5. Evaluations of the final production model show a very similar pattern of results when compared to other Claude models, and are described in detail in the Claude Opus 4.5 system card.
4. A fetch-enabled version of BrowseComp-Plus. Specifically, the improvement was from 70.48% without the combination of techniques to 85.30% with it.
Methodology
All evals were run with a 64K thinking budget, interleaved scratchpads, a 200K context window, default effort (high), default sampling settings (temperature, top_p), and averaged over 5 independent trials. Exceptions: SWE-bench Verified (no thinking budget) and Terminal Bench (128K thinking budget). Please see the Claude Opus 4.5 system card for full details.