Claude 4: Opus vs Sonnet, Benchmarks, and Dev Workflow with Claude Code

by Ayush Kumar | May 23, 2025

Today, Anthropic unveiled Claude Opus 4 and Claude Sonnet 4, two models aimed at software engineering, precise coding, and tool-based reasoning. Claude Opus 4 is the more advanced of the two for developers and is built for long, uninterrupted workflows. It scores 72.5% on SWE-bench Verified and 43.2% on Terminal-bench, and it handles hours-long, multi-step challenges with a level of consistency that was previously out of reach. Claude Sonnet 4 is a well-balanced upgrade from Claude Sonnet 3.7: it scores 72.7% on SWE-bench Verified and offers sharper reasoning, better code navigation, and more accurate responses across coding scenarios.

These models aren’t just faster; they are also more focused and more practical in real-world applications. With Claude Code, developers can work alongside them in VS Code and JetBrains IDEs, run background tasks through GitHub integrations, and review suggested edits directly in their files. With parallel tool execution, precise instruction following, and long-term memory through file-based context, the Claude 4 models change how people build and reason through technical problems.

Resource

GitHub

Link: https://github.com/anthropics/claude-code

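Claude 4 models deliver strong performance across coding, reasoning, multimodal capabilities, and agentic tasks. Where a cell shows two scores, the second is the result with parallel test-time compute. See the appendix of Anthropic’s announcement for more on methodology.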

Benchmark	Claude Opus 4	Claude Sonnet 4	Claude Sonnet 3.7	OpenAI o3	OpenAI GPT-4.1	Gemini 2.5 Pro (Preview 05-06)
Agentic coding (SWE-bench Verified)	72.5% / 79.4%	72.7% / 80.2%	62.3% / 70.3%	69.1%	54.6%	63.2%
Agentic terminal coding (Terminal-bench)	43.2% / 50.0%	35.5% / 41.3%	35.2%	30.2%	30.3%	25.3%
Graduate-level reasoning (GPQA Diamond)	79.6% / 83.3%	75.4% / 83.8%	78.2%	83.3%	66.3%	83.0%
Agentic tool use (TAU-bench Retail)	81.4%	80.5%	81.2%	70.4%	68.0%	—
Agentic tool use (TAU-bench Airline)	59.6%	60.0%	58.4%	52.0%	49.4%	—
Multilingual Q&A (MMMLU)	88.8%	86.5%	85.9%	88.8%	83.7%	—
Visual reasoning (MMMU validation)	76.5%	74.4%	75.0%	82.9%	74.8%	79.6%
High school math competition (AIME 2025)	75.5% / 90.0%	70.5% / 85.0%	54.8%	88.9%	—	83.0%

Claude 4 models lead on SWE-bench Verified, a benchmark for performance on real software engineering tasks. See the appendix of Anthropic’s announcement for more on methodology.

Model	Accuracy (Base)	Accuracy (with parallel test-time compute)
Claude Opus 4	72.5%	79.4%
Claude Sonnet 4	72.7%	80.2%
Claude Sonnet 3.7	62.3%	70.3%
OpenAI Codex-1	72.1%	—
OpenAI o3	69.1%	—
OpenAI GPT-4.1	54.6%	—
Gemini 2.5 Pro	63.2%	—
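To read the table at a glance, it can help to compute how many percentage points parallel test-time compute adds for each Claude model. The sketch below is only arithmetic on the scores listed above; `pp_gain` is a hypothetical helper name, not part of any tool mentioned in this post.

```shell
#!/usr/bin/env bash
# Gain from parallel test-time compute, in percentage points.
# Arguments: base score, high-compute score (both as percentages).
pp_gain() {
  LC_ALL=C awk -v base="$1" -v high="$2" 'BEGIN { printf "%.1f\n", high - base }'
}

# Scores copied from the SWE-bench Verified table above.
printf 'Claude Opus 4:     +%s pp\n' "$(pp_gain 72.5 79.4)"
printf 'Claude Sonnet 4:   +%s pp\n' "$(pp_gain 72.7 80.2)"
printf 'Claude Sonnet 3.7: +%s pp\n' "$(pp_gain 62.3 70.3)"
```

The competitor rows have no high-compute figure, so only the base column supports a like-for-like comparison across vendors.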

How Claude 4 Sets New Standards in Performance Benchmarks

The performance results shared for Claude Opus 4 and Claude Sonnet 4 come with a published evaluation methodology designed to mirror real-world usage. Both models were tested across a blend of immediate-response tasks and extended thinking challenges, in which the model reasons at greater length using up to 64,000 tokens. For coding-specific benchmarks like SWE-bench Verified and Terminal-bench, the models worked without extended thinking, operating under tightly scoped single-attempt conditions with two core tools: a bash shell and a string-based file editor. The Claude scores cover the full set of 500 SWE-bench Verified problems, while OpenAI’s scores reflect a slightly smaller 477-problem subset, so the two are not strictly like for like.

For GPQA Diamond, TAU-bench, MMMLU, and AIME, the reported scores use extended thinking, which lets the models reason step by step before answering. TAU-bench scores were additionally gathered with a larger step budget, giving the models more room to plan, reason, and refine their outputs. For the high-compute results, multiple completions were sampled in parallel, patches that broke the repository’s visible regression tests were discarded, and the best remaining candidate was selected by an internal scoring model. On SWE-bench Verified this raised the scores to 79.4% for Opus 4 and 80.2% for Sonnet 4, which shows how much extra test-time compute can add on top of single-attempt accuracy.
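The high-compute procedure described above (sample several candidates, drop the ones that break regression tests, keep the best-scoring one) can be sketched in a few lines. This is a toy illustration, not Anthropic’s actual pipeline: the input format, the `select_best` helper, and the scores are all made up for the example, and the real scoring model is internal.

```shell
#!/usr/bin/env bash
# Pick the best candidate patch from lines on stdin.
# Each line: <patch-id> <regression-tests: pass|fail> <score>
# Candidates that fail regression tests are discarded before ranking.
select_best() {
  awk '
    $2 == "pass" && (best == "" || $3 + 0 > max) { best = $1; max = $3 + 0 }
    END { if (best != "") print best; else exit 1 }
  '
}

# Hypothetical candidates: patch-a scores highest but breaks a regression
# test, so it is filtered out and patch-c is selected.
printf '%s\n' \
  "patch-a fail 0.91" \
  "patch-b pass 0.62" \
  "patch-c pass 0.78" | select_best
```

The function exits with a non-zero status when every candidate fails the regression filter, which lets a caller tell an empty result apart from a selected patch.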

Claude Opus 4 — Built for Depth, Focus, and Endurance

Claude Opus 4 represents a major leap forward in building digital systems that can handle deep, uninterrupted thinking. Designed for complex, high-stakes work, it excels at tasks that demand multiple steps, structured logic, and long attention spans. Whether it’s a seven-hour engineering workflow, a legal audit across thousands of documents, or building systems that need to remember and evolve over time—Opus 4 stays locked in, delivering results with clarity, structure, and stamina. It’s not just fast; it’s deliberate, organized, and capable of picking up where it left off. With built-in memory capabilities and precision reasoning, Opus 4 unlocks new workflows where sustained effort matters.

Where Claude Opus 4 Shines:

Large-Scale Development Tasks
Refactor complex codebases, migrate architectures, or build out full-stack systems from scratch with reliable flow and structure.
Process Automation for Knowledge Work
Set up digital workflows to handle multi-step processes like legal research, compliance audits, or financial reporting reviews.
Research with Recall
Analyze scattered documents—think whitepapers, case files, or filings—and bring structure to unstructured data over many sessions.
Persistent Digital Collaborators
Build tools that remember what happened last week, summarize what’s changed, and help teams stay aligned across long-term projects.
Crafting Long-Form Content with Precision
Write whitepapers, detailed documentation, or thoughtful strategy memos with coherence and fluency across several pages.

Claude Sonnet 4 — Fast, Reliable Thinking for Daily Ops and Scalable Workflows

Claude Sonnet 4 is built for high-speed, high-volume tasks—ideal for businesses that need clarity, consistency, and responsiveness at scale. It delivers strong reasoning and crisp output without sacrificing speed, making it a perfect fit for real-time interactions and workflow automation. Whether you’re building systems that need to respond instantly to users or engines that process large volumes of content, Sonnet 4 is tuned for performance under pressure. It’s efficient, scalable, and ready to plug into fast-moving operations—whether in customer service, dev teams, or enterprise strategy.

Where Claude Sonnet 4 Excels:

Real-Time Digital Support
Power chat-based customer experiences, onboarding flows, or internal tools that deliver quick, reliable answers every time.
Agile Development Help
Speed up code reviews, squash bugs, and wire up APIs with near-instant responses and accurate suggestions.
Rapid Insights & Analysis
Scan through dashboards, trends, or competitor reports and get distilled summaries that save hours of manual digging.
Mass Content Workflows
Create, format, and analyze everything from campaign assets to survey responses—at scale, without sacrificing quality.

Step-by-Step Process to Install Anthropic Claude Code Locally

For the purpose of this tutorial, we will use a GPU-powered Virtual Machine offered by NodeShift; however, you can replicate the same steps with any other cloud provider of your choice. NodeShift provides the most affordable Virtual Machines at a scale that meets GDPR, SOC2, and ISO27001 requirements.

Step 1: Sign Up and Set Up a NodeShift Cloud Account

Visit the NodeShift Platform and create an account. Once you’ve signed up, log into your account.

Follow the account setup process and provide the necessary details and information.

Step 2: Create a GPU Node (Virtual Machine)

GPU Nodes are NodeShift’s GPU Virtual Machines, on-demand resources equipped with diverse GPUs ranging from H100s to A100s. These GPU-powered VMs provide enhanced environmental control, allowing configuration adjustments for GPUs, CPUs, RAM, and Storage based on specific requirements.

In the menu on the left side, select the GPU Nodes option. Then click the Create GPU Node button in the Dashboard to create your first Virtual Machine deployment.

Step 3: Select a Model, Region, and Storage

In the “GPU Nodes” tab, select a GPU Model and Storage according to your needs and the geographical region where you want to launch your Virtual Machine.

We will use 1 x RTX A6000 GPU for this tutorial. Claude Code sends its requests to Anthropic’s hosted models, so the GPU does not affect how fast it responds, and a more affordable GPU with less VRAM works just as well if that better suits your requirements.

Step 4: Select Authentication Method

There are two authentication methods available: Password and SSH Key. SSH keys are a more secure option. To create them, please refer to our official documentation.

Step 5: Choose an Image

Next, you will need to choose an image for your Virtual Machine. We will deploy Claude Code on an NVIDIA CUDA Virtual Machine image. CUDA is NVIDIA’s proprietary, closed-source parallel computing platform; Claude Code does not depend on it and only needs the Node.js runtime installed in the steps below.

After choosing the image, click the ‘Create’ button, and your Virtual Machine will be deployed.

Step 6: Virtual Machine Successfully Deployed

You will get visual confirmation that your node is up and running.

Step 7: Connect to GPUs using SSH

NodeShift GPUs can be connected to and controlled through a terminal using the SSH key provided during GPU creation.

Once your GPU Node deployment is successfully created and has reached the ‘RUNNING’ status, you can navigate to the page of your GPU Deployment Instance. Then, click the ‘Connect’ button in the top right corner.

Now open your terminal and paste the proxy SSH or direct SSH connection details shown there.

Step 8: Install Node.js

Run the following commands to install Node.js:

curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt install -y nodejs

Step 9: Confirm Installation

Run the following commands to confirm the installation:

node -v
npm -v

You should see versions like:

v20.12.2
10.x.x
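If you script this setup, the version check can be automated. The sketch below assumes the minimum of Node.js 18 that the Claude Code documentation listed at the time of writing; `node_major` and `node_is_supported` are hypothetical helper names, and the threshold is a parameter so it can be adjusted if the requirement changes.

```shell
#!/usr/bin/env bash
# Extract the major version from a string like "v20.12.2".
node_major() {
  local v="${1#v}"            # drop the leading "v"
  printf '%s\n' "${v%%.*}"    # keep everything before the first dot
}

# Succeed when the given version meets the minimum major version.
# Arguments: version string, minimum major (assumed default: 18).
node_is_supported() {
  local min="${2:-18}"
  [ "$(node_major "$1")" -ge "$min" ]
}

# Run the check only when node is actually on PATH.
if command -v node >/dev/null 2>&1; then
  if node_is_supported "$(node -v)"; then
    echo "Node.js $(node -v) meets the requirement"
  else
    echo "Node.js $(node -v) is too old; install version 18 or newer" >&2
  fi
else
  echo "node is not on PATH yet" >&2
fi
```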

Step 10: Install Claude Code

Run the following command to install Claude Code:

npm install -g @anthropic-ai/claude-code

Step 11: Launch It in Terminal

Run the following command to launch Claude Code:

claude

Step 12: Connect to your GPU VM using Remote SSH

Open VS Code on your Mac.
Press Cmd + Shift + P, then choose Remote-SSH: Connect to Host.
Select your configured host (claude-vm).
Once connected, you’ll see SSH: 116.127.115.18 in the bottom-left status bar.
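The steps above assume a host named claude-vm already exists in your SSH config. A minimal sketch for generating that entry follows; `ssh_host_block` is a hypothetical helper, and the login user (root) and key path are placeholders you should replace with the values from your own deployment.

```shell
#!/usr/bin/env bash
# Print an OpenSSH client config stanza for one host alias.
# Arguments: alias, host name or IP, login user, private key path.
ssh_host_block() {
  printf 'Host %s\n' "$1"
  printf '    HostName %s\n' "$2"
  printf '    User %s\n' "$3"
  printf '    IdentityFile %s\n' "$4"
}

# Preview the stanza; the user and key path are assumptions.
ssh_host_block claude-vm 116.127.115.18 root "~/.ssh/id_ed25519"

# To save it, append the output to your SSH config:
# ssh_host_block claude-vm 116.127.115.18 root "~/.ssh/id_ed25519" >> ~/.ssh/config
```

Once the stanza is in ~/.ssh/config, the alias appears in the Remote-SSH host list and also works from a plain terminal with ssh claude-vm.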

Step 13: Claude Code Initial Launch in VS Code Terminal

Execute the following command to run Claude Code from the terminal in VS Code:

claude

This will launch the Claude Code interface.
You’ll be prompted to select your preferred terminal theme.
Pick 1. Dark mode (recommended for most devs).

Step 14: Claude Code Welcome Banner

Claude prints a large welcome message.
It confirms that you’ve launched Claude Code in your terminal.
This indicates you’re running on a fresh install or after /terminal-setup.

Step 15: Choose Login Method

Authenticate Claude Code usage

Claude Code supports two authentication methods:

1. Anthropic Console (API key billing)
2. Claude app login (for Max subscription users)

Choose the one that matches your access. If you have a Claude Max subscription, go with 2.

Step 16: Login Successful

Authenticate and connect Claude Code with your account

You’ve logged in.
This screen confirms successful login to the Claude service.
Press Enter to continue setup.

Step 17: Claude Code IDE Integration + Startup Confirmation

Claude is now fully embedded in VS Code

This screen confirms:

The Claude Code VS Code extension is live (v1.0.2)
You can:
- Press Cmd + Esc to launch Claude Code input bar
- Apply file diffs right in the editor
- Use Ctrl + Alt + K to insert file references

You’ve now completed terminal setup + IDE connection!

What You Can Do from Here

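Use Claude as your pair programmer. Type /init inside a running Claude Code session; run the claude -p commands from your shell, where -p prints a single response and exits. Try:

/init                            # Inside a session: creates a CLAUDE.md project guide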
claude -p "Write a unit test for login.js"
claude -p "Summarize the purpose of this repo"
claude -p "Optimize this loop using Python best practices"
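Because print mode reads piped input, the one-off prompts above can be wrapped into a small helper for use in scripts. This is a sketch under assumptions: `review_file` and its default prompt are hypothetical, and the function only relies on `claude -p` accepting a prompt argument plus content on standard input.

```shell
#!/usr/bin/env bash
# Ask Claude Code about one file in print mode (-p), passing the file on stdin.
# Arguments: file path, optional prompt.
review_file() {
  local file="$1"
  local prompt="${2:-Summarize what this file does in three bullet points}"
  if [ ! -f "$file" ]; then
    echo "review_file: no such file: $file" >&2
    return 1
  fi
  if ! command -v claude >/dev/null 2>&1; then
    echo "review_file: claude is not on PATH" >&2
    return 127
  fi
  claude -p "$prompt" < "$file"
}

# Example (uncomment to run against a real file):
# review_file login.js "Write a unit test for this file"
```

Checking for the file and the CLI up front keeps failures readable when the helper runs unattended, for example in a pre-commit hook.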

Conclusion

You’re all set. Claude Code is live, running smoothly on your GPU VM, and ready to dive deep into your projects. Whether it’s writing, reviewing, or refactoring code—this setup helps you stay in flow and ship faster.

Just open your terminal or VS Code and run:

claude
