Today, Anthropic unveiled Claude Opus 4 and Claude Sonnet 4, redefining what’s possible in software engineering, coding precision, and tool-based thinking. Claude Opus 4 stands out as the most advanced model for developers, consistently delivering top-tier results on long, uninterrupted workflows. With 72.5% on SWE-bench Verified and 43.2% on Terminal-bench, it handles hours-long, multi-step challenges with a level of consistency that was previously out of reach. Claude Sonnet 4, meanwhile, offers a well-balanced upgrade from Claude Sonnet 3.7, achieving 72.7% on SWE-bench Verified along with sharper reasoning, better code navigation, and more accurate responses across coding scenarios.
These models aren’t just faster; they’re smarter, more focused, and more practical in real-world applications. Developers can now use Claude Code with native VS Code and JetBrains integrations that show suggested edits directly in the editor, and run background tasks through its GitHub integration. With parallel tool execution, more precise instruction following, and longer-term memory when given access to local files, Claude 4 models introduce a powerful shift in how people build and reason through technical problems.
Resource (GitHub): https://github.com/anthropics/claude-code
Claude 4 models deliver strong performance across coding, reasoning, multimodal capabilities, and agentic tasks. See appendix for more on methodology.
Benchmark | Claude Opus 4 | Claude Sonnet 4 | Claude Sonnet 3.7 | OpenAI o3 | OpenAI GPT-4.1 | Gemini 2.5 Pro (Preview 05-06) |
---|---|---|---|---|---|---|
Agentic coding (SWE-bench Verified) | 72.5% / 79.4% | 72.7% / 80.2% | 62.3% / 70.3% | 69.1% | 54.6% | 63.2% |
Agentic terminal coding (Terminal-bench) | 43.2% / 50.0% | 35.5% / 41.3% | 35.2% | 30.2% | 30.3% | 25.3% |
Graduate-level reasoning (GPQA Diamond) | 79.6% / 83.3% | 75.4% / 83.8% | 78.2% | 83.3% | 66.3% | 83.0% |
Agentic tool use (TAU-bench Retail) | 81.4% | 80.5% | 81.2% | 70.4% | 68.0% | — |
Agentic tool use (TAU-bench Airline) | 59.6% | 60.0% | 58.4% | 52.0% | 49.4% | — |
Multilingual Q&A (MMMLU) | 88.8% | 86.5% | 85.9% | 88.8% | 83.7% | — |
Visual reasoning (MMMU validation) | 76.5% | 74.4% | 75.0% | 82.9% | 74.8% | 79.6% |
High school math competition (AIME 2025) | 75.5% / 90.0% | 70.5% / 85.0% | 54.8% | 88.9% | — | 83.0% |
Claude 4 models lead on SWE-bench Verified, a benchmark for performance on real software engineering tasks. See appendix for more on methodology.
Model | Accuracy (Base) | Accuracy (with parallel test-time compute) |
---|---|---|
Claude Opus 4 | 72.5% | 79.4% |
Claude Sonnet 4 | 72.7% | 80.2% |
Claude Sonnet 3.7 | 62.3% | 70.3% |
OpenAI Codex-1 | 72.1% | — |
OpenAI o3 | 69.1% | — |
OpenAI GPT-4.1 | 54.6% | — |
Gemini 2.5 Pro | 63.2% | — |
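The gain from parallel test-time compute is simply the difference between the two columns. As a minimal sketch of that arithmetic in shell, using the figures from the table above (the function name `uplift` is illustrative, not part of any tool):

```shell
# Percentage-point gain from parallel test-time compute.
# uplift BASE BOOSTED -> prints BOOSTED - BASE with one decimal place.
uplift() {
  awk -v base="$1" -v boosted="$2" 'BEGIN { printf "%.1f\n", boosted - base }'
}

# Figures copied from the SWE-bench Verified table above.
printf 'Claude Opus 4:     +%s points\n' "$(uplift 72.5 79.4)"
printf 'Claude Sonnet 4:   +%s points\n' "$(uplift 72.7 80.2)"
printf 'Claude Sonnet 3.7: +%s points\n' "$(uplift 62.3 70.3)"
```

For the three Claude models this works out to gains of 6.9, 7.5, and 8.0 percentage points respectively.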
How Claude 4 Sets New Standards in Performance Benchmarks
The performance results shared for Claude Opus 4 and Claude Sonnet 4 reflect an evaluation process designed to mirror real-world usage. Both models were tested across a blend of immediate-response tasks and extended thinking challenges that allow deeper reasoning with up to 64,000 tokens. For coding-specific benchmarks like SWE-bench Verified and Terminal-bench, the models worked without extended thinking, operating under tightly scoped single-attempt conditions with two core tools: a bash shell and a string-based file editor. Claude 4 scores are reported on the full set of 500 problems, while OpenAI’s scores are reported on a smaller 477-problem subset.
For extended thinking benchmarks such as GPQA Diamond, TAU-bench, MMMLU, and AIME, performance rose when the models were encouraged to reason step by step using tool feedback and parallel workflows. Notably, TAU-bench scores were gathered with longer sequences and additional step capacity, allowing the models to better plan, reason, and refine their outputs through iterative completions. For the high-compute results, multiple completions were sampled, regression-breaking patches were filtered out, and the most effective responses were selected through internal review, leading to peak SWE-bench Verified scores of 79.4% for Opus 4 and 80.2% for Sonnet 4. These scores don’t just represent raw accuracy; they reflect a shift in how complex software and reasoning tasks are approached at scale.
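The high-compute selection described above (sample several completions, drop the ones that break existing tests, keep the best-scoring survivor) can be sketched in a few lines of shell. The candidate IDs, pass/fail flags, and scores below are made-up illustration data, and `pick_best` is a hypothetical helper; the actual scoring pipeline behind the published numbers is not described in this post.

```shell
# pick_best reads lines of the form: ID REGRESSION_STATUS SCORE
#   REGRESSION_STATUS is "pass" if the patch keeps the regression tests green.
# It discards failing candidates and prints the ID with the highest score.
# If no candidate passes, it prints nothing and returns a non-zero status.
pick_best() {
  awk '$2 == "pass" && (best == "" || $3 + 0 > max) { best = $1; max = $3 + 0 }
       END { if (best != "") print best; else exit 1 }'
}

# Made-up candidates for illustration only.
pick_best <<'EOF'
patch-a pass 0.61
patch-b fail 0.95
patch-c pass 0.83
patch-d pass 0.42
EOF
```

Here `patch-b` has the highest score but is filtered out because it breaks the regression tests, so the sketch prints `patch-c`.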
Claude Opus 4 — Built for Depth, Focus, and Endurance
Claude Opus 4 represents a major leap forward in building digital systems that can handle deep, uninterrupted thinking. Designed for complex, high-stakes work, it excels at tasks that demand multiple steps, structured logic, and long attention spans. Whether it’s a seven-hour engineering workflow, a legal audit across thousands of documents, or building systems that need to remember and evolve over time—Opus 4 stays locked in, delivering results with clarity, structure, and stamina. It’s not just fast; it’s deliberate, organized, and capable of picking up where it left off. With built-in memory capabilities and precision reasoning, Opus 4 unlocks new workflows where sustained effort matters.
Where Claude Opus 4 Shines:
- Large-Scale Development Tasks
Refactor complex codebases, migrate architectures, or build out full-stack systems from scratch with reliable flow and structure.
- Process Automation for Knowledge Work
Set up digital workflows to handle multi-step processes like legal research, compliance audits, or financial reporting reviews.
- Research with Recall
Analyze scattered documents—think whitepapers, case files, or filings—and bring structure to unstructured data over many sessions.
- Persistent Digital Collaborators
Build tools that remember what happened last week, summarize what’s changed, and help teams stay aligned across long-term projects.
- Crafting Long-Form Content with Precision
Write whitepapers, detailed documentation, or thoughtful strategy memos with coherence and fluency across several pages.
Claude Sonnet 4 — Fast, Reliable Thinking for Daily Ops and Scalable Workflows
Claude Sonnet 4 is built for high-speed, high-volume tasks—ideal for businesses that need clarity, consistency, and responsiveness at scale. It delivers strong reasoning and crisp output without sacrificing speed, making it a perfect fit for real-time interactions and workflow automation. Whether you’re building systems that need to respond instantly to users or engines that process large volumes of content, Sonnet 4 is tuned for performance under pressure. It’s efficient, scalable, and ready to plug into fast-moving operations—whether in customer service, dev teams, or enterprise strategy.
Where Claude Sonnet 4 Excels:
- Real-Time Digital Support
Power chat-based customer experiences, onboarding flows, or internal tools that deliver quick, reliable answers every time.
- Agile Development Help
Speed up code reviews, squash bugs, and wire up APIs with near-instant responses and accurate suggestions.
- Rapid Insights & Analysis
Scan through dashboards, trends, or competitor reports and get distilled summaries that save hours of manual digging.
- Mass Content Workflows
Create, format, and analyze everything from campaign assets to survey responses—at scale, without sacrificing quality.
Step-by-Step Process to Install Anthropic Claude Code Locally
For the purpose of this tutorial, we will use a GPU-powered Virtual Machine offered by NodeShift; however, you can replicate the same steps with any other cloud provider of your choice. NodeShift provides the most affordable Virtual Machines at a scale that meets GDPR, SOC2, and ISO27001 requirements.
Step 1: Sign Up and Set Up a NodeShift Cloud Account
Visit the NodeShift Platform and create an account. Once you’ve signed up, log into your account.
Follow the account setup process and provide the necessary details and information.
Step 2: Create a GPU Node (Virtual Machine)
GPU Nodes are NodeShift’s GPU Virtual Machines, on-demand resources equipped with diverse GPUs ranging from H100s to A100s. These GPU-powered VMs provide enhanced environmental control, allowing configuration adjustments for GPUs, CPUs, RAM, and Storage based on specific requirements.
Navigate to the menu on the left side and select the GPU Nodes option. In the Dashboard, click the Create GPU Node button to deploy your first Virtual Machine.
Step 3: Select a Model, Region, and Storage
In the “GPU Nodes” tab, select a GPU Model and Storage according to your needs and the geographical region where you want to launch your model.
We will use 1 x RTX A6000 GPU for this tutorial to achieve the fastest performance. However, you can choose a more affordable GPU with less VRAM if that better suits your requirements.
Step 4: Select Authentication Method
There are two authentication methods available: Password and SSH Key. SSH keys are a more secure option. To create them, please refer to our official documentation.
Step 5: Choose an Image
Next, you will need to choose an image for your Virtual Machine. We will deploy Claude Code on an NVIDIA CUDA Virtual Machine. This proprietary, closed-source parallel computing platform will allow you to install Claude Code on your GPU Node.
After choosing the image, click the ‘Create’ button, and your Virtual Machine will be deployed.
Step 6: Virtual Machine Successfully Deployed
You will get visual confirmation that your node is up and running.
Step 7: Connect to GPUs using SSH
NodeShift GPUs can be connected to and controlled through a terminal using the SSH key provided during GPU creation.
Once your GPU Node deployment is successfully created and has reached the ‘RUNNING’ status, you can navigate to the page of your GPU Deployment Instance. Then, click the ‘Connect’ button in the top right corner.
Now open your terminal and paste the proxy SSH IP or direct SSH IP.
Step 8: Install Node.js
Run the following commands to install Node.js:
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt install -y nodejs
Step 9: Confirm Installation
Run the following commands to confirm the installation:
node -v
npm -v
You should see versions like:
v20.12.2
10.x.x
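Claude Code needs a reasonably recent Node.js release. Its documentation lists Node.js 18 or newer at the time of writing, but treat that exact minimum as an assumption and check the current docs. A small sketch that checks the major version from the `node -v` output follows; `node_major_ok` is an illustrative helper name, not part of Node.js or Claude Code.

```shell
# node_major_ok VERSION [MIN_MAJOR]
# Succeeds if VERSION (as printed by `node -v`, e.g. v20.12.2) has a major
# version of at least MIN_MAJOR (default 18, an assumed minimum).
node_major_ok() {
  version="${1#v}"          # strip the leading "v"
  major="${version%%.*}"    # keep everything before the first dot
  [ "$major" -ge "${2:-18}" ] 2>/dev/null
}

# Only query node if it is actually installed.
if command -v node >/dev/null 2>&1; then
  if node_major_ok "$(node -v)"; then
    echo "Node.js $(node -v) is new enough"
  else
    echo "Node.js $(node -v) is too old; install a newer release"
  fi
fi
```

The optional second argument lets you raise the threshold, for example `node_major_ok "$(node -v)" 20`.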
Step 10: Install Claude Code
Run the following command to install Claude Code:
npm install -g @anthropic-ai/claude-code
Step 11: Launch It in Terminal
Run the following command to launch Claude Code:
claude
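If the shell reports that `claude` cannot be found, one of the earlier steps probably did not complete, or the global npm bin directory is not on your PATH. A minimal preflight sketch that lists which of the required commands are missing is shown here; `missing_cmds` is an illustrative helper name.

```shell
# missing_cmds NAME... prints each NAME that is not found on PATH.
missing_cmds() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}

missing="$(missing_cmds node npm claude)"
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "Not found on PATH:" $missing
else
  echo "node, npm, and claude are all available"
fi
```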
Step 12: Connect to your GPU VM using Remote SSH
- Open VS Code on your Mac.
- Press Cmd + Shift + P, then choose Remote-SSH: Connect to Host.
- Select your configured host (claude-vm).
- Once connected, you’ll see SSH: 116.127.115.18 in the bottom-left status bar.
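Remote-SSH reads host aliases from your SSH config file, so claude-vm has to be defined there before it shows up in the host list. A minimal sketch that appends such an entry is given below. The user name `root` and the key path are assumptions, so substitute the values shown in your NodeShift Connect dialog. `add_ssh_host` is an illustrative helper that takes the config file path as an argument, which lets you try it on a scratch file first.

```shell
# add_ssh_host CONFIG_FILE ALIAS HOSTNAME USER KEY_PATH
# Appends a Host block to CONFIG_FILE unless ALIAS is already defined there.
add_ssh_host() {
  config="$1"; alias_name="$2"; host="$3"; user="$4"; key="$5"
  if [ -f "$config" ] && grep -q "^Host $alias_name\$" "$config"; then
    echo "Host $alias_name already present in $config"
    return 0
  fi
  cat >> "$config" <<EOF

Host $alias_name
    HostName $host
    User $user
    IdentityFile $key
EOF
}

# Try it on a scratch file; point it at ~/.ssh/config once the values are right.
# The user and key path here are placeholders, not values from the tutorial.
scratch="$(mktemp)"
add_ssh_host "$scratch" claude-vm 116.127.115.18 root "~/.ssh/id_ed25519"
cat "$scratch"
rm -f "$scratch"
```

Once the entry is in your real SSH config, `ssh claude-vm` from a terminal should reach the same machine that VS Code connects to.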
Step 13: Claude Code Initial Launch in VS Code Terminal
Execute the following command to run Claude Code from the terminal in VS Code:
claude
- This will launch the Claude Code interface.
- You’ll be prompted to select your preferred terminal theme.
- Pick 1. Dark mode (recommended for most devs).
Step 14: Claude Code Welcome Banner
- Claude prints a large welcome message.
- It confirms that you’ve launched Claude Code in your terminal.
- This indicates you’re running on a fresh install or after /terminal-setup.
Step 15: Choose Login Method
Authenticate Claude Code usage
Claude now supports two authentication methods:
- Anthropic Console (API key billing)
- Claude app login (for Max subscription users)
Choose the one that matches your access. If you’re using Claude through a Max subscription, choose the second option (Claude app login).
Step 16: Login Successful
Authenticate and connect Claude Code with your account
- You’ve logged in.
- This screen confirms successful login to the Claude service.
- Press Enter to continue setup.
Step 17: Claude Code IDE Integration + Startup Confirmation
Claude is now fully embedded in VS Code
This screen confirms:
- The Claude Code VS Code extension is live (v1.0.2)
- You can:
  - Press Cmd + Esc to launch the Claude Code input bar
  - Apply file diffs right in the editor
  - Use Ctrl + Alt + K to insert file references
You’ve now completed terminal setup + IDE connection!
What You Can Do from Here
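Use Claude as your pair programmer. The /init slash command is typed inside a running Claude Code session, while claude -p runs a single prompt from your regular shell and prints the answer without opening the interactive session. Try: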
/init # Initializes CLAUDE.md config
claude -p "Write a unit test for login.js"
claude -p "Summarize the purpose of this repo"
claude -p "Optimize this loop using Python best practices"
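Because `claude -p` prints its answer and exits, it can also be called from scripts. Here is a minimal sketch that feeds each file to Claude Code on standard input and prints a short summary. `summarize_files` and the `CLAUDE_BIN` override are illustrative names introduced here, and the prompt text is only an example.

```shell
# summarize_files FILE... runs one non-interactive Claude Code query per file.
# CLAUDE_BIN can point at a different executable (useful for dry runs).
summarize_files() {
  for file in "$@"; do
    printf '== %s ==\n' "$file"
    "${CLAUDE_BIN:-claude}" -p "Summarize the purpose of this file in two sentences." < "$file"
  done
}

# Example (commented out because each call counts against your account usage):
# summarize_files src/login.js README.md
```

Keeping the executable name in a variable makes it easy to substitute a stub while you test the loop itself.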
Conclusion
You’re all set. Claude Code is live, running smoothly on your GPU VM, and ready to dive deep into your projects. Whether it’s writing, reviewing, or refactoring code—this setup helps you stay in flow and ship faster.
Just open your terminal or VS Code and run:
claude