Devstral-Small-2507 is a specialized software engineering model designed to act like a coding assistant that really understands developer needs. Built through a collaboration between Mistral AI and All Hands AI, it’s tailored for tasks like exploring large codebases, editing multiple files, and powering agent-based coding workflows.
With a whopping 128k-token context window, it can handle complex projects and long tasks without losing track. Even better, it's lightweight enough to run on a high-end PC or Mac, and when paired with OpenHands, it can automate engineering tasks, understand prompts across 24 languages, and deliver cutting-edge performance, currently topping the SWE-Bench Verified leaderboard among open models.
Whether you’re building code agents, running automated edits, or just want a next-gen helper for your software projects, Devstral-Small-2507 is a versatile tool designed to keep up with you.
Benchmark Results
SWE-Bench
Devstral Small 1.1 achieves a score of 53.6% on SWE-Bench Verified, outperforming Devstral Small 1.0 by +6.8% and the second-best state-of-the-art model by +11.4%.
Model | Agentic Scaffold | SWE-Bench Verified (%)
---|---|---
Devstral Small 1.1 | OpenHands Scaffold | 53.6
Devstral Small 1.0 | OpenHands Scaffold | 46.8
GPT-4.1-mini | OpenAI Scaffold | 23.6
Claude 3.5 Haiku | Anthropic Scaffold | 40.6
SWE-smith-LM 32B | SWE-agent Scaffold | 40.2
Skywork SWE | OpenHands Scaffold | 38.0
DeepSWE | R2E-Gym Scaffold | 42.2
Recommended GPU Configuration Table
GPU Model | VRAM (GB) | CUDA Version | Usage Notes
---|---|---|---
RTX 4090 | 24 | 12.1–12.6 | Best balance for local use, strong performance
A100 (40 GB or 80 GB) | 40–80 | 12.1–12.6 | Ideal for production and heavy workloads
H100 SXM | 80 | 12.4–12.6 | Extreme-scale parallelism, fastest inference
Mac M2 Max (32 GB) | 32 (unified) | Metal backend | Works via local setups like LM Studio
Resources
Link: https://huggingface.co/mistralai/Devstral-Small-2507
Step-by-Step Process to Install Devstral Small 1.1 Locally
For the purpose of this tutorial, we will use a GPU-powered Virtual Machine offered by NodeShift; however, you can replicate the same steps with any other cloud provider of your choice. NodeShift provides the most affordable Virtual Machines at a scale that meets GDPR, SOC2, and ISO27001 requirements.
Step 1: Sign Up and Set Up a NodeShift Cloud Account
Visit the NodeShift Platform and create an account. Once you’ve signed up, log into your account.
Follow the account setup process and provide the necessary details and information.
Step 2: Create a GPU Node (Virtual Machine)
GPU Nodes are NodeShift’s GPU Virtual Machines, on-demand resources equipped with diverse GPUs ranging from H100s to A100s. These GPU-powered VMs provide enhanced environmental control, allowing configuration adjustments for GPUs, CPUs, RAM, and Storage based on specific requirements.
Navigate to the menu on the left side, select the GPU Nodes option, and in the Dashboard click the Create GPU Node button to deploy your first Virtual Machine.
Step 3: Select a Model, Region, and Storage
In the “GPU Nodes” tab, select a GPU Model and Storage according to your needs and the geographical region where you want to launch your model.
We will use 1 x H100 SXM GPU for this tutorial to achieve the fastest performance. However, you can choose a more affordable GPU with less VRAM if that better suits your requirements.
Step 4: Select Authentication Method
There are two authentication methods available: Password and SSH Key. SSH keys are a more secure option. To create them, please refer to our official documentation.
Step 5: Choose an Image
In our previous blogs, we used pre-built images from the Templates tab when creating a Virtual Machine. However, for running Devstral Small 1.1, we need a more customized environment with full CUDA development capabilities. That’s why, in this case, we switched to the Custom Image tab and selected a specific Docker image that meets all runtime and compatibility requirements.
We chose the following image:
nvidia/cuda:12.1.1-devel-ubuntu22.04
This image is essential because it includes:
- Full CUDA toolkit (including nvcc; see the quick check below)
- Proper support for building and running GPU-based applications like Devstral Small 1.1
- Compatibility with CUDA 12.1.1, required by certain model operations
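Once you're connected to the VM later in this tutorial, you can confirm the toolkit is actually present with a standard check; the devel image ships nvcc, so this should print the CUDA 12.1 toolchain version:
nvcc --version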
Launch Mode
We selected:
Interactive shell server
This gives us SSH access and full control over terminal operations — perfect for installing dependencies, running benchmarks, and launching tools like Devstral Small 1.1.
Docker Repository Authentication
We left all fields empty here.
Since the Docker image is publicly available on Docker Hub, no login credentials are required.
Identification
nvidia/cuda:12.1.1-devel-ubuntu22.04
CUDA and cuDNN images from gitlab.com/nvidia/cuda. The devel version contains the full CUDA toolkit with nvcc.
This setup ensures that Devstral Small 1.1 runs in a GPU-enabled environment with proper CUDA access and high compute performance.
After choosing the image, click the ‘Create’ button, and your Virtual Machine will be deployed.
Step 6: Virtual Machine Successfully Deployed
You will get visual confirmation that your node is up and running.
Step 7: Connect to GPUs using SSH
NodeShift GPUs can be connected to and controlled through a terminal using the SSH key provided during GPU creation.
Once your GPU Node deployment is successfully created and has reached the ‘RUNNING’ status, you can navigate to the page of your GPU Deployment Instance. Then, click the ‘Connect’ button in the top right corner.
Now open your terminal and paste the proxy SSH IP or direct SSH IP.
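For reference, a direct SSH connection typically looks like the following; the key path, port, and IP here are placeholders, so substitute the values shown on your deployment page:
ssh -i ~/.ssh/<your_key> -p <PORT> root@<VM_IP>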
Next, if you want to check the GPU details, run the command below:
nvidia-smi
Step 8: Install Miniconda & Packages
After completing the steps above, install Miniconda.
Miniconda is a free minimal installer for conda. It allows the management and installation of Python packages.
Anaconda has over 1,500 pre-installed packages, making it a comprehensive solution for data science projects. On the other hand, Miniconda allows you to install only the packages you need, reducing unnecessary clutter in your environment.
We highly recommend installing Python using Miniconda. Miniconda comes with Python and a small number of essential packages. Additional packages can be installed using the package management systems Mamba or Conda.
For Linux/macOS:
Download the Miniconda installer script:
sudo apt update && sudo apt install wget -y
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
For Windows:
- Download the Windows Miniconda installer from the official website.
- Run the installer and follow the installation prompts
Run the installer script:
bash Miniconda3-latest-Linux-x86_64.sh
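If you prefer a non-interactive setup, the installer also supports batch mode; the -b flag accepts the license automatically and -p sets the install prefix:
bash Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda3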
After installing Miniconda, you will see the following message:
Thank you for installing Miniconda3!
This confirms that Miniconda has been installed in your home directory.
Check the screenshot below for proof:
Step 9: Activate Conda and Create an Environment
After the installation process, activate Conda using the following command:
export PATH="/root/miniconda3/bin:$PATH"
conda init
exec "$SHELL"
Create a Conda Environment using the following command:
conda create -n devstral python=3.11 -y
conda activate devstral
- conda create: the command to create a new environment.
- -n devstral: the -n flag specifies the name of the environment you want to create. Here, devstral is the name; you can name it anything you like.
- python=3.11: specifies the version of Python to install in the new environment, in this case Python 3.11.
- -y: automatically answers “yes” to all prompts during the creation process, so the environment is created without asking for further confirmation.
Step 10: Install Dependencies
Run the following command to install dependencies:
pip install torch
pip install git+https://github.com/huggingface/transformers
pip install git+https://github.com/huggingface/accelerate
pip install huggingface_hub
pip install --upgrade vllm
pip install --upgrade mistral_common
Step 11: Authenticate with Hugging Face
Now that the dependencies are installed, authenticate with your Hugging Face account to access model files and resources.
Run the following command:
huggingface-cli login
This will prompt you to enter your Hugging Face token.
✅ Go to https://huggingface.co/settings/tokens
✅ Copy your token
✅ Paste it into the terminal when asked
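If you'd rather skip the interactive prompt (for example, in a provisioning script), the CLI also accepts the token as a flag; <YOUR_HF_TOKEN> below is a placeholder for your own token:
huggingface-cli login --token <YOUR_HF_TOKEN>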
Step 12: Install Compatible Transformers & Tokenizers
To ensure full compatibility between vLLM and Devstral Small 1.1, install the required versions of the transformers and tokenizers libraries:
pip install transformers==4.51.1 tokenizers==0.21.1
This pins:
- transformers to 4.51.1 (required by vLLM 0.9.2)
- tokenizers to 0.21.1 (matches transformers and vLLM expectations)
Step 13: Verify Installed Versions
Before launching the server, check that all critical libraries are installed in the correct versions.
Run these commands:
python -c "import transformers; print(transformers.__version__)"
python -c "import tokenizers; print(tokenizers.__version__)"
python -c "import vllm; print(vllm.__version__)"
python -c "import mistral_common; print(mistral_common.__version__)"
You should see:
4.51.1 # transformers
0.21.1 # tokenizers
0.9.2 # vllm
1.7.0 # mistral_common
Step 14: Launch the vLLM Server
Now, start the Devstral Small 1.1 model server using vLLM by running:
vllm serve mistralai/Devstral-Small-2507 \
--tokenizer_mode mistral \
--config_format mistral \
--load_format mistral \
--tool-call-parser mistral \
--enable-auto-tool-choice \
--tensor-parallel-size 1
This command will:
- Start the vLLM API server on port 8000
- Make the model available through OpenAI-compatible routes such as /v1/chat/completions, /v1/completions, and /v1/embeddings
As final confirmation, you will see:
INFO: Started server process [xxxx]
INFO: Waiting for application startup.
INFO: Application startup complete.
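Before wiring up any UI, you can sanity-check the OpenAI-compatible API with a quick curl request from the VM; this is a minimal sketch assuming the server is running on its default port 8000:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "mistralai/Devstral-Small-2507",
        "messages": [{"role": "user", "content": "Write a one-line Python hello world."}],
        "temperature": 0.2
      }'
If the server is healthy, the JSON response contains a choices array with the model's reply.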
Step 15: Install Gradio and Requests
To run a local web demo for Devstral Small 1.1, install the required Python packages:
pip install gradio requests
This installs:
- Gradio → for building an interactive web UI
- Requests → for sending HTTP requests to your running vLLM server
Step 16: Connect to your GPU VM using Remote SSH
- Open VS Code on your Mac.
- Press Cmd + Shift + P, then choose Remote-SSH: Connect to Host.
- Select your configured host (see the sample configuration below if you haven't set one up yet).
- Once connected, you'll see SSH: 38.29.145.28 (your VM IP) in the bottom-left status bar (as in the image).
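If the host isn't configured yet, a minimal ~/.ssh/config entry looks like the sketch below; the alias, IP, port, and key path are placeholders for your own NodeShift values:
Host devstral-vm
    HostName <VM_IP>
    Port <SSH_PORT>
    User root
    IdentityFile ~/.ssh/<your_key>
With this in place, the host appears in the Remote-SSH host list as devstral-vm.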
Step 17: Write the Gradio Demo Script
In this step, you create the Python script that will run a local Gradio app to connect with Devstral Small 1.1.
Create a new file named:
devstral_demo.py
Paste the following code into it:
import gradio as gr
import requests

# Set your vLLM server URL
VLLM_SERVER_URL = "http://localhost:8000/v1/completions"  # change if needed!

def chat_with_devstral(prompt, temperature=0.2, max_tokens=1024):
    # vLLM doesn't require auth by default; the Bearer value is a dummy placeholder
    headers = {"Content-Type": "application/json", "Authorization": "Bearer token"}
    payload = {
        "model": "mistralai/Devstral-Small-2507",
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    response = requests.post(VLLM_SERVER_URL, headers=headers, json=payload)
    if response.status_code == 200:
        result = response.json()
        if "choices" in result and len(result["choices"]) > 0:
            return result["choices"][0]["text"]
        else:
            return "⚠️ No response received."
    else:
        return f"❌ Error: {response.status_code}, {response.text}"

# Gradio interface
gr.Interface(
    fn=chat_with_devstral,
    inputs=[
        gr.Textbox(lines=4, placeholder="Enter your software engineering prompt here..."),
        gr.Slider(0, 1, value=0.2, label="Temperature"),
        gr.Slider(128, 4096, value=1024, step=128, label="Max Tokens"),
    ],
    outputs="text",
    title="💻 Devstral-Small-2507 Software Engineer Agent",
    description="Chat with Mistral's Devstral-Small-2507 model running locally via vLLM!",
).launch(server_name="0.0.0.0", server_port=7860)
Step 18: Run the Gradio Demo Script and Set Up Port Forwarding
Run the Gradio script on the VM
On your VM terminal, launch the demo:
python3 devstral_demo.py
You should see:
* Running on local URL: http://0.0.0.0:7860
* To create a public link, set `share=True` in `launch()`.
This means the Gradio server is running on port 7860 inside the VM.
Set up SSH port forwarding from your local machine
On your local machine (Mac/Windows/Linux), open a terminal and run:
ssh -p 19369 -L 7860:127.0.0.1:7860 root@80.188.223.202
This forwards:
- Local localhost:7860 → remote VM 127.0.0.1:7860
Step 19: Open the Gradio Web Interface
After you’ve forwarded the port and launched the script, open your browser and go to:
http://localhost:7860
You should see the Gradio web UI titled:
💻 Devstral-Small-2507 Software Engineer Agent
This is your interactive playground to chat with the Devstral-Small-2507 model.
Step 20: Try Example Prompts
Here are some cool test prompts to start with:
Prompt 1: Write FastAPI Endpoint
Write a minimal Python FastAPI app that exposes one endpoint /greet which takes a 'name' as query parameter and returns a JSON greeting message.
Expected output:
from fastapi import FastAPI

app = FastAPI()

@app.get("/greet")
def greet(name: str):
    return {"message": f"Hello, {name}!"}
Step 21: Create a Python Script for Direct API Testing
Instead of using the Gradio UI, you can also directly send requests to the vLLM server using a Python script.
Create a new file named:
app.py
Add the following code:
import requests
import json

url = "http://127.0.0.1:8000/v1/chat/completions"
headers = {"Content-Type": "application/json"}
model = "mistralai/Devstral-Small-2507"

SYSTEM_PROMPT = "You are Devstral, an expert software engineer."

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "Refactor this Python function to improve readability:\ndef foo(x): return [i*2 for i in x if i%2==0]"},
]

data = {"model": model, "messages": messages, "temperature": 0.15}

response = requests.post(url, headers=headers, data=json.dumps(data))
print(response.json()["choices"][0]["message"]["content"])
How to run it
- Make sure your vLLM server is running.
- SSH into the VM (if not already).
- Run:
python3 app.py
It will print the Devstral-generated response directly in the terminal.
Why this step?
This gives you:
- A code-only way to interact with the model
- No need for Gradio or web UI
- Easier automation or scripting for advanced tasks
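As a further option, since vLLM exposes an OpenAI-compatible API, the official openai Python package also works against this server. The sketch below assumes you've installed it with pip install openai; the api_key value is a dummy, as the local server doesn't check it by default:
from openai import OpenAI

# Point the client at the local vLLM server instead of api.openai.com
client = OpenAI(base_url="http://localhost:8000/v1", api_key="token")

response = client.chat.completions.create(
    model="mistralai/Devstral-Small-2507",
    messages=[
        {"role": "system", "content": "You are Devstral, an expert software engineer."},
        {"role": "user", "content": "Write a unit test for a function that reverses a string."},
    ],
    temperature=0.15,
)
print(response.choices[0].message.content)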
Conclusion
Devstral-Small-2507 is more than just another code tool — it’s a practical companion for developers looking to speed up software tasks, improve code quality, and explore new workflows. Whether you’re setting it up locally on a powerful GPU or connecting it through the cloud, the process is straightforward and flexible.
With its impressive benchmark performance, wide language support, and ability to plug into tools like OpenHands and Gradio, Devstral is a solid choice for anyone who wants a coding assistant that can handle real-world engineering work. Once you have it up and running, you’ll find it ready to help — from writing clean code and editing files to analyzing projects and automating tricky tasks.
If you’re ready to bring Devstral into your workflow, follow the steps, experiment with prompts, and make it your own.