Devstral-Small-2507 is a specialized software engineering model designed to act like a coding assistant that really understands developer needs. Built through a collaboration between Mistral AI and All Hands AI, it’s tailored for tasks like exploring large codebases, editing multiple files, and powering agent-based coding workflows.
With a whopping 128k-token context window, it can handle complex projects and long tasks without losing track. Even better, it's lightweight enough to run on a high-end PC or Mac, and when paired with OpenHands, it can automate engineering tasks, understand prompts across 24 languages, and deliver cutting-edge performance, currently topping the SWE-Bench Verified leaderboard among open models.
Whether you’re building code agents, running automated edits, or just want a next-gen helper for your software projects, Devstral-Small-2507 is a versatile tool designed to keep up with you.
Benchmark Results
SWE-Bench
Devstral Small 1.1 achieves a score of 53.6% on SWE-Bench Verified, outperforming Devstral Small 1.0 by +6.8% and the second-best state-of-the-art model by +11.4%.
Model | Agentic Scaffold | SWE-Bench Verified (%)
---|---|---
Devstral Small 1.1 | OpenHands Scaffold | 53.6
Devstral Small 1.0 | OpenHands Scaffold | 46.8
GPT-4.1-mini | OpenAI Scaffold | 23.6
Claude 3.5 Haiku | Anthropic Scaffold | 40.6
SWE-smith-LM 32B | SWE-agent Scaffold | 40.2
Skywork SWE | OpenHands Scaffold | 38.0
DeepSWE | R2E-Gym Scaffold | 42.2
Recommended GPU Configuration Table
GPU Model | VRAM (GB) | CUDA Version | Usage Notes
---|---|---|---
RTX 4090 | 24 | 12.1–12.6 | Best balance for local use, strong performance
A100 (40 GB or 80 GB) | 40–80 | 12.1–12.6 | Ideal for production and heavy workloads
H100 SXM | 80 | 12.4–12.6 | Extreme-scale parallelism, fastest inference
Mac M2 Max (32 GB) | 32 (unified) | Metal backend | Works via local setups like LM Studio
Resources
Link: https://huggingface.co/mistralai/Devstral-Small-2507
Step-by-Step Process to Install Devstral Small 1.1 Locally
For the purpose of this tutorial, we will use a GPU-powered Virtual Machine offered by NodeShift; however, you can replicate the same steps with any other cloud provider of your choice. NodeShift provides the most affordable Virtual Machines at a scale that meets GDPR, SOC2, and ISO27001 requirements.
Step 1: Sign Up and Set Up a NodeShift Cloud Account
Visit the NodeShift Platform and create an account. Once you’ve signed up, log into your account.
Follow the account setup process and provide the necessary details and information.
Step 2: Create a GPU Node (Virtual Machine)
GPU Nodes are NodeShift’s GPU Virtual Machines, on-demand resources equipped with diverse GPUs ranging from H100s to A100s. These GPU-powered VMs provide enhanced environmental control, allowing configuration adjustments for GPUs, CPUs, RAM, and Storage based on specific requirements.
Navigate to the menu on the left side, select the GPU Nodes option, and in the Dashboard click the Create GPU Node button to deploy your first Virtual Machine.
Step 3: Select a Model, Region, and Storage
In the “GPU Nodes” tab, select a GPU Model and Storage according to your needs and the geographical region where you want to launch your model.
We will use 1 x H100 SXM GPU for this tutorial to achieve the fastest performance. However, you can choose a more affordable GPU with less VRAM if that better suits your requirements.
Step 4: Select Authentication Method
There are two authentication methods available: Password and SSH Key. SSH keys are a more secure option. To create them, please refer to our official documentation.
Step 5: Choose an Image
In our previous blogs, we used pre-built images from the Templates tab when creating a Virtual Machine. However, for running Devstral Small 1.1, we need a more customized environment with full CUDA development capabilities. That’s why, in this case, we switched to the Custom Image tab and selected a specific Docker image that meets all runtime and compatibility requirements.
We chose the following image:
nvidia/cuda:12.1.1-devel-ubuntu22.04
This image is essential because it includes:
- Full CUDA toolkit (including nvcc; see the quick check below)
- Proper support for building and running GPU-based applications like Devstral Small 1.1
- Compatibility with CUDA 12.1.1, required by certain model operations
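Once you're connected to the VM later in this tutorial, you can confirm the toolkit is actually present with a standard check; the devel image ships nvcc, so this should print the CUDA 12.1 toolchain version:
nvcc --version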
Launch Mode
We selected:
Interactive shell server
This gives us SSH access and full control over terminal operations — perfect for installing dependencies, running benchmarks, and launching tools like Devstral Small 1.1.
Docker Repository Authentication
We left all fields empty here.
Since the Docker image is publicly available on Docker Hub, no login credentials are required.
Identification
nvidia/cuda:12.1.1-devel-ubuntu22.04
CUDA and cuDNN images from gitlab.com/nvidia/cuda. The devel version contains the full CUDA toolkit with nvcc.
This setup ensures that Devstral Small 1.1 runs in a GPU-enabled environment with proper CUDA access and high compute performance.
After choosing the image, click the ‘Create’ button, and your Virtual Machine will be deployed.
Step 6: Virtual Machine Successfully Deployed
You will get visual confirmation that your node is up and running.
Step 7: Connect to GPUs using SSH
NodeShift GPUs can be connected to and controlled through a terminal using the SSH key provided during GPU creation.
Once your GPU Node deployment is successfully created and has reached the ‘RUNNING’ status, you can navigate to the page of your GPU Deployment Instance. Then, click the ‘Connect’ button in the top right corner.
Now open your terminal and paste the proxy SSH IP or direct SSH IP.
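For reference, a direct SSH connection typically looks like the following; the key path, port, and IP here are placeholders, so substitute the values shown on your deployment page:
ssh -i ~/.ssh/<your_key> -p <PORT> root@<VM_IP>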
Next, if you want to check the GPU details, run the command below:
nvidia-smi
Step 8: Install Miniconda & Packages
After completing the steps above, install Miniconda.
Miniconda is a free minimal installer for conda. It allows the management and installation of Python packages.
Anaconda has over 1,500 pre-installed packages, making it a comprehensive solution for data science projects. On the other hand, Miniconda allows you to install only the packages you need, reducing unnecessary clutter in your environment.
We highly recommend installing Python using Miniconda. Miniconda comes with Python and a small number of essential packages. Additional packages can be installed using the package management systems Mamba or Conda.
For Linux/macOS:
Download the Miniconda installer script:
sudo apt update && sudo apt install wget -y
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
For Windows:
- Download the Windows Miniconda installer from the official website.
- Run the installer and follow the installation prompts
Run the installer script:
bash Miniconda3-latest-Linux-x86_64.sh
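If you prefer a non-interactive setup, the installer also supports batch mode; the -b flag accepts the license automatically and -p sets the install prefix:
bash Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda3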
After installing Miniconda, you will see the following message:
Thank you for installing Miniconda3!
This confirms that Miniconda has been installed in your home directory.
Check the screenshot below for proof:
Step 9: Activate Conda and Create an Environment
After the installation process, activate Conda using the following command:
export PATH="/root/miniconda3/bin:$PATH"
conda init
exec "$SHELL"
Create a Conda Environment using the following command:
conda create -n devstral python=3.11 -y
conda activate devstral
- conda create: the command to create a new environment.
- -n devstral: the -n flag specifies the name of the environment you want to create. Here, devstral is the name; you can name it anything you like.
- python=3.11: specifies the version of Python to install in the new environment, in this case Python 3.11.
- -y: automatically answers “yes” to all prompts during the creation process, so the environment is created without asking for further confirmation.
Step 10: Install Dependencies
Run the following command to install dependencies:
pip install torch
pip install git+https://github.com/huggingface/transformers
pip install git+https://github.com/huggingface/accelerate
pip install huggingface_hub
pip install --upgrade vllm
pip install --upgrade mistral_common
Step 11: Authenticate with Hugging Face
Now that the dependencies are installed, authenticate with your Hugging Face account to access model files and resources.
Run the following command:
huggingface-cli login
This will prompt you to enter your Hugging Face token.
✅ Go to https://huggingface.co/settings/tokens
✅ Copy your token
✅ Paste it into the terminal when asked
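If you'd rather skip the interactive prompt (for example, in a provisioning script), the CLI also accepts the token as a flag; <YOUR_HF_TOKEN> below is a placeholder for your own token:
huggingface-cli login --token <YOUR_HF_TOKEN>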
Step 12: Install Compatible Transformers & Tokenizers
To ensure full compatibility between vLLM and Devstral Small 1.1, install the required versions of the transformers and tokenizers libraries:
pip install transformers==4.51.1 tokenizers==0.21.1
This pins:
- transformers to 4.51.1 (required by vLLM 0.9.2)
- tokenizers to 0.21.1 (matches transformers and vLLM expectations)
Step 13: Verify Installed Versions
Before launching the server, check that all critical libraries are installed in the correct versions.
Run these commands:
python -c "import transformers; print(transformers.__version__)"
python -c "import tokenizers; print(tokenizers.__version__)"
python -c "import vllm; print(vllm.__version__)"
python -c "import mistral_common; print(mistral_common.__version__)"
You should see:
4.51.1 # transformers
0.21.1 # tokenizers
0.9.2 # vllm
1.7.0 # mistral_common
Step 14: Launch the vLLM Server
Now, start the Devstral Small 1.1 model server using vLLM by running:
vllm serve mistralai/Devstral-Small-2507 \
--tokenizer_mode mistral \
--config_format mistral \
--load_format mistral \
--tool-call-parser mistral \
--enable-auto-tool-choice \
--tensor-parallel-size 1
This command will:
- Start the vLLM API server on port 8000
- Make the model available through OpenAI-compatible routes such as /v1/chat/completions, /v1/completions, and /v1/embeddings
As final confirmation, you will see:
INFO: Started server process [xxxx]
INFO: Waiting for application startup.
INFO: Application startup complete.
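Before wiring up any UI, you can sanity-check the OpenAI-compatible API with a quick curl request from the VM; this is a minimal sketch assuming the server is running on its default port 8000:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "mistralai/Devstral-Small-2507",
        "messages": [{"role": "user", "content": "Write a one-line Python hello world."}],
        "temperature": 0.2
      }'
If the server is healthy, the JSON response contains a choices array with the model's reply.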
Step 15: Install Gradio and Requests
To run a local web demo for Devstral Small 1.1, install the required Python packages:
pip install gradio requests
This installs:
- Gradio → for building an interactive web UI
- Requests → for sending HTTP requests to your running vLLM server
Step 16: Connect to your GPU VM using Remote SSH
- Open VS Code on your Mac.
- Press Cmd + Shift + P, then choose Remote-SSH: Connect to Host.
- Select your configured host (see the sample configuration below if you haven't set one up yet).
- Once connected, you'll see SSH: 38.29.145.28 (your VM IP) in the bottom-left status bar (as in the image).
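If the host isn't configured yet, a minimal ~/.ssh/config entry looks like the sketch below; the alias, IP, port, and key path are placeholders for your own NodeShift values:
Host devstral-vm
    HostName <VM_IP>
    Port <SSH_PORT>
    User root
    IdentityFile ~/.ssh/<your_key>
With this in place, the host appears in the Remote-SSH host list as devstral-vm.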
Step 17: Write the Gradio Demo Script
In this step, you create the Python script that will run a local Gradio app to connect with Devstral Small 1.1.
Create a new file named:
devstral_demo.py
Paste the following code into it:
import gradio as gr
import requests

# Set your vLLM server URL
VLLM_SERVER_URL = "http://localhost:8000/v1/completions"  # change if needed!

def chat_with_devstral(prompt, temperature=0.2, max_tokens=1024):
    # vLLM doesn't require auth by default; the Bearer value is a dummy placeholder
    headers = {"Content-Type": "application/json", "Authorization": "Bearer token"}
    payload = {
        "model": "mistralai/Devstral-Small-2507",
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    response = requests.post(VLLM_SERVER_URL, headers=headers, json=payload)
    if response.status_code == 200:
        result = response.json()
        if "choices" in result and len(result["choices"]) > 0:
            return result["choices"][0]["text"]
        else:
            return "⚠️ No response received."
    else:
        return f"❌ Error: {response.status_code}, {response.text}"

# Gradio interface
gr.Interface(
    fn=chat_with_devstral,
    inputs=[
        gr.Textbox(lines=4, placeholder="Enter your software engineering prompt here..."),
        gr.Slider(0, 1, value=0.2, label="Temperature"),
        gr.Slider(128, 4096, value=1024, step=128, label="Max Tokens"),
    ],
    outputs="text",
    title="💻 Devstral-Small-2507 Software Engineer Agent",
    description="Chat with Mistral's Devstral-Small-2507 model running locally via vLLM!",
).launch(server_name="0.0.0.0", server_port=7860)
Step 18: Run the Gradio Demo Script and Set Up Port Forwarding
Run the Gradio script on the VM
On your VM terminal, launch the demo:
python3 devstral_demo.py
You should see:
* Running on local URL: http://0.0.0.0:7860
* To create a public link, set `share=True` in `launch()`.
This means the Gradio server is running on port 7860 inside the VM.
Set up SSH port forwarding from your local machine
On your local machine (Mac/Windows/Linux), open a terminal and run:
ssh -p 19369 -L 7860:127.0.0.1:7860 root@80.188.223.202
This forwards:
- Local localhost:7860 → remote VM 127.0.0.1:7860
Step 19: Open the Gradio Web Interface
After you’ve forwarded the port and launched the script, open your browser and go to:
http://localhost:7860
You should see the Gradio web UI titled:
💻 Devstral-Small-2507 Software Engineer Agent
This is your interactive playground to chat with the Devstral-Small-2507 model.
Step 20: Try Example Prompts
Here are some cool test prompts to start with:
Prompt 1: Write FastAPI Endpoint
Write a minimal Python FastAPI app that exposes one endpoint /greet which takes a 'name' as query parameter and returns a JSON greeting message.
Expected output:
from fastapi import FastAPI

app = FastAPI()

@app.get("/greet")
def greet(name: str):
    return {"message": f"Hello, {name}!"}
Step 21: Create a Python Script for Direct API Testing
Instead of using the Gradio UI, you can also directly send requests to the vLLM server using a Python script.
Create a new file named:
app.py
Add the following code:
import requests
import json

url = "http://127.0.0.1:8000/v1/chat/completions"
headers = {"Content-Type": "application/json"}
model = "mistralai/Devstral-Small-2507"

SYSTEM_PROMPT = "You are Devstral, an expert software engineer."

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "Refactor this Python function to improve readability:\ndef foo(x): return [i*2 for i in x if i%2==0]"},
]

data = {"model": model, "messages": messages, "temperature": 0.15}

response = requests.post(url, headers=headers, data=json.dumps(data))
print(response.json()["choices"][0]["message"]["content"])
How to run it
- Make sure your vLLM server is running.
- SSH into the VM (if not already).
- Run:
python3 app.py
It will print the Devstral-generated response directly in the terminal.
Why this step?
This gives you:
- A code-only way to interact with the model
- No need for Gradio or web UI
- Easier automation or scripting for advanced tasks
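As a further option, since vLLM exposes an OpenAI-compatible API, the official openai Python package also works against this server. The sketch below assumes you've installed it with pip install openai; the api_key value is a dummy, as the local server doesn't check it by default:
from openai import OpenAI

# Point the client at the local vLLM server instead of api.openai.com
client = OpenAI(base_url="http://localhost:8000/v1", api_key="token")

response = client.chat.completions.create(
    model="mistralai/Devstral-Small-2507",
    messages=[
        {"role": "system", "content": "You are Devstral, an expert software engineer."},
        {"role": "user", "content": "Write a unit test for a function that reverses a string."},
    ],
    temperature=0.15,
)
print(response.choices[0].message.content)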
Conclusion
Devstral-Small-2507 is more than just another code tool — it’s a practical companion for developers looking to speed up software tasks, improve code quality, and explore new workflows. Whether you’re setting it up locally on a powerful GPU or connecting it through the cloud, the process is straightforward and flexible.
With its impressive benchmark performance, wide language support, and ability to plug into tools like OpenHands and Gradio, Devstral is a solid choice for anyone who wants a coding assistant that can handle real-world engineering work. Once you have it up and running, you’ll find it ready to help — from writing clean code and editing files to analyzing projects and automating tricky tasks.
If you’re ready to bring Devstral into your workflow, follow the steps, experiment with prompts, and make it your own.