Nemotron-Research-Reasoning-Qwen-1.5B is a compact powerhouse built for solving complex reasoning tasks across math, code, science, and logic. Developed by NVIDIA, it’s designed to think through problems the way a sharp student would—step by step, carefully, and with clarity.
From tackling Olympiad-style math puzzles to debugging code and breaking down scientific explanations, this model punches well above its weight. It’s the result of deep training across diverse and challenging topics, making it ideal for research, development, and anyone curious about how far small models can go when taught to think smart—not just big.
The leading generalist reasoning model for cutting-edge research and development.
Evaluation Results
Table 1: Performance (pass@1) comparison on benchmarks across the Math domain.
Model | AIME24 | AIME25 | AMC | Math | Minerva | Olympiad | Avg |
---|---|---|---|---|---|---|---|
DeepSeek-R1-Distill-Qwen-1.5B | 28.54 | 22.71 | 62.58 | 82.90 | 26.38 | 43.58 | 44.45 |
DeepScaleR-1.5B | 40.21 | 31.46 | 73.04 | 89.36 | 41.57 | 51.63 | 54.54 |
DeepSeek-R1-Distill-Qwen-7B | 53.54 | 40.83 | 82.83 | 93.68 | 50.60 | 57.66 | 63.19 |
Nemotron-Research-Reasoning-Qwen-1.5B | 48.13 | 33.33 | 79.29 | 91.89 | 47.98 | 60.22 | 60.14 |
Table 2: Performance (pass@1) comparison across benchmarks for Code. We abbreviate benchmark names for codecontests (cc), codeforces (cf), humanevalplus (human), and livecodebench (LCB).
Model | apps | cc | cf | taco | human | LCB | Avg |
---|---|---|---|---|---|---|---|
DeepSeek-R1-Distill-Qwen-1.5B | 20.95 | 16.79 | 14.13 | 8.03 | 61.77 | 16.80 | 23.08 |
DeepCoder-1.5B | 30.37 | 23.76 | 21.70 | 13.76 | 73.40 | 22.76 | 30.96 |
DeepSeek-R1-Distill-Qwen-7B | 42.08 | 32.76 | 33.08 | 19.08 | 83.32 | 38.04 | 41.39 |
Nemotron-Research-Reasoning-Qwen-1.5B | 41.99 | 31.80 | 34.50 | 20.81 | 72.05 | 23.81 | 37.49 |
Table 3: Performance comparison on STEM reasoning (GPQA Diamond), instruction following (IFEval), and logic puzzles (Reasoning Gym) tasks. We also present results on OOD tasks: acre, boxnet, and game_of_life_halting (game).
Model | GPQA | IFEval | Reasoning Gym | acre | boxnet | game |
---|---|---|---|---|---|---|
DeepSeek-R1-Distill-Qwen-1.5B | 15.86 | 44.05 | 4.24 | 5.99 | 0.00 | 3.49 |
DeepSeek-R1-Distill-Qwen-7B | 35.44 | 58.01 | 28.55 | 20.21 | 1.71 | 12.94 |
Nemotron-Research-Reasoning-Qwen-1.5B | 41.78 | 66.02 | 59.06 | 58.57 | 7.91 | 52.29 |
Nemotron-Research-Reasoning-Qwen-1.5B — GPU Configuration Table
GPU Model | vCPUs | RAM (GB) | VRAM (GB) | Precision | Use Case | Recommended For |
---|---|---|---|---|---|---|
T4 | 4 | 16 | 16 | 8-bit / BF16 | Basic inference, dev testing | ✅ Minimum viable setup |
RTX A4000 | 6 | 24 | 16 | 8-bit / BF16 | Fast single-user inference | ✅ Budget-friendly, good response time |
RTX A5000 | 8 | 32 | 24 | BF16 / FP16 | Low-latency inference | ✅ Ideal for Gradio or WebUI |
A100 40GB | 24 | 64 | 40 | BF16 / FP16 | Batch inference, high throughput | ✅ High-performance, multi-user support |
H100 80GB | 48 | 96 | 80 | BF16 / FP16 | Large-scale deployment, longest context | ⚡️ Overkill for 1.5B, but blazing fast |
Recommendations:
- Best Budget Pick: ✅ T4 or A4000 — run comfortably with 8-bit or BF16, great for development (see the 8-bit loading sketch below).
- Best for Production UI: ✅ A5000 — Can handle Gradio or REST API calls with smooth response.
- Best for Heavy Users or Batch Serving: ✅ A100 — If you’re planning to serve many users in parallel.
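To make the 16 GB budget tiers concrete, here is a minimal sketch of an 8-bit load using transformers with bitsandbytes (the bitsandbytes package is an assumed extra install, and the local model directory matches the download step later in this guide):
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
# 8-bit quantized weights roughly halve VRAM use versus BF16,
# which is what makes a 16 GB T4 or A4000 comfortable for this model
model = AutoModelForCausalLM.from_pretrained(
    "./nemotron-1.5b",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("./nemotron-1.5b")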
Step-by-Step Process to Install NVIDIA Nemotron-Research-Reasoning-Qwen-1.5B Locally
For the purpose of this tutorial, we will use a GPU-powered Virtual Machine offered by NodeShift; however, you can replicate the same steps with any other cloud provider of your choice. NodeShift provides the most affordable Virtual Machines at a scale that meets GDPR, SOC2, and ISO27001 requirements.
Step 1: Sign Up and Set Up a NodeShift Cloud Account
Visit the NodeShift Platform and create an account. Once you’ve signed up, log into your account.
Follow the account setup process and provide the necessary details and information.
Step 2: Create a GPU Node (Virtual Machine)
GPU Nodes are NodeShift’s GPU Virtual Machines: on-demand resources equipped with diverse GPUs ranging from H100s to A100s. These GPU-powered VMs give you full control over your environment, letting you adjust GPU, CPU, RAM, and storage configurations to match your requirements.
Navigate to the menu on the left side, select the GPU Nodes option in the Dashboard, click the Create GPU Node button, and deploy your first Virtual Machine.
Step 3: Select a Model, Region, and Storage
In the “GPU Nodes” tab, select a GPU Model and Storage according to your needs and the geographical region where you want to launch your model.
We will use 1 x H100 GPU for this tutorial to achieve the fastest performance. However, you can choose a more affordable GPU with less VRAM if that better suits your requirements.
Step 4: Select Authentication Method
There are two authentication methods available: Password and SSH Key. SSH keys are a more secure option. To create them, please refer to our official documentation.
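For reference, a typical way to generate a new SSH key pair on your local machine looks like this (a standard OpenSSH command; the comment string is just a label you can change):
ssh-keygen -t ed25519 -C "your_email@example.com"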
Step 5: Choose an Image
Next, you will need to choose an image for your Virtual Machine. We will deploy NVIDIA Nemotron-Research-Reasoning-Qwen-1.5B on an NVIDIA CUDA Virtual Machine. This proprietary, closed-source parallel computing platform will allow you to install NVIDIA Nemotron-Research-Reasoning-Qwen-1.5B on your GPU Node.
After choosing the image, click the ‘Create’ button, and your Virtual Machine will be deployed.
Step 6: Virtual Machine Successfully Deployed
You will get visual confirmation that your node is up and running.
Step 7: Connect to GPUs using SSH
NodeShift GPUs can be connected to and controlled through a terminal using the SSH key provided during GPU creation.
Once your GPU Node deployment is successfully created and has reached the ‘RUNNING’ status, you can navigate to the page of your GPU Deployment Instance. Then, click the ‘Connect’ button in the top right corner.
Now open your terminal and paste the proxy SSH IP or direct SSH IP.
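The exact connection string is shown on the instance page; it generally takes this shape (the IP, port, and key path here are placeholders, so substitute the values from your own deployment):
ssh -i ~/.ssh/id_rsa root@<VM_IP> -p <SSH_PORT>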
Next, if you want to check the GPU details, run the command below:
nvidia-smi
Step 8: Check the Available Python Version and Install a Newer One
Run the following command to check the Python version currently available:
python3 --version
The system has Python 3.8.1 available by default. To install a higher version of Python, you’ll need to use the deadsnakes PPA.
Run the following commands to add the deadsnakes PPA:
sudo apt update
sudo apt install -y software-properties-common
sudo add-apt-repository -y ppa:deadsnakes/ppa
sudo apt update
Step 9: Install Python 3.11
Now, run the following command to install Python 3.11 or another desired version:
sudo apt install -y python3.11 python3.11-venv python3.11-dev
Step 10: Update the Default Python3 Version
Now, run the following commands to register the Python versions and link the new version as the default python3:
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.8 1
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.11 2
sudo update-alternatives --config python3
Then, run the following command to verify that the new Python version is active:
python3 --version
Step 11: Install and Update Pip
Run the following commands to install and update pip:
curl -O https://bootstrap.pypa.io/get-pip.py
python3.11 get-pip.py
Then, run the following command to check the version of pip:
pip --version
Step 12: Install Accelerate & Transformers
Run the following command to install accelerate & transformers:
pip install accelerate transformers
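Transformers needs PyTorch as its backend. Most NVIDIA CUDA images ship with it preinstalled, but you can verify with the command below:
python3 -c "import torch; print(torch.cuda.is_available())"
If that fails or prints False, install PyTorch (see pytorch.org if you need a CUDA-specific wheel):
pip install torch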
Step 13: Install HuggingFace Hub
Run the following command to install huggingface_hub:
pip install huggingface_hub
Step 14: Download Model
Run the following command to download the model:
huggingface-cli download nvidia/Nemotron-Research-Reasoning-Qwen-1.5B --local-dir nemotron-1.5b
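If you prefer to script the download instead of using the CLI, huggingface_hub offers an equivalent Python call (a minimal sketch; the repo ID and target directory match the command above):
from huggingface_hub import snapshot_download
# Fetch the full model repository into ./nemotron-1.5b
snapshot_download(
    repo_id="nvidia/Nemotron-Research-Reasoning-Qwen-1.5B",
    local_dir="nemotron-1.5b",
)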
Step 15: Connect to your GPU VM using Remote SSH
- Open VS Code on your Mac.
- Press Cmd + Shift + P, then choose Remote-SSH: Connect to Host.
- Select your configured host.
- Once connected, you’ll see SSH: 149.7.4.3 (your VM IP) in the bottom-left status bar (like in the image).
Step 16: Open the Project Folder on VM and Paste the Code
- Click on “Open Folder”
- Choose the directory where your script is located: /root
- VS Code will reload the window inside the remote environment.
- In the /root folder, right-click → New File
- Name it: run_nemotron.py
Then, paste this full code into run_nemotron.py:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
# 1) Load from your local folder
model_dir = "./nemotron-1.5b"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir, torch_dtype=torch.bfloat16, device_map="auto")
# 2) Prepare a prompt
prompt = "Solve the following math problem step-by-step:\n\nWhat is the derivative of x^3 + 2x?"
# 3) Tokenize & move to GPU
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# 4) Generate
outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=False,  # deterministic greedy decoding; set do_sample=True (with a temperature) to sample
)
# 5) Decode and print
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
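Because reasoning models often produce long step-by-step traces, it can be more pleasant to watch tokens appear as they are generated. As an optional variation (a sketch using transformers’ built-in TextStreamer; it reuses the model, tokenizer, and inputs defined above), replace steps 4 and 5 with:
from transformers import TextStreamer
# Stream tokens to stdout as they are generated, skipping the echoed prompt
streamer = TextStreamer(tokenizer, skip_prompt=True)
model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=False,
    streamer=streamer,
)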
Step 17: Run the File
- Open the VS Code Terminal (Ctrl + ` or View → Terminal)
- Type:
python3 run_nemotron.py
Check the screenshot below for the output.
Step-by-Step Process to Run the NVIDIA Nemotron-Research-Reasoning-Qwen-1.5B Gradio App on Your GPU VM
Step 1: Install Gradio
Run the following command to install Gradio:
pip install gradio
Step 2: Open the Project Folder on VM and Paste the Code
- Click on “Open Folder”
- Choose the directory where your script is located: /root
- VS Code will reload the window inside the remote environment.
- In the /root folder, right-click → New File
- Name it: nemotron_webui.py
Then, paste this full code into nemotron_webui.py:
import gradio as gr
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
# Load model & tokenizer
model_dir = "./nemotron-1.5b"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

def generate_response(prompt, temperature=0.7, max_tokens=300):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        do_sample=True,
        temperature=temperature,
        top_p=0.9,
        max_new_tokens=max_tokens,
        eos_token_id=tokenizer.eos_token_id,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Gradio UI
gr.Interface(
    fn=generate_response,
    inputs=[
        gr.Textbox(lines=6, placeholder="Ask a math, coding, or logic question...", label="Prompt"),
        gr.Slider(minimum=0.2, maximum=1.5, value=0.7, step=0.1, label="Temperature"),
        gr.Slider(minimum=50, maximum=1024, value=300, step=50, label="Max Tokens"),
    ],
    outputs=gr.Textbox(label="Nemotron’s Answer"),
    title="🧠 Nemotron Reasoning Assistant",
    description="Ask complex questions involving math, code, science, or logic. Powered by NVIDIA's ProRL-trained Nemotron-1.5B.",
).launch(server_name="0.0.0.0", server_port=7860)
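If several people will hit the UI at once, Gradio can queue requests so they don’t contend for the GPU simultaneously (a small optional tweak: replace the final line of the script with the one below):
).queue().launch(server_name="0.0.0.0", server_port=7860)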
Step 3: Run the Gradio App
In your terminal (inside the virtual environment):
python3 nemotron_webui.py
You’ll see:
Running on local URL: http://0.0.0.0:7860
Step 4: Run SSH Port Forwarding Command to access the Gradio Web App
Run the following command to access the Gradio web app (or any other port from your VM) on your local machine:
ssh -i ~/.ssh/id_rsa -L 7860:127.0.0.1:7860 root@149.7.4.3 -p 18221
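Once the tunnel is up, you can optionally confirm from another local terminal that the port is being forwarded before opening a browser:
curl -I http://localhost:7860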
Step 5: Access the Gradio Web App
Access the Gradio Web App at:
http://localhost:7860
Conclusion
Whether you’re diving into advanced math, exploring logic puzzles, writing code, or working through scientific problems, Nemotron-Research-Reasoning-Qwen-1.5B is built to help you think through it all — clearly and thoroughly. Thanks to its lightweight architecture and powerful training, it runs smoothly even on modest hardware while delivering exceptional reasoning quality.
This guide showed you how to set up the model locally or on a GPU Virtual Machine, run it in the terminal, and launch a full browser-based interface. From setup to solution, you’re now ready to explore what thoughtful, step-by-step reasoning looks like — anytime, on your own infrastructure.