Granite-3.2-8B-Instruct is an advanced 8-billion-parameter language model designed for long-context reasoning, instruction following, and multi-turn dialogue. Built on the foundation of Granite-3.1-8B-Instruct, it has been fine-tuned with high-quality open-source datasets and synthetic data, ensuring enhanced logical reasoning, structured text generation, and multilingual capabilities. Supporting 12 languages, including English, German, Spanish, French, Arabic, and Chinese, the model is well-suited for tasks such as summarization, retrieval-augmented generation (RAG), text classification, and function calling. Optimized for enterprise applications, research, and AI-driven assistants, Granite-3.2-8B-Instruct delivers precise, structured, and contextually aware responses.
Models | ArenaHard | Alpaca-Eval-2 | MMLU | PopQA | TruthfulQA | BigBenchHard | DROP | GSM8K | HumanEval | HumanEval+ | IFEval | AttaQ |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Llama-3.1-8B-Instruct | 36.43 | 27.22 | 69.15 | 28.79 | 52.79 | 72.66 | 61.48 | 83.24 | 85.32 | 80.15 | 79.10 | 83.43 |
DeepSeek-R1-Distill-Llama-8B | 17.17 | 21.85 | 45.80 | 13.25 | 47.43 | 65.71 | 44.46 | 72.18 | 67.54 | 62.91 | 66.50 | 42.87 |
Qwen-2.5-7B-Instruct | 25.44 | 30.34 | 74.30 | 18.12 | 63.06 | 70.40 | 54.71 | 84.46 | 93.35 | 89.91 | 74.90 | 81.90 |
DeepSeek-R1-Distill-Qwen-7B | 10.36 | 15.35 | 50.72 | 9.94 | 47.14 | 65.04 | 42.76 | 78.47 | 79.89 | 78.43 | 59.10 | 42.45 |
Granite-3.1-8B-Instruct | 37.58 | 30.34 | 66.77 | 28.7 | 65.84 | 68.55 | 50.78 | 79.15 | 89.63 | 85.79 | 73.20 | 85.73 |
Granite-3.1-2B-Instruct | 23.3 | 27.17 | 57.11 | 20.55 | 59.79 | 54.46 | 18.68 | 67.55 | 79.45 | 75.26 | 63.59 | 84.7 |
Granite-3.2-2B-Instruct | 24.86 | 34.51 | 57.18 | 20.56 | 59.8 | 52.27 | 21.12 | 67.02 | 80.13 | 73.39 | 61.55 | 83.23 |
Granite-3.2-8B-Instruct | 55.25 | 61.19 | 66.79 | 28.04 | 66.92 | 64.77 | 50.95 | 81.65 | 89.35 | 85.72 | 74.31 | 85.42 |
Prerequisites for Installing Granite-3.2-8B-Instruct Model Locally
Ensure you have the following setup before running the model (a quick verification snippet is shown right after this list):
- Ubuntu 22.04+ or Debian-based OS (for GPU VM)
- Python 3.10+
- NVIDIA GPU: A100 80GB, H100 80GB, or RTX A6000 (an RTX A6000 is enough for smooth execution)
- Disk Space: 50 GB free.
- RAM: At least 24 GB.
- CPU: 24 Cores
- CUDA
- PyTorch
- Transformers
- Jupyter Notebook installed and running
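Before moving on, you can sanity-check these prerequisites with a small script. This is just a convenience sketch that uses only the Python standard library (so it works even before the deep-learning packages are installed); the thresholds mirror the list above:
import shutil
import subprocess
import sys

# Python version (3.10+ recommended)
print("Python:", sys.version.split()[0])

# Free disk space on the current filesystem (50 GB+ recommended)
total, used, free = shutil.disk_usage(".")
print(f"Free disk: {free / 1e9:.1f} GB")

# GPU visibility via the NVIDIA driver (requires nvidia-smi on the PATH)
try:
    gpus = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    print("GPU(s):", gpus or "none detected")
except (FileNotFoundError, subprocess.CalledProcessError):
    print("nvidia-smi not available - check the NVIDIA driver/CUDA installation")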
Model Resource
Hugging Face
Link: https://huggingface.co/ibm-granite/granite-3.2-8b-instruct
Ollama
Link: https://ollama.com/library/granite3.2:8b
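The from_pretrained call in Step 9 downloads the weights automatically on first use, but if you prefer to pre-fetch them (for example, into a shared cache), a quick sketch using the Hugging Face Hub client looks like this:
from huggingface_hub import snapshot_download

# Download all model files into the local Hugging Face cache
# (roughly 16 GB of weights for the 8B model in bfloat16 safetensors)
local_dir = snapshot_download("ibm-granite/granite-3.2-8b-instruct")
print("Model files cached at:", local_dir)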
Step-by-Step Process to Install Granite-3.2-8B-Instruct Model Locally
For the purpose of this tutorial, we will use a GPU-powered Virtual Machine offered by NodeShift; however, you can replicate the same steps with any other cloud provider of your choice. NodeShift provides the most affordable Virtual Machines at a scale that meets GDPR, SOC2, and ISO27001 requirements.
Step 1: Sign Up and Set Up a NodeShift Cloud Account
Visit the NodeShift Platform and create an account. Once you’ve signed up, log into your account.
Follow the account setup process and provide the necessary details and information.
Step 2: Create a GPU Node (Virtual Machine)
GPU Nodes are NodeShift’s GPU Virtual Machines, on-demand resources equipped with diverse GPUs ranging from H100s to A100s. These GPU-powered VMs provide enhanced environmental control, allowing configuration adjustments for GPUs, CPUs, RAM, and Storage based on specific requirements.
Navigate to the menu on the left side, select the GPU Nodes option in the Dashboard, and click the Create GPU Node button to create your first Virtual Machine deployment.
Step 3: Select a Model, Region, and Storage
In the “GPU Nodes” tab, select a GPU Model and Storage according to your needs and the geographical region where you want to launch your model.
We will use 1x RTX A6000 GPU for this tutorial, which offers a good balance of speed and cost. However, you can choose a more affordable GPU with less VRAM if that better suits your requirements (in bfloat16, the 8B model needs roughly 16 GB of VRAM for the weights alone).
Step 4: Select Authentication Method
There are two authentication methods available: Password and SSH Key. SSH keys are a more secure option. To create them, please refer to our official documentation.
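For reference, on most Linux and macOS systems a key pair can be generated with the standard OpenSSH tool (the exact upload steps are covered in the documentation linked above):
ssh-keygen -t ed25519 -C "your_email@example.com"
The public key (the .pub file) is what you add to NodeShift; the private key stays on your machine.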
Step 5: Choose an Image
Next, you will need to choose an image for your Virtual Machine. We will deploy the Granite-3.2-8B-Instruct Thinking Model on a Jupyter Virtual Machine. This open-source platform lets you install and run the model on your GPU node. By working in a Jupyter Notebook, we avoid the terminal, which simplifies the process and reduces setup time, so you can configure the model in just a few steps and a few minutes.
Note: NodeShift provides multiple image template options, such as TensorFlow, PyTorch, NVIDIA CUDA, Deepo, Whisper ASR Webservice, and Jupyter Notebook. With these options, you don’t need to install additional libraries or packages to run Jupyter Notebook. You can start Jupyter Notebook in just a few simple clicks.
After choosing the image, click the ‘Create’ button, and your Virtual Machine will be deployed.
Step 6: Virtual Machine Successfully Deployed
You will get visual confirmation that your node is up and running.
Step 7: Connect to Jupyter Notebook
Once your GPU VM deployment is successfully created and has reached the ‘RUNNING’ status, you can navigate to the page of your GPU Deployment Instance. Then, click the ‘Connect’ Button in the top right corner.
After clicking the ‘Connect’ button, you can view the Jupyter Notebook.
Now open a Python 3 (ipykernel) notebook.
Next, if you want to check the GPU details, run the following command in a Jupyter Notebook cell:
!nvidia-smi
Step 8: Install Dependencies in Jupyter Notebook
Run the following commands in Jupyter Notebook to install dependencies:
!pip install torch torchvision torchaudio accelerate transformers
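Once the installation finishes, an optional quick check that PyTorch can actually see the GPU:
import torch

# Confirm PyTorch detects the CUDA device
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    print(f"VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")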
Step 9: Load the Model and Tokenizer
Run the following Python script to load the model and tokenizer:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Model Path
model_path = "ibm-granite/granite-3.2-8b-instruct"
# Check if GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")
# Load the model with optimized settings
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map=device,          # Place the model on the detected device
    torch_dtype=torch.bfloat16, # Use bfloat16 for optimized performance
)
tokenizer = AutoTokenizer.from_pretrained(model_path)
print("Model and tokenizer loaded successfully!")
Expected Output:
Using device: cuda
Model and tokenizer loaded successfully!
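Optionally, you can check how much memory the loaded weights occupy. get_memory_footprint is a standard Transformers helper; the number is approximate and excludes activations and the KV cache:
# Approximate size of the loaded weights (about 16 GB for 8B parameters in bfloat16)
print(f"Model memory footprint: {model.get_memory_footprint() / 1e9:.1f} GB")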
Step 10: Run Inference (Chat Example)
Now, test the model with a simple instruction.
Example 1
def generate_response(prompt):
    messages = [{"role": "user", "content": prompt}]

    # Apply chat template
    input_ids = tokenizer.apply_chat_template(
        messages,
        return_tensors="pt",
        thinking=True,
        return_dict=True,
        add_generation_prompt=True
    ).to(device)

    # Generate response
    with torch.no_grad():
        output = model.generate(**input_ids, max_new_tokens=512)

    response = tokenizer.decode(output[0, input_ids["input_ids"].shape[1]:], skip_special_tokens=True)
    return response
# Example Query
prompt = "Explain the theory of relativity in simple terms."
response = generate_response(prompt)
print(response)
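The thinking=True flag passed to apply_chat_template switches the model into its reasoning mode, so the output contains an intermediate thought process before the final answer. If you prefer shorter, direct answers, a small variant of the same function with the flag turned off (a sketch; everything else is unchanged) is:
def generate_direct_response(prompt):
    messages = [{"role": "user", "content": prompt}]
    # Same call as above, but with the reasoning trace disabled
    input_ids = tokenizer.apply_chat_template(
        messages,
        return_tensors="pt",
        thinking=False,
        return_dict=True,
        add_generation_prompt=True
    ).to(device)
    with torch.no_grad():
        output = model.generate(**input_ids, max_new_tokens=512)
    return tokenizer.decode(output[0, input_ids["input_ids"].shape[1]:], skip_special_tokens=True)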
Example 2
Example 1 already defines generate_response, so you can simply reuse it for a new query:
# Example Query (reuses the generate_response function defined above)
prompt = "If all humans are mortal and Socrates is a human, what can you conclude?"
response = generate_response(prompt)
print(response)
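By default, model.generate decodes greedily. If you want more varied answers, you can pass the standard Transformers sampling arguments; the values below are only illustrative:
def generate_sampled_response(prompt, temperature=0.7, top_p=0.9):
    messages = [{"role": "user", "content": prompt}]
    input_ids = tokenizer.apply_chat_template(
        messages, return_tensors="pt", thinking=True,
        return_dict=True, add_generation_prompt=True,
    ).to(device)
    with torch.no_grad():
        output = model.generate(
            **input_ids,
            max_new_tokens=512,
            do_sample=True,            # sample instead of greedy decoding
            temperature=temperature,   # lower = more deterministic
            top_p=top_p,               # nucleus sampling
        )
    return tokenizer.decode(output[0, input_ids["input_ids"].shape[1]:], skip_special_tokens=True)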
You’re All Set!
✅ Model is loaded
✅ Tokenizer is initialized
✅ Inference function is defined
✅ You can now play with the Granite-3.2-8B-Instruct Thinking Model.
Step 11: Install Gradio
Ensure dependencies are installed
If you haven’t installed Gradio and Transformers, run:
!pip install torch transformers gradio
Step 12: Run the Gradio Chatbot
Run the following Python script to start the chatbot:
import gradio as gr

def chat_with_granite(message, history):
    # gr.ChatInterface manages the chat history for us; we only need to
    # return the model's reply to the latest message
    return generate_response(message)

# Gradio Chatbot UI
chatbot = gr.ChatInterface(
    fn=chat_with_granite,
    title="Granite-3.2-8B AI Assistant",
    description="An advanced instruction-following AI assistant powered by IBM Granite-3.2-8B.",
)

# Launch Gradio UI (share=True also exposes a temporary public link)
chatbot.launch(share=True)
How This Works
- The script reuses the Granite-3.2-8B-Instruct Thinking Model, tokenizer, and generate_response function loaded earlier.
- Uses Gradio's ChatInterface to create an interactive chatbot and keep track of the chat history.
- Generates a response with the model for each message you send.
- Runs a Gradio Web UI where you can interact with the model.
Expected Output
After running this script, it will output a Gradio link, like:
Running on local URL: http://127.0.0.1:7860
Running on public URL: https://xyz.gradio.app
You can click the public URL to chat with the model!
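If you would like replies to appear token by token instead of all at once, gr.ChatInterface also accepts a generator function. Here is a sketch using Transformers' TextIteratorStreamer with the same model and tokenizer loaded earlier:
from threading import Thread

import gradio as gr
from transformers import TextIteratorStreamer

def stream_granite(message, history):
    messages = [{"role": "user", "content": message}]
    inputs = tokenizer.apply_chat_template(
        messages, return_tensors="pt", thinking=True,
        return_dict=True, add_generation_prompt=True,
    ).to(device)

    # The streamer yields decoded text chunks as they are generated
    streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    thread = Thread(target=model.generate,
                    kwargs=dict(**inputs, max_new_tokens=512, streamer=streamer))
    thread.start()

    partial = ""
    for chunk in streamer:
        partial += chunk
        yield partial  # ChatInterface updates the chat bubble on each yield

gr.ChatInterface(fn=stream_granite, title="Granite-3.2-8B (streaming)").launch(share=True)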
Step 13: Access Chatbot
Access the Chatbot on:
Running on local URL: http://127.0.0.1:7860
Running on public URL: https://xyz.gradio.app
You can also access the chatbot directly from within the Jupyter Notebook.
Note: This is a step-by-step guide for interacting with your model. It covers the first method for installing the Granite-3.2-8B-Instruct Thinking Model locally, using Jupyter Notebook and Transformers.
Option 2: Using Ollama (Terminal)
- Install Ollama: Download and install the Ollama tool from the official site.
- Pull the Model: Run the following command to download the desired model:
ollama pull granite3.2:8b
- Run the Model: Start the model in the terminal (a sketch for calling it programmatically is shown after this step):
ollama run granite3.2:8b
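Ollama also exposes a local HTTP API (port 11434 by default) while the model is running, so you can call it programmatically. A minimal sketch using the requests library:
import requests

# Chat request against the local Ollama server (default port 11434)
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "granite3.2:8b",
        "messages": [{"role": "user", "content": "Summarize what RAG is in two sentences."}],
        "stream": False,
    },
)
print(resp.json()["message"]["content"])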
Option 3: Using Open WebUI
- Set Up Open WebUI:
Follow our Open WebUI Setup Guide to configure the interface. Ensure all dependencies are installed and the environment is correctly set up.
- Refresh the Interface:
Confirm that Granite-3.2-8B-Instruct has been downloaded and is visible in the list of available models in Open WebUI.
- Select Your Model:
Choose the Granite-3.2-8B-Instruct model from the list (a smaller Granite-3.2-2B-Instruct variant is also available if you need a lighter model).
- Start Interaction:
Begin using the model by entering your queries in the interface.
Conclusion
Granite-3.2-8B-Instruct is a highly capable language model designed for handling long-context reasoning, structured text generation, and instruction-based interactions. With its advanced architecture and multilingual support, it excels in tasks such as summarization, retrieval-augmented generation, classification, and function calling.
By following this guide, users can easily set up and run the model using Jupyter Notebook, Open WebUI, or Ollama, ensuring flexibility across different platforms. Whether deployed for enterprise solutions, research, or automated assistants, Granite-3.2-8B-Instruct delivers precise and context-aware responses, making it a powerful tool for complex language processing tasks.