Jais-Adapted-70B is a powerful bilingual language model for Arabic and English, built to deliver high-quality text generation with a strong focus on linguistic accuracy and contextual understanding. Developed through an advanced adaptation process, it strengthens Arabic capabilities while maintaining fluency in English, making it a valuable tool for applications such as research, content creation, and conversational tasks. With a transformer-based architecture and an extensive training dataset, Jais-Adapted-70B offers efficient processing and improved comprehension for users who need a robust, reliable model for bilingual communication.
Arabic Evaluation Results
Models | Avg | ArabicMMLU* | MMLU | EXAMS* | LitQA* | agqa | agrc | Hellaswag | PIQA | BoolQA | Situated QA | ARC-C | OpenBookQA | TruthfulQA | CrowS-Pairs |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
jais-family-30b-16k | 49.2 | 44.0 | 33.4 | 40.9 | 60 | 47.8 | 49.3 | 60.9 | 68.6 | 70.3 | 41.6 | 38.7 | 31.8 | 45.2 | 57 |
jais-family-30b-8k | 49.7 | 46.0 | 34 | 42 | 60.6 | 47.6 | 50.4 | 60.4 | 69 | 67.7 | 42.2 | 39.2 | 33.8 | 45.1 | 57.3 |
jais-family-13b | 46.1 | 34.0 | 30.3 | 42.7 | 58.3 | 40.5 | 45.5 | 57.3 | 68.1 | 63.1 | 41.6 | 35.3 | 31.4 | 41 | 56.1 |
jais-family-6p7b | 44.6 | 32.2 | 29.9 | 39 | 50.3 | 39.2 | 44.1 | 54.3 | 66.8 | 66.5 | 40.9 | 33.5 | 30.4 | 41.2 | 55.4 |
jais-family-2p7b | 41.0 | 29.5 | 28.5 | 36.1 | 45.7 | 32.4 | 40.8 | 44.2 | 62.5 | 62.2 | 39.2 | 27.4 | 28.2 | 43.6 | 53.6 |
jais-family-1p3b | 40.8 | 28.9 | 28.5 | 34.2 | 45.7 | 32.4 | 40.8 | 44.2 | 62.5 | 62.2 | 39.2 | 27.4 | 28.2 | 43.6 | 53.6 |
jais-family-590m | 39.7 | 31.2 | 27 | 33.1 | 41.7 | 33.8 | 38.8 | 38.2 | 60.7 | 62.2 | 37.9 | 25.5 | 27.4 | 44.7 | 53.3 |
jais-family-30b-16k-chat | 51.6 | 59.9 | 34.6 | 40.2 | 58.9 | 46.8 | 54.7 | 56.2 | 64.4 | 76.7 | 55.9 | 40.8 | 30.8 | 49.5 | 52.9 |
jais-family-30b-8k-chat | 51.4 | 61.2 | 34.2 | 40.2 | 54.3 | 47.3 | 53.6 | 60 | 63.4 | 76.8 | 54.7 | 39.5 | 30 | 50.7 | 54.3 |
jais-family-13b-chat | 50.3 | 58.2 | 33.9 | 42.9 | 53.1 | 46.8 | 51.7 | 59.3 | 65.4 | 75.2 | 51.2 | 38.4 | 29.8 | 44.8 | 53.8 |
jais-family-6p7b-chat | 48.7 | 55.7 | 32.8 | 37.7 | 49.7 | 40.5 | 50.1 | 56.2 | 62.9 | 79.4 | 52 | 38 | 30.4 | 44.7 | 52 |
jais-family-2p7b-chat | 45.6 | 50.0 | 31.5 | 35.9 | 41.1 | 37.3 | 42.1 | 48.6 | 63.7 | 74.4 | 50.9 | 35.3 | 31.2 | 44.5 | 51.3 |
jais-family-1p3b-chat | 42.7 | 42.2 | 30.1 | 33.6 | 40.6 | 34.1 | 41.2 | 43 | 63.6 | 69.3 | 44.9 | 31.6 | 28 | 45.6 | 50.4 |
jais-family-590m-chat | 37.8 | 39.1 | 28 | 29.5 | 33.1 | 30.8 | 36.4 | 30.3 | 57.8 | 57.2 | 40.5 | 25.9 | 26.8 | 44.5 | 49.3 |
Adapted Models | Avg | ArabicMMLU* | MMLU | EXAMS* | LitQA* | agqa | agrc | Hellaswag | PIQA | BoolQA | Situated QA | ARC-C | OpenBookQA | TruthfulQA | CrowS-Pairs |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
jais-adapted-70b | 51.5 | 55.9 | 36.8 | 42.3 | 58.3 | 48.6 | 54 | 61.5 | 68.4 | 68.4 | 42.1 | 42.6 | 33 | 50.2 | 58.3 |
jais-adapted-13b | 46.6 | 44.7 | 30.6 | 37.7 | 54.3 | 43.8 | 48.3 | 54.9 | 67.1 | 64.5 | 40.6 | 36.1 | 32 | 43.6 | 54.0 |
jais-adapted-7b | 42.0 | 35.9 | 28.9 | 36.7 | 46.3 | 34.1 | 40.3 | 45 | 61.3 | 63.8 | 38.1 | 29.7 | 30.2 | 44.3 | 53.6 |
jais-adapted-70b-chat | 52.9 | 66.8 | 34.6 | 42.5 | 62.9 | 36.8 | 48.6 | 64.5 | 69.7 | 82.8 | 49.3 | 44.2 | 32.2 | 53.3 | 52.4 |
jais-adapted-13b-chat | 50.3 | 59.0 | 31.7 | 37.5 | 56.6 | 41.9 | 51.7 | 58.8 | 67.1 | 78.2 | 45.9 | 41 | 34.2 | 48.3 | 52.1 |
jais-adapted-7b-chat | 46.1 | 51.3 | 30 | 37 | 48 | 36.8 | 48.6 | 51.1 | 62.9 | 72.4 | 41.3 | 34.6 | 30.4 | 48.6 | 51.8 |
English Evaluation Results
Models | Avg | MMLU | RACE | Hellaswag | PIQA | BoolQA | SIQA | ARC-Challenge | OpenBookQA | Winogrande | TruthfulQA | CrowS-Pairs |
---|---|---|---|---|---|---|---|---|---|---|---|---
jais-family-30b-16k | 59.3 | 42.2 | 40.5 | 79.7 | 80.6 | 78.7 | 48.8 | 50.3 | 44.2 | 71.6 | 43.5 | 72.6 |
jais-family-30b-8k | 58.8 | 42.3 | 40.3 | 79.1 | 80.5 | 80.9 | 49.3 | 48.4 | 43.2 | 70.6 | 40.3 | 72.3 |
jais-family-13b | 54.6 | 32.3 | 39 | 72 | 77.4 | 73.9 | 47.9 | 43.2 | 40 | 67.1 | 36.1 | 71.7 |
jais-family-6p7b | 53.1 | 32 | 38 | 69.3 | 76 | 71.7 | 47.1 | 40.3 | 37.4 | 65.1 | 34.4 | 72.5 |
jais-family-2p7b | 51 | 29.4 | 38 | 62.7 | 74.1 | 67.4 | 45.6 | 35.1 | 35.6 | 62.9 | 40.1 | 70.2 |
jais-family-1p3b | 48.7 | 28.2 | 35.4 | 55.4 | 72 | 62.7 | 44.9 | 30.7 | 36.2 | 60.9 | 40.4 | 69 |
jais-family-590m | 45.2 | 27.8 | 32.9 | 46.1 | 68.1 | 60.4 | 43.2 | 25.6 | 30.8 | 55.8 | 40.9 | 65.3 |
jais-family-30b-16k-chat | 58.8 | 42 | 41.1 | 76.2 | 73.3 | 84.6 | 60.3 | 48.4 | 40.8 | 68.2 | 44.8 | 67 |
jais-family-30b-8k-chat | 60.3 | 40.6 | 47.1 | 78.9 | 72.7 | 90.6 | 60 | 50.1 | 43.2 | 70.6 | 44.9 | 64.2 |
jais-family-13b-chat | 57.5 | 36.6 | 42.6 | 75 | 75.8 | 87.6 | 54.4 | 47.9 | 42 | 65 | 40.6 | 64.5 |
jais-family-6p7b-chat | 56 | 36.6 | 41.3 | 72 | 74 | 86.9 | 55.4 | 44.6 | 40 | 62.4 | 41 | 62.2 |
jais-family-2p7b-chat | 52.8 | 32.7 | 40.4 | 62.2 | 71 | 84.1 | 54 | 37.2 | 36.8 | 61.4 | 40.9 | 59.8 |
jais-family-1p3b-chat | 49.3 | 31.9 | 37.4 | 54.5 | 70.2 | 77.8 | 49.8 | 34.4 | 35.6 | 52.7 | 37.2 | 60.8 |
jais-family-590m-chat | 42.6 | 27.9 | 33.4 | 33.1 | 63.7 | 60.1 | 45.3 | 26.7 | 25.8 | 50.5 | 44.5 | 57.7 |
Adapted Models | Avg | MMLU | RACE | Hellaswag | PIQA | BoolQA | SIQA | ARC-Challenge | OpenBookQA | Winogrande | TruthfulQA | CrowS-Pairs |
---|---|---|---|---|---|---|---|---|---|---|---|---
jais-adapted-70b | 60.1 | 40.4 | 38.5 | 81.2 | 81.1 | 81.2 | 48.1 | 50.4 | 45 | 75.8 | 45.7 | 74 |
jais-adapted-13b | 56 | 33.8 | 39.5 | 76.5 | 78.6 | 77.8 | 44.6 | 45.9 | 44.4 | 71.4 | 34.6 | 69 |
jais-adapted-7b | 55.7 | 32.2 | 39.8 | 75.3 | 78.8 | 75.7 | 45.2 | 42.8 | 43 | 68 | 38.3 | 73.1 |
jais-adapted-70b-chat | 61.4 | 38.7 | 42.9 | 82.7 | 81.2 | 89.6 | 52.9 | 54.9 | 44.4 | 75.7 | 44 | 68.8 |
jais-adapted-13b-chat | 58.5 | 34.9 | 42.4 | 79.6 | 79.7 | 88.2 | 50.5 | 48.5 | 42.4 | 70.3 | 42.2 | 65.1 |
jais-adapted-7b-chat | 58.5 | 33.8 | 43.9 | 77.8 | 79.4 | 87.1 | 47.3 | 46.9 | 43.4 | 69.9 | 42 | 72.4 |
Model Resource
Hugging Face
Link: https://huggingface.co/inceptionai/jais-adapted-70b
1️⃣ Minimum Hardware Requirements
These specs allow the model to run, but performance may be slow.
- GPU: 2 x NVIDIA A100 80GB (or equivalent H100 80GB; several A6000 48GB cards can substitute)
- VRAM: 160GB+ total for FP16 inference (less with 8-bit or 4-bit quantization; see the quick estimate after this list)
- RAM: 256GB+
- CPU: 32-core AMD EPYC or Intel Xeon
- Disk Storage: 2TB SSD/NVMe
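As a rough sanity check, you can estimate the weights-only memory footprint directly from the parameter count. The snippet below is a back-of-the-envelope sketch; real usage is higher once activations, the KV cache, and framework overhead are added:
params = 70e9  # ~70B parameters
for name, bytes_per_param in [("FP32", 4), ("FP16", 2), ("8-bit", 1), ("4-bit", 0.5)]:
    gib = params * bytes_per_param / 1024**3
    print(f"{name}: ~{gib:.0f} GiB for weights alone")
# FP16 comes out to ~130 GiB, which is why two 80GB GPUs are the practical floor.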
2️⃣ Recommended Hardware Requirements
For better performance, especially for real-time inference:
- GPU: 4 x NVIDIA A100 80GB / H100 80GB SXM
- VRAM: 320GB+
- RAM: 512GB+
- CPU: 64-core AMD EPYC / Intel Xeon Platinum
- Disk Storage: 4TB NVMe SSD (for fast disk I/O)
3️⃣ Optimal Hardware Setup for Fastest Performance
For efficient inference and training on high-performance hardware:
- GPU: 8 x NVIDIA H100 80GB SXM
- VRAM: 640GB+
- RAM: 1TB+
- CPU: 96-core AMD EPYC / Intel Xeon Platinum
- Disk Storage: 8TB NVMe SSD (for model weights and caching)
4️⃣ Disk & Storage Requirements
- Model Size: ~70B parameters (~140GB of weights in FP16, ~280GB in FP32)
- Download Storage: Minimum 2TB SSD
- Checkpoint Storage: 4TB NVMe (recommended for high-speed read/write)
5️⃣ Software Requirements
- OS: Ubuntu 22.04 LTS or CentOS 8
- CUDA: 12.1+
- NVIDIA Driver: 535.86.10+
- Python: 3.10+
- PyTorch: 2.1.0+
- Transformers Library: 4.40.1+
- DeepSpeed/FSDP: Required for model sharding on multiple GPUs
- Hugging Face Accelerate: Required for distributed inference
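Once this stack is installed, a quick sanity-check cell (a minimal sketch) confirms the versions above are actually in place:
import torch
import transformers

print("PyTorch:", torch.__version__)              # expect 2.1.0+
print("Transformers:", transformers.__version__)  # expect 4.40.1+
print("CUDA available:", torch.cuda.is_available())
print("CUDA build:", torch.version.cuda)          # expect 12.1+
print("GPUs visible:", torch.cuda.device_count())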
Step-by-Step Process to Install Jais-Adapted-70b Locally
For the purpose of this tutorial, we will use a GPU-powered Virtual Machine offered by NodeShift; however, you can replicate the same steps with any other cloud provider of your choice. NodeShift provides the most affordable Virtual Machines at a scale that meets GDPR, SOC2, and ISO27001 requirements.
Step 1: Sign Up and Set Up a NodeShift Cloud Account
Visit the NodeShift Platform and create an account. Once you’ve signed up, log into your account.
Follow the account setup process and provide the necessary details and information.
Step 2: Create a GPU Node (Virtual Machine)
GPU Nodes are NodeShift’s GPU Virtual Machines, on-demand resources equipped with diverse GPUs ranging from H100s to A100s. These GPU-powered VMs provide enhanced environmental control, allowing configuration adjustments for GPUs, CPUs, RAM, and Storage based on specific requirements.
Navigate to the menu on the left side. Select the GPU Nodes option, create a GPU Node in the Dashboard, click the Create GPU Node button, and create your first Virtual Machine deployment.
Step 3: Select a Model, Region, and Storage
In the “GPU Nodes” tab, select a GPU Model and Storage according to your needs and the geographical region where you want to launch your model.
We will use 1 x H100 SXM GPU for this tutorial to achieve the fastest performance. Note that the full 70B model does not fit in 80GB of VRAM at FP16, so on a single GPU you will need the quantization or offloading techniques covered in Steps 14 and 15. You can also choose a more affordable GPU with less VRAM if that better suits your requirements.
Step 4: Select Authentication Method
There are two authentication methods available: Password and SSH Key. SSH keys are a more secure option. To create them, please refer to our official documentation.
Step 5: Choose an Image
Next, you will need to choose an image for your Virtual Machine. We will deploy Jais-Adapted-70b Model on a Jupyter Virtual Machine. This open-source platform will allow you to install and run the Jais-Adapted-70b Model on your GPU node. By running this Model on a Jupyter Notebook, we avoid using the terminal, simplifying the process and reducing the setup time. This allows you to configure the model in just a few steps and minutes.
Note: NodeShift provides multiple image template options, such as TensorFlow, PyTorch, NVIDIA CUDA, Deepo, Whisper ASR Webservice, and Jupyter Notebook. With these options, you don’t need to install additional libraries or packages to run Jupyter Notebook. You can start Jupyter Notebook in just a few simple clicks.
After choosing the image, click the ‘Create’ button, and your Virtual Machine will be deployed.
Step 6: Virtual Machine Successfully Deployed
You will get visual confirmation that your node is up and running.
Step 7: Connect to Jupyter Notebook
Once your GPU VM deployment is successfully created and has reached the ‘RUNNING’ status, you can navigate to the page of your GPU Deployment Instance. Then, click the ‘Connect’ Button in the top right corner.
After clicking the ‘Connect’ button, you can view the Jupyter Notebook.
Now open a Python 3 (ipykernel) Notebook.
Next, If you want to check the GPU details, run the command in the Jupyter Notebook cell:
!nvidia-smi
Step 8: Install Dependencies in Jupyter Notebook
Run the following commands in Jupyter Notebook to install dependencies:
!pip install --upgrade pip
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
!pip install transformers accelerate safetensors sentencepiece
!pip install auto-gptq bitsandbytes
Step 9: Access model from Hugging Face
You need to agree to share your contact information to access this model. Fill in all the mandatory details, such as your name and email, and then wait for approval from Hugging Face and Meta to gain access and use the model.
Link: https://huggingface.co/inceptionai/jais-adapted-70b
You will be granted access to this model within a few seconds, provided you have filled in all the details correctly.
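If you prefer not to pass the token explicitly to every call, you can authenticate the environment once with huggingface_hub (a minimal sketch; in a notebook, login() opens an interactive prompt for the token):
from huggingface_hub import login

# Prompts for your token interactively; alternatively, pass it directly:
# login(token="hf_...")  # placeholder; never commit a real token
login()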
Step 10: Download the Model
Run the following command to download the model:
from huggingface_hub import snapshot_download

repo_id = "inceptionai/jais-adapted-70b"
hf_token = "hf_xxxxxxxxxxxxxxxxxxxx"  # Replace with your own Hugging Face token

snapshot_download(repo_id=repo_id, token=hf_token, local_dir="jais-adapted-70b")
How to Generate a Hugging Face Token
- Create an Account: Go to the Hugging Face website and sign up for an account if you don’t already have one.
- Access Settings: After logging in, click on your profile photo in the top right corner and select “Settings.”
- Navigate to Access Tokens: In the settings menu, find and click on the “Access Tokens” tab.
- Generate a New Token: Click the “New token” button, provide a name for your token, and choose a role (either read or write).
- Generate and Copy Token: Click the “Generate a token” button. Your new token will appear; click “Show” to view it and copy it for use in your applications.
- Secure Your Token: Ensure you keep your token secure and do not expose it in public code repositories.
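One simple way to follow that last point is to read the token from an environment variable instead of hardcoding it in the notebook (a sketch; HF_TOKEN is a conventional variable name, but any name works):
import os
from huggingface_hub import snapshot_download

hf_token = os.environ["HF_TOKEN"]  # set via: export HF_TOKEN=hf_... in your shell

snapshot_download(
    repo_id="inceptionai/jais-adapted-70b",
    token=hf_token,
    local_dir="jais-adapted-70b",
)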
Step 11: Load Model Properly on GPU
Run the following code to load the model:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Define model path
model_path = "./jais-adapted-70b"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)
if tokenizer.pad_token_id is None:
    tokenizer.pad_token_id = tokenizer.eos_token_id  # Llama-style tokenizers often lack a pad token

# Load model and force GPU usage
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,  # Use FP16 to reduce memory usage
    device_map="auto",          # Automatically assign layers to available GPUs
    trust_remote_code=True
)

# Check GPU memory usage
print(torch.cuda.memory_summary(device="cuda"))

# Function to generate responses
def generate_response(prompt):
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")  # Move inputs to GPU
    output = model.generate(
        **inputs,
        max_new_tokens=100,
        temperature=0.7,  # Sampling temperature (lower = more deterministic)
        do_sample=True,   # Enable sampling
        pad_token_id=tokenizer.pad_token_id
    )
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Example query
print("Arabic:", generate_response("عاصمة دولة الإمارات العربية المتحدة هي"))
print("English:", generate_response("The capital of UAE is"))
Step 12: Check Generated Response
Step 13: Try Different Prompts
Run the following code to try different prompts:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Define model path (change if stored elsewhere)
model_path = "./jais-adapted-70b"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)
if tokenizer.pad_token_id is None:
    tokenizer.pad_token_id = tokenizer.eos_token_id  # Fall back to EOS if no pad token is set

# Load model (ensure it runs on GPU)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,  # Use FP16 for efficiency
    device_map="auto",
    trust_remote_code=True
)

# Function to generate text
def generate_response(prompt):
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")  # Move inputs to GPU
    output = model.generate(
        **inputs,
        max_new_tokens=150,  # Limit response length
        temperature=0.7,     # Controls randomness (lower = more deterministic)
        do_sample=True,      # Enables sampling for diverse outputs
        pad_token_id=tokenizer.pad_token_id
    )
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Run Arabic & English prompts
arabic_prompt = "ما هو أكبر مسجد في الإمارات العربية المتحدة؟"
english_prompt = "What is the largest mosque in the UAE?"

# Generate and print responses
print("🔹 Arabic:", generate_response(arabic_prompt))
print("🔹 English:", generate_response(english_prompt))
Expected Output Example 1:
🔹 Arabic: أكبر مسجد في الإمارات العربية المتحدة هو مسجد الشيخ زايد في أبوظبي.
🔹 English: The largest mosque in the UAE is Sheikh Zayed Grand Mosque in Abu Dhabi.
Step 14: If Model Offloads to CPU (Low VRAM Issue)
If you see a meta-device warning, your GPUs do not have enough memory for the full model. Let Accelerate offload the excess to CPU and disk:
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto",           # Let Accelerate split layers across GPU, CPU, and disk
    offload_folder="./offload"   # Parameters that do not fit are offloaded to disk here
)
Step 15: Use bitsandbytes to Reduce VRAM Usage (Extreme Cases)
If you’re running out of VRAM, install bitsandbytes to quantize the model:
pip install bitsandbytes accelerate
Then, load with 4-bit quantization:
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # Enables 4-bit quantization
    bnb_4bit_compute_dtype=torch.float16,  # Run computations in FP16
    bnb_4bit_use_double_quant=True         # Nested quantization for extra savings
)

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=quantization_config,
    device_map="auto"
)
This reduces weight memory by roughly 75% compared to FP16, since 4-bit weights are a quarter the size of 16-bit weights.
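If 4-bit output quality is not acceptable for your use case, 8-bit quantization is a middle ground that roughly halves FP16 memory (a sketch using the same BitsAndBytesConfig API):
from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_8bit=True)  # ~2x smaller than FP16

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=quantization_config,
    device_map="auto"
)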
Step 16: Final Fixes
Check GPU memory usage:
print(torch.cuda.memory_summary(device="cuda"))
- Use device_map="auto" for multi-GPU setups.
- Try 4-bit quantization (bitsandbytes) if you have VRAM issues.
- If using CPU offloading, make sure to set offload_folder properly.
Step 17: Deploy an Interactive Chatbot on Jupyter Notebook (Optional)
If you want an interactive chatbot UI, use Gradio.
Run the following command to install Gradio:
!pip install gradio
Step 18: Run Gradio Chatbot
Execute the following command to run the gradio chatbot:
import gradio as gr

# Reuse the generate_response function defined in Step 11
def chatbot_response(prompt):
    return generate_response(prompt)

gr.Interface(
    fn=chatbot_response,
    inputs=gr.Textbox(lines=2, placeholder="Type your message..."),
    outputs="text",
    title="Jais-Adapted-70B Chatbot",
).launch(share=True)
This will generate a Gradio link for your chatbot!
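If you want multi-turn conversations rather than a single text box, Gradio’s ChatInterface is a drop-in alternative (a sketch that reuses the generate_response function defined earlier; for simplicity it ignores the chat history):
import gradio as gr

def chat_fn(message, history):
    # history holds prior turns; this sketch answers each message independently
    return generate_response(message)

gr.ChatInterface(fn=chat_fn, title="Jais-Adapted-70B Chatbot").launch(share=True)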
Step 19: Access Chatbot
Access the Chatbot on:
Running on local URL: http://127.0.0.1:7860
Running on public URL: https://f1c6126a7caf137784.gradio.live
You can also access the chatbot directly inside the Jupyter Notebook.
Note: This is a step-by-step guide for interacting with your model. It covers the first method for installing Jais-Adapted-70b locally using Jupyter Notebook and Transformers.
The Jais-Adapted model is also available on Ollama in its 7B version, providing a robust bilingual language model optimized for both Arabic and English. This version maintains strong language processing capabilities while being lightweight enough for efficient deployment. Users can easily access and run the model with the command ollama run jwnder/jais-adaptive:7b, making it a convenient option for those looking to integrate advanced language understanding into their workflows. We also tried this version and found it to be efficient and responsive, making it a great choice for a wide range of applications. With its availability on Ollama, the Jais-Adapted 7B model is more accessible than ever for researchers, developers, and businesses.
Option 2: Using Ollama (Terminal)
Prerequisites for Installing the Jais-Adapted 7B Model Locally Using Ollama
- GPU:
  - Memory (VRAM):
    - Minimum: 16GB (with 8-bit or 4-bit quantization).
    - Recommended: 24GB for smoother execution.
    - Optimal: 48GB for full performance at FP16 precision.
  - Type: NVIDIA GPUs with Tensor Cores (e.g., RTX 4090, A6000, A100, H100).
- Disk Space:
  - Minimum: 40GB free SSD storage.
  - Recommended: 100GB SSD for storing additional checkpoints, logs, and datasets.
- RAM:
  - Minimum: 24GB.
  - Recommended: 48GB for smoother operation, especially with large datasets.
- CPU:
  - Minimum: 16 cores.
  - Recommended: 24-48 cores for fast data preprocessing and I/O operations.
- Install Ollama: Download and install the Ollama tool from the official site.
- Serve Ollama: Run the Ollama server.
- Pull the Model: Run the following command to pull the desired model:
ollama pull jwnder/jais-adaptive:7b
- Run the Model: Start the model in the terminal:
ollama run jwnder/jais-adaptive:7b
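Once the Ollama server is running, it also exposes a local REST API (on port 11434 by default), so you can call the model from Python as well (a sketch using the requests library):
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "jwnder/jais-adaptive:7b",
        "prompt": "The capital of UAE is",
        "stream": False,  # return a single JSON object instead of a token stream
    },
)
print(resp.json()["response"])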
Option 3: Using Open WebUI
- Set Up Open WebUI:
Follow our Open WebUI Setup Guide to configure the interface. Ensure all dependencies are installed and the environment is correctly set up.
- Refresh the Interface:
Confirm that the jais-adaptive model has been downloaded and is visible in the list of available models on the Open WebUI.
- Select Your Model:
Choose the jais-adaptive model from the list.
- Start Interaction:
Begin using the model by entering your queries in the interface.
Conclusion
The Jais-Adapted-70B model stands out as a highly capable bilingual language model, offering strong performance in both Arabic and English. With its optimized architecture and extensive training, it delivers accurate text generation, making it a valuable tool for research, content creation, and interactive applications. The step-by-step guide ensures a smooth setup process, whether using a virtual machine, Jupyter Notebook, or deployment through Ollama. The availability of a 7B version on Ollama further enhances accessibility, allowing users to run the model efficiently on a wider range of hardware. With its strong linguistic capabilities and adaptability, Jais-Adapted-70B provides a reliable solution for bilingual text processing across various domains.