Jais-Adapted-70B is a powerful bilingual language model for Arabic and English, built to deliver high-quality text generation with a strong focus on linguistic accuracy and contextual understanding. Developed through an advanced adaptation process, it strengthens Arabic capabilities while maintaining fluency in English, making it a valuable tool for applications such as research, content creation, and conversational tasks. With a transformer-based architecture and an extensive training dataset, Jais-Adapted-70B offers efficient processing and improved comprehension for users who need a robust, reliable model for bilingual communication.
Arabic Evaluation Results
Models | Avg | ArabicMMLU* | MMLU | EXAMS* | LitQA* | agqa | agrc | Hellaswag | PIQA | BoolQA | Situated QA | ARC-C | OpenBookQA | TruthfulQA | CrowS-Pairs |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
jais-family-30b-16k | 49.2 | 44.0 | 33.4 | 40.9 | 60 | 47.8 | 49.3 | 60.9 | 68.6 | 70.3 | 41.6 | 38.7 | 31.8 | 45.2 | 57 |
jais-family-30b-8k | 49.7 | 46.0 | 34 | 42 | 60.6 | 47.6 | 50.4 | 60.4 | 69 | 67.7 | 42.2 | 39.2 | 33.8 | 45.1 | 57.3 |
jais-family-13b | 46.1 | 34.0 | 30.3 | 42.7 | 58.3 | 40.5 | 45.5 | 57.3 | 68.1 | 63.1 | 41.6 | 35.3 | 31.4 | 41 | 56.1 |
jais-family-6p7b | 44.6 | 32.2 | 29.9 | 39 | 50.3 | 39.2 | 44.1 | 54.3 | 66.8 | 66.5 | 40.9 | 33.5 | 30.4 | 41.2 | 55.4 |
jais-family-2p7b | 41.0 | 29.5 | 28.5 | 36.1 | 45.7 | 32.4 | 40.8 | 44.2 | 62.5 | 62.2 | 39.2 | 27.4 | 28.2 | 43.6 | 53.6 |
jais-family-1p3b | 40.8 | 28.9 | 28.5 | 34.2 | 45.7 | 32.4 | 40.8 | 44.2 | 62.5 | 62.2 | 39.2 | 27.4 | 28.2 | 43.6 | 53.6 |
jais-family-590m | 39.7 | 31.2 | 27 | 33.1 | 41.7 | 33.8 | 38.8 | 38.2 | 60.7 | 62.2 | 37.9 | 25.5 | 27.4 | 44.7 | 53.3 |
jais-family-30b-16k-chat | 51.6 | 59.9 | 34.6 | 40.2 | 58.9 | 46.8 | 54.7 | 56.2 | 64.4 | 76.7 | 55.9 | 40.8 | 30.8 | 49.5 | 52.9 |
jais-family-30b-8k-chat | 51.4 | 61.2 | 34.2 | 40.2 | 54.3 | 47.3 | 53.6 | 60 | 63.4 | 76.8 | 54.7 | 39.5 | 30 | 50.7 | 54.3 |
jais-family-13b-chat | 50.3 | 58.2 | 33.9 | 42.9 | 53.1 | 46.8 | 51.7 | 59.3 | 65.4 | 75.2 | 51.2 | 38.4 | 29.8 | 44.8 | 53.8 |
jais-family-6p7b-chat | 48.7 | 55.7 | 32.8 | 37.7 | 49.7 | 40.5 | 50.1 | 56.2 | 62.9 | 79.4 | 52 | 38 | 30.4 | 44.7 | 52 |
jais-family-2p7b-chat | 45.6 | 50.0 | 31.5 | 35.9 | 41.1 | 37.3 | 42.1 | 48.6 | 63.7 | 74.4 | 50.9 | 35.3 | 31.2 | 44.5 | 51.3 |
jais-family-1p3b-chat | 42.7 | 42.2 | 30.1 | 33.6 | 40.6 | 34.1 | 41.2 | 43 | 63.6 | 69.3 | 44.9 | 31.6 | 28 | 45.6 | 50.4 |
jais-family-590m-chat | 37.8 | 39.1 | 28 | 29.5 | 33.1 | 30.8 | 36.4 | 30.3 | 57.8 | 57.2 | 40.5 | 25.9 | 26.8 | 44.5 | 49.3 |
Adapted Models | Avg | ArabicMMLU* | MMLU | EXAMS* | LitQA* | agqa | agrc | Hellaswag | PIQA | BoolQA | Situated QA | ARC-C | OpenBookQA | TruthfulQA | CrowS-Pairs |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
jais-adapted-70b | 51.5 | 55.9 | 36.8 | 42.3 | 58.3 | 48.6 | 54 | 61.5 | 68.4 | 68.4 | 42.1 | 42.6 | 33 | 50.2 | 58.3 |
jais-adapted-13b | 46.6 | 44.7 | 30.6 | 37.7 | 54.3 | 43.8 | 48.3 | 54.9 | 67.1 | 64.5 | 40.6 | 36.1 | 32 | 43.6 | 54.0 |
jais-adapted-7b | 42.0 | 35.9 | 28.9 | 36.7 | 46.3 | 34.1 | 40.3 | 45 | 61.3 | 63.8 | 38.1 | 29.7 | 30.2 | 44.3 | 53.6 |
jais-adapted-70b-chat | 52.9 | 66.8 | 34.6 | 42.5 | 62.9 | 36.8 | 48.6 | 64.5 | 69.7 | 82.8 | 49.3 | 44.2 | 32.2 | 53.3 | 52.4 |
jais-adapted-13b-chat | 50.3 | 59.0 | 31.7 | 37.5 | 56.6 | 41.9 | 51.7 | 58.8 | 67.1 | 78.2 | 45.9 | 41 | 34.2 | 48.3 | 52.1 |
jais-adapted-7b-chat | 46.1 | 51.3 | 30 | 37 | 48 | 36.8 | 48.6 | 51.1 | 62.9 | 72.4 | 41.3 | 34.6 | 30.4 | 48.6 | 51.8 |
English Evaluation Results
Models | Avg | MMLU | RACE | Hellaswag | PIQA | BoolQA | SIQA | ARC-Challenge | OpenBookQA | Winogrande | TruthfulQA | CrowS-Pairs |
---|---|---|---|---|---|---|---|---|---|---|---|---
jais-family-30b-16k | 59.3 | 42.2 | 40.5 | 79.7 | 80.6 | 78.7 | 48.8 | 50.3 | 44.2 | 71.6 | 43.5 | 72.6 |
jais-family-30b-8k | 58.8 | 42.3 | 40.3 | 79.1 | 80.5 | 80.9 | 49.3 | 48.4 | 43.2 | 70.6 | 40.3 | 72.3 |
jais-family-13b | 54.6 | 32.3 | 39 | 72 | 77.4 | 73.9 | 47.9 | 43.2 | 40 | 67.1 | 36.1 | 71.7 |
jais-family-6p7b | 53.1 | 32 | 38 | 69.3 | 76 | 71.7 | 47.1 | 40.3 | 37.4 | 65.1 | 34.4 | 72.5 |
jais-family-2p7b | 51 | 29.4 | 38 | 62.7 | 74.1 | 67.4 | 45.6 | 35.1 | 35.6 | 62.9 | 40.1 | 70.2 |
jais-family-1p3b | 48.7 | 28.2 | 35.4 | 55.4 | 72 | 62.7 | 44.9 | 30.7 | 36.2 | 60.9 | 40.4 | 69 |
jais-family-590m | 45.2 | 27.8 | 32.9 | 46.1 | 68.1 | 60.4 | 43.2 | 25.6 | 30.8 | 55.8 | 40.9 | 65.3 |
jais-family-30b-16k-chat | 58.8 | 42 | 41.1 | 76.2 | 73.3 | 84.6 | 60.3 | 48.4 | 40.8 | 68.2 | 44.8 | 67 |
jais-family-30b-8k-chat | 60.3 | 40.6 | 47.1 | 78.9 | 72.7 | 90.6 | 60 | 50.1 | 43.2 | 70.6 | 44.9 | 64.2 |
jais-family-13b-chat | 57.5 | 36.6 | 42.6 | 75 | 75.8 | 87.6 | 54.4 | 47.9 | 42 | 65 | 40.6 | 64.5 |
jais-family-6p7b-chat | 56 | 36.6 | 41.3 | 72 | 74 | 86.9 | 55.4 | 44.6 | 40 | 62.4 | 41 | 62.2 |
jais-family-2p7b-chat | 52.8 | 32.7 | 40.4 | 62.2 | 71 | 84.1 | 54 | 37.2 | 36.8 | 61.4 | 40.9 | 59.8 |
jais-family-1p3b-chat | 49.3 | 31.9 | 37.4 | 54.5 | 70.2 | 77.8 | 49.8 | 34.4 | 35.6 | 52.7 | 37.2 | 60.8 |
jais-family-590m-chat | 42.6 | 27.9 | 33.4 | 33.1 | 63.7 | 60.1 | 45.3 | 26.7 | 25.8 | 50.5 | 44.5 | 57.7 |
Adapted Models | Avg | MMLU | RACE | Hellaswag | PIQA | BoolQA | SIQA | ARC-Challenge | OpenBookQA | Winogrande | TruthfulQA | CrowS-Pairs |
---|---|---|---|---|---|---|---|---|---|---|---|---
jais-adapted-70b | 60.1 | 40.4 | 38.5 | 81.2 | 81.1 | 81.2 | 48.1 | 50.4 | 45 | 75.8 | 45.7 | 74 |
jais-adapted-13b | 56 | 33.8 | 39.5 | 76.5 | 78.6 | 77.8 | 44.6 | 45.9 | 44.4 | 71.4 | 34.6 | 69 |
jais-adapted-7b | 55.7 | 32.2 | 39.8 | 75.3 | 78.8 | 75.7 | 45.2 | 42.8 | 43 | 68 | 38.3 | 73.1 |
jais-adapted-70b-chat | 61.4 | 38.7 | 42.9 | 82.7 | 81.2 | 89.6 | 52.9 | 54.9 | 44.4 | 75.7 | 44 | 68.8 |
jais-adapted-13b-chat | 58.5 | 34.9 | 42.4 | 79.6 | 79.7 | 88.2 | 50.5 | 48.5 | 42.4 | 70.3 | 42.2 | 65.1 |
jais-adapted-7b-chat | 58.5 | 33.8 | 43.9 | 77.8 | 79.4 | 87.1 | 47.3 | 46.9 | 43.4 | 69.9 | 42 | 72.4 |
Model Resource
Hugging Face
Link: https://huggingface.co/inceptionai/jais-adapted-70b
1️⃣ Minimum Hardware Requirements
These specs allow the model to run, but performance may be slow.
- GPU: 2 x NVIDIA A100 80GB (or equivalent H100 80GB; several A6000 48GB cards can substitute)
- VRAM: 160GB+ total for FP16 inference (less with 8-bit or 4-bit quantization; see the quick estimate after this list)
- RAM: 256GB+
- CPU: 32-core AMD EPYC or Intel Xeon
- Disk Storage: 2TB SSD/NVMe
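As a rough sanity check, you can estimate the weights-only memory footprint directly from the parameter count. The snippet below is a back-of-the-envelope sketch; real usage is higher once activations, the KV cache, and framework overhead are added:
params = 70e9  # ~70B parameters
for name, bytes_per_param in [("FP32", 4), ("FP16", 2), ("8-bit", 1), ("4-bit", 0.5)]:
    gib = params * bytes_per_param / 1024**3
    print(f"{name}: ~{gib:.0f} GiB for weights alone")
# FP16 comes out to ~130 GiB, which is why two 80GB GPUs are the practical floor.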
2️⃣ Recommended Hardware Requirements
For better performance, especially for real-time inference:
- GPU: 4 x NVIDIA A100 80GB / H100 80GB SXM
- VRAM: 320GB+
- RAM: 512GB+
- CPU: 64-core AMD EPYC / Intel Xeon Platinum
- Disk Storage: 4TB NVMe SSD (for fast disk I/O)
3️⃣ Optimal Hardware Setup for Fastest Performance
For efficient inference and training on high-performance hardware:
- GPU: 8 x NVIDIA H100 80GB SXM
- VRAM: 640GB+
- RAM: 1TB+
- CPU: 96-core AMD EPYC / Intel Xeon Platinum
- Disk Storage: 8TB NVMe SSD (for model weights and caching)
4️⃣ Disk & Storage Requirements
- Model Size: ~70B parameters (~140GB of weights in FP16, ~280GB in FP32)
- Download Storage: Minimum 2TB SSD
- Checkpoint Storage: 4TB NVMe (recommended for high-speed read/write)
5️⃣ Software Requirements
- OS: Ubuntu 22.04 LTS or CentOS 8
- CUDA: 12.1+
- NVIDIA Driver: 535.86.10+
- Python: 3.10+
- PyTorch: 2.1.0+
- Transformers Library: 4.40.1+
- DeepSpeed/FSDP: Required for model sharding on multiple GPUs
- Hugging Face Accelerate: Required for distributed inference
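Once this stack is installed, a quick sanity-check cell (a minimal sketch) confirms the versions above are actually in place:
import torch
import transformers

print("PyTorch:", torch.__version__)              # expect 2.1.0+
print("Transformers:", transformers.__version__)  # expect 4.40.1+
print("CUDA available:", torch.cuda.is_available())
print("CUDA build:", torch.version.cuda)          # expect 12.1+
print("GPUs visible:", torch.cuda.device_count())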
Step-by-Step Process to Install Jais-Adapted-70b Locally
For the purpose of this tutorial, we will use a GPU-powered Virtual Machine offered by NodeShift; however, you can replicate the same steps with any other cloud provider of your choice. NodeShift provides the most affordable Virtual Machines at a scale that meets GDPR, SOC2, and ISO27001 requirements.
Step 1: Sign Up and Set Up a NodeShift Cloud Account
Visit the NodeShift Platform and create an account. Once you’ve signed up, log into your account.
Follow the account setup process and provide the necessary details and information.
Step 2: Create a GPU Node (Virtual Machine)
GPU Nodes are NodeShift’s GPU Virtual Machines, on-demand resources equipped with diverse GPUs ranging from H100s to A100s. These GPU-powered VMs provide enhanced environmental control, allowing configuration adjustments for GPUs, CPUs, RAM, and Storage based on specific requirements.
Navigate to the menu on the left side. Select the GPU Nodes option, create a GPU Node in the Dashboard, click the Create GPU Node button, and create your first Virtual Machine deployment.
Step 3: Select a Model, Region, and Storage
In the “GPU Nodes” tab, select a GPU Model and Storage according to your needs and the geographical region where you want to launch your model.
We will use 1 x H100 SXM GPU for this tutorial to achieve the fastest performance. Note that the full 70B model does not fit in 80GB of VRAM at FP16, so on a single GPU you will need the quantization or offloading techniques covered in Steps 14 and 15. You can also choose a more affordable GPU with less VRAM if that better suits your requirements.
Step 4: Select Authentication Method
There are two authentication methods available: Password and SSH Key. SSH keys are a more secure option. To create them, please refer to our official documentation.
Step 5: Choose an Image
Next, you will need to choose an image for your Virtual Machine. We will deploy Jais-Adapted-70b Model on a Jupyter Virtual Machine. This open-source platform will allow you to install and run the Jais-Adapted-70b Model on your GPU node. By running this Model on a Jupyter Notebook, we avoid using the terminal, simplifying the process and reducing the setup time. This allows you to configure the model in just a few steps and minutes.
Note: NodeShift provides multiple image template options, such as TensorFlow, PyTorch, NVIDIA CUDA, Deepo, Whisper ASR Webservice, and Jupyter Notebook. With these options, you don’t need to install additional libraries or packages to run Jupyter Notebook. You can start Jupyter Notebook in just a few simple clicks.
After choosing the image, click the ‘Create’ button, and your Virtual Machine will be deployed.
Step 6: Virtual Machine Successfully Deployed
You will get visual confirmation that your node is up and running.
Step 7: Connect to Jupyter Notebook
Once your GPU VM deployment is successfully created and has reached the ‘RUNNING’ status, you can navigate to the page of your GPU Deployment Instance. Then, click the ‘Connect’ Button in the top right corner.
After clicking the ‘Connect’ button, you can view the Jupyter Notebook.
Now open a Python 3 (ipykernel) Notebook.
Next, If you want to check the GPU details, run the command in the Jupyter Notebook cell:
!nvidia-smi
Step 8: Install Dependencies in Jupyter Notebook
Run the following commands in Jupyter Notebook to install dependencies:
!pip install --upgrade pip
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
!pip install transformers accelerate safetensors sentencepiece
!pip install auto-gptq bitsandbytes
Step 9: Access model from Hugging Face
You need to agree to share your contact information to access this model. Fill in all the mandatory details, such as your name and email, and then wait for approval from Hugging Face and Meta to gain access and use the model.
Link: https://huggingface.co/inceptionai/jais-adapted-70b
You will be granted access to this model within a few seconds, provided you have filled in all the details correctly.
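If you prefer not to pass the token explicitly to every call, you can authenticate the environment once with huggingface_hub (a minimal sketch; in a notebook, login() opens an interactive prompt for the token):
from huggingface_hub import login

# Prompts for your token interactively; alternatively, pass it directly:
# login(token="hf_...")  # placeholder; never commit a real token
login()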
Step 10: Download the Model
Run the following command to download the model:
from huggingface_hub import snapshot_download

repo_id = "inceptionai/jais-adapted-70b"
hf_token = "hf_xxxxxxxxxxxxxxxxxxxx"  # Replace with your own Hugging Face token

snapshot_download(repo_id=repo_id, token=hf_token, local_dir="jais-adapted-70b")
How to Generate a Hugging Face Token
- Create an Account: Go to the Hugging Face website and sign up for an account if you don’t already have one.
- Access Settings: After logging in, click on your profile photo in the top right corner and select “Settings.”
- Navigate to Access Tokens: In the settings menu, find and click on the “Access Tokens” tab.
- Generate a New Token: Click the “New token” button, provide a name for your token, and choose a role (either read or write).
- Generate and Copy Token: Click the “Generate a token” button. Your new token will appear; click “Show” to view it and copy it for use in your applications.
- Secure Your Token: Ensure you keep your token secure and do not expose it in public code repositories.
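One simple way to follow that last point is to read the token from an environment variable instead of hardcoding it in the notebook (a sketch; HF_TOKEN is a conventional variable name, but any name works):
import os
from huggingface_hub import snapshot_download

hf_token = os.environ["HF_TOKEN"]  # set via: export HF_TOKEN=hf_... in your shell

snapshot_download(
    repo_id="inceptionai/jais-adapted-70b",
    token=hf_token,
    local_dir="jais-adapted-70b",
)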
Step 11: Load Model Properly on GPU
Run the following code to load the model:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Define model path
model_path = "./jais-adapted-70b"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)
if tokenizer.pad_token_id is None:
    tokenizer.pad_token_id = tokenizer.eos_token_id  # Llama-style tokenizers often lack a pad token

# Load model and force GPU usage
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,  # Use FP16 to reduce memory usage
    device_map="auto",          # Automatically assign layers to available GPUs
    trust_remote_code=True
)

# Check GPU memory usage
print(torch.cuda.memory_summary(device="cuda"))

# Function to generate responses
def generate_response(prompt):
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")  # Move inputs to GPU
    output = model.generate(
        **inputs,
        max_new_tokens=100,
        temperature=0.7,  # Sampling temperature (lower = more deterministic)
        do_sample=True,   # Enable sampling
        pad_token_id=tokenizer.pad_token_id
    )
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Example query
print("Arabic:", generate_response("عاصمة دولة الإمارات العربية المتحدة هي"))
print("English:", generate_response("The capital of UAE is"))
Step 12: Check Generated Response
Step 13: Try Different Prompts
Run the following code to try different prompts:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Define model path (change if stored elsewhere)
model_path = "./jais-adapted-70b"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)
if tokenizer.pad_token_id is None:
    tokenizer.pad_token_id = tokenizer.eos_token_id  # Fall back to EOS if no pad token is set

# Load model (ensure it runs on GPU)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,  # Use FP16 for efficiency
    device_map="auto",
    trust_remote_code=True
)

# Function to generate text
def generate_response(prompt):
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")  # Move inputs to GPU
    output = model.generate(
        **inputs,
        max_new_tokens=150,  # Limit response length
        temperature=0.7,     # Controls randomness (lower = more deterministic)
        do_sample=True,      # Enables sampling for diverse outputs
        pad_token_id=tokenizer.pad_token_id
    )
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Run Arabic & English prompts
arabic_prompt = "ما هو أكبر مسجد في الإمارات العربية المتحدة؟"
english_prompt = "What is the largest mosque in the UAE?"

# Generate and print responses
print("🔹 Arabic:", generate_response(arabic_prompt))
print("🔹 English:", generate_response(english_prompt))
Expected Output Example 1:
🔹 Arabic: أكبر مسجد في الإمارات العربية المتحدة هو مسجد الشيخ زايد في أبوظبي.
🔹 English: The largest mosque in the UAE is Sheikh Zayed Grand Mosque in Abu Dhabi.
Step 14: If Model Offloads to CPU (Low VRAM Issue)
If you see a meta-device warning, your GPUs do not have enough memory for the full model. Let Accelerate offload the excess to CPU and disk:
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto",           # Let Accelerate split layers across GPU, CPU, and disk
    offload_folder="./offload"   # Parameters that do not fit are offloaded to disk here
)
Step 15: Use bitsandbytes to Reduce VRAM Usage (Extreme Cases)
If you’re running out of VRAM, install bitsandbytes to quantize the model:
pip install bitsandbytes accelerate
Then, load with 4-bit quantization:
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # Enables 4-bit quantization
    bnb_4bit_compute_dtype=torch.float16,  # Run computations in FP16
    bnb_4bit_use_double_quant=True         # Nested quantization for extra savings
)

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=quantization_config,
    device_map="auto"
)
This reduces weight memory by roughly 75% compared to FP16, since 4-bit weights are a quarter the size of 16-bit weights.
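If 4-bit output quality is not acceptable for your use case, 8-bit quantization is a middle ground that roughly halves FP16 memory (a sketch using the same BitsAndBytesConfig API):
from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_8bit=True)  # ~2x smaller than FP16

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=quantization_config,
    device_map="auto"
)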
Step 16: Final Fixes
Check GPU memory usage:
print(torch.cuda.memory_summary(device="cuda"))
- Use device_map="auto" for multi-GPU setups.
- Try 4-bit quantization (bitsandbytes) if you have VRAM issues.
- If using CPU offloading, make sure to set offload_folder properly.
Step 17: Deploy an Interactive Chatbot on Jupyter Notebook (Optional)
If you want an interactive chatbot UI, use Gradio.
Run the following command to install Gradio:
!pip install gradio
Step 18: Run Gradio Chatbot
Execute the following command to run the gradio chatbot:
import gradio as gr

# Reuse the generate_response function defined in Step 11
def chatbot_response(prompt):
    return generate_response(prompt)

gr.Interface(
    fn=chatbot_response,
    inputs=gr.Textbox(lines=2, placeholder="Type your message..."),
    outputs="text",
    title="Jais-Adapted-70B Chatbot",
).launch(share=True)
This will generate a Gradio link for your chatbot!
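If you want multi-turn conversations rather than a single text box, Gradio’s ChatInterface is a drop-in alternative (a sketch that reuses the generate_response function defined earlier; for simplicity it ignores the chat history):
import gradio as gr

def chat_fn(message, history):
    # history holds prior turns; this sketch answers each message independently
    return generate_response(message)

gr.ChatInterface(fn=chat_fn, title="Jais-Adapted-70B Chatbot").launch(share=True)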
Step 19: Access Chatbot
Access the Chatbot on:
Running on local URL: http://127.0.0.1:7860
Running on public URL: https://f1c6126a7caf137784.gradio.live
You can also access the chatbot directly inside the Jupyter Notebook.
Note: This is a step-by-step guide for interacting with your model. It covers the first method for installing Jais-Adapted-70b locally using Jupyter Notebook and Transformers.
The Jais-Adapted model is also available on Ollama in its 7B version, providing a robust bilingual language model optimized for both Arabic and English. This version maintains strong language processing capabilities while being lightweight enough for efficient deployment. Users can easily access and run the model with the command ollama run jwnder/jais-adaptive:7b, making it a convenient option for those looking to integrate advanced language understanding into their workflows. We also tried this version and found it to be efficient and responsive, making it a great choice for a wide range of applications. With its availability on Ollama, the Jais-Adapted 7B model is more accessible than ever for researchers, developers, and businesses.
Option 2: Using Ollama (Terminal)
Prerequisites for Installing the Jais-Adapted 7B Model Locally Using Ollama
- GPU:
  - Memory (VRAM):
    - Minimum: 16GB (with 8-bit or 4-bit quantization).
    - Recommended: 24GB for smoother execution.
    - Optimal: 48GB for full performance at FP16 precision.
  - Type: NVIDIA GPUs with Tensor Cores (e.g., RTX 4090, A6000, A100, H100).
- Disk Space:
  - Minimum: 40GB free SSD storage.
  - Recommended: 100GB SSD for storing additional checkpoints, logs, and datasets.
- RAM:
  - Minimum: 24GB.
  - Recommended: 48GB for smoother operation, especially with large datasets.
- CPU:
  - Minimum: 16 cores.
  - Recommended: 24-48 cores for fast data preprocessing and I/O operations.
- Install Ollama: Download and install the Ollama tool from the official site.
- Serve Ollama: Run the Ollama server.
- Pull the Model: Run the following command to pull the desired model:
ollama pull jwnder/jais-adaptive:7b
- Run the Model: Start the model in the terminal:
ollama run jwnder/jais-adaptive:7b
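Once the Ollama server is running, it also exposes a local REST API (on port 11434 by default), so you can call the model from Python as well (a sketch using the requests library):
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "jwnder/jais-adaptive:7b",
        "prompt": "The capital of UAE is",
        "stream": False,  # return a single JSON object instead of a token stream
    },
)
print(resp.json()["response"])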
Option 3: Using Open WebUI
- Set Up Open WebUI:
Follow our Open WebUI Setup Guide to configure the interface. Ensure all dependencies are installed and the environment is correctly set up.
- Refresh the Interface:
Confirm that the jais-adaptive model has been downloaded and is visible in the list of available models on the Open WebUI.
- Select Your Model:
Choose the jais-adaptive model from the list.
- Start Interaction:
Begin using the model by entering your queries in the interface.
Conclusion
The Jais-Adapted-70B model stands out as a highly capable bilingual language model, offering strong performance in both Arabic and English. With its optimized architecture and extensive training, it delivers accurate text generation, making it a valuable tool for research, content creation, and interactive applications. The step-by-step guide ensures a smooth setup process, whether using a virtual machine, Jupyter Notebook, or deployment through Ollama. The availability of a 7B version on Ollama further enhances accessibility, allowing users to run the model efficiently on a wider range of hardware. With its strong linguistic capabilities and adaptability, Jais-Adapted-70B provides a reliable solution for bilingual text processing across various domains.