Aya Vision 8B is a powerful multilingual vision-language model designed to handle image and text-based tasks with high accuracy. With 8 billion parameters, it excels in image captioning, optical character recognition (OCR), visual reasoning, summarization, and question answering across 23 languages, including English, French, Spanish, German, Chinese, Arabic, and Hindi. The model efficiently processes both images and text using a SigLIP2 vision encoder paired with the C4AI Command R7B language model, ensuring seamless integration of visual and textual data. Its ability to handle 16K tokens makes it suitable for long-form content generation and in-depth analysis. Aya Vision 8B is optimized for scene understanding, document processing, multilingual transcription, and AI-driven research, providing structured and context-aware responses for a wide range of applications.
Model Resource
Hugging Face
Link: https://huggingface.co/CohereForAI/aya-vision-8b
Prerequisites for Installing Aya Vision 8B Model Locally
- GPU:
- Memory (VRAM):
- Minimum: 16GB (with 8-bit or 4-bit quantization).
- Recommended: 24GB for smoother execution.
- Optimal: 48GB for full performance at FP16 precision.
- Type: NVIDIA GPUs with Tensor Cores (e.g., RTX 4090, A6000, A100, H100).
- Disk Space:
- Minimum: 40GB free SSD storage.
- Recommended: 100GB SSD for storing additional checkpoints, logs, and datasets.
- RAM:
- Minimum: 24GB.
- Recommended: 48GB for smoother operation, especially with large datasets.
- CPU:
- Minimum: 16 cores.
- Recommended: 24-48 cores for fast data preprocessing and I/O operations.
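The VRAM figures above follow from simple arithmetic on the parameter count: each FP16 weight takes 2 bytes, an 8-bit quantized weight 1 byte, and a 4-bit weight half a byte. A minimal sketch of the weights-only estimate (activations, the KV cache, and framework overhead add several more GB in practice, which is why the practical minimums above are higher):

```python
# Rough VRAM needed for the model weights alone. This ignores activations,
# the KV cache, and framework overhead, which add several GB in practice.
def weight_vram_gb(params_billion: float, bytes_per_param: float) -> float:
    return params_billion * 1e9 * bytes_per_param / (1024 ** 3)

print(f"FP16 : {weight_vram_gb(8, 2.0):.1f} GB")  # ~14.9 GB
print(f"8-bit: {weight_vram_gb(8, 1.0):.1f} GB")  # ~7.5 GB
print(f"4-bit: {weight_vram_gb(8, 0.5):.1f} GB")  # ~3.7 GB
```

This is why a 16GB card only fits the model comfortably with 8-bit or 4-bit quantization, while FP16 wants 24GB or more of headroom.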
Step-by-Step Process to Install Aya Vision 8B Model Locally
For the purpose of this tutorial, we will use a GPU-powered Virtual Machine offered by NodeShift; however, you can replicate the same steps with any other cloud provider of your choice. NodeShift provides the most affordable Virtual Machines at a scale that meets GDPR, SOC2, and ISO27001 requirements.
Step 1: Access model from Hugging Face
Link: https://huggingface.co/CohereForAI/aya-vision-8b
You need to agree to share your contact information to access this model. Fill in the mandatory details, such as your name and email, and then wait for approval from Cohere on Hugging Face to gain access and use the model.
You will be granted access to this model within an hour, provided you have filled in all the details correctly.
Step 2: Sign Up and Set Up a NodeShift Cloud Account
Visit the NodeShift Platform and create an account. Once you’ve signed up, log into your account.
Follow the account setup process and provide the necessary details and information.
Step 3: Create a GPU Node (Virtual Machine)
GPU Nodes are NodeShift’s GPU Virtual Machines, on-demand resources equipped with diverse GPUs ranging from H100s to A100s. These GPU-powered VMs provide enhanced environmental control, allowing configuration adjustments for GPUs, CPUs, RAM, and Storage based on specific requirements.
Navigate to the menu on the left side, select the GPU Nodes option, and click the Create GPU Node button in the Dashboard to deploy your first Virtual Machine.
Step 4: Select a Model, Region, and Storage
In the “GPU Nodes” tab, select a GPU Model and Storage according to your needs and the geographical region where you want to launch your model.
We will use 1x RTX A6000 GPU for this tutorial to achieve the fastest performance. However, you can choose a more affordable GPU with less VRAM if that better suits your requirements.
Step 5: Select Authentication Method
There are two authentication methods available: Password and SSH Key. SSH keys are a more secure option. To create them, please refer to our official documentation.
Step 6: Choose an Image
Next, you will need to choose an image for your Virtual Machine. We will deploy Aya Vision 8B on an NVIDIA CUDA Virtual Machine. This parallel computing platform provides the drivers and toolkit you need to install and run Aya Vision 8B on your GPU Node.
After choosing the image, click the ‘Create’ button, and your Virtual Machine will be deployed.
Step 7: Virtual Machine Successfully Deployed
You will get visual confirmation that your node is up and running.
Step 8: Connect to GPUs using SSH
NodeShift GPUs can be connected to and controlled through a terminal using the SSH key provided during GPU creation.
Once your GPU Node deployment is successfully created and has reached the ‘RUNNING’ status, you can navigate to the page of your GPU Deployment Instance. Then, click the ‘Connect’ button in the top right corner.
Now open your terminal and paste the proxy SSH IP or direct SSH IP.
Next, if you want to check the GPU details, run the command below:
nvidia-smi
Step 9: Install Required Dependencies
Run the following command to install required dependencies and libraries:
sudo apt update && sudo apt upgrade -y
sudo apt install -y git python3 python3-pip python3-venv libsndfile1 ffmpeg libgl1-mesa-glx libglib2.0-0
Step 10: Set Up a Python Virtual Environment
Run the following commands to set up a Python virtual environment:
# Create virtual environment
python3 -m venv aya_env
source aya_env/bin/activate
Step 11: Install Python Dependencies
Run the following commands to install the Python dependencies:
pip install --upgrade pip
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install transformers accelerate diffusers huggingface_hub
Step 12: Login Using Your Hugging Face API Token
Use the huggingface_hub CLI to log in directly from the terminal. Run the following command:
huggingface-cli login
Then paste your token and press Enter. Note that the token input is hidden for security, so nothing will appear on screen as you paste it; press Enter once to submit.
After entering the token, you will see the following output:
Login Successful.
The current active token is (your_token_name).
Check the screenshot below for reference.
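If you prefer not to paste the token interactively, you can set it as an environment variable instead; both transformers and huggingface_hub read HF_TOKEN automatically, and huggingface-cli login also accepts it via its --token flag. A sketch (hf_xxxxxxxx is a placeholder for your real token):

```shell
# Placeholder token — replace hf_xxxxxxxx with your real token.
# transformers and huggingface_hub pick this variable up automatically;
# alternatively, run: huggingface-cli login --token "$HF_TOKEN"
export HF_TOKEN=hf_xxxxxxxx
```

Avoid hard-coding the token in scripts you might commit to a repository.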
How to Generate a Hugging Face Token
- Create an Account: Go to the Hugging Face website and sign up for an account if you don’t already have one.
- Access Settings: After logging in, click on your profile photo in the top right corner and select “Settings.”
- Navigate to Access Tokens: In the settings menu, find and click on the “Access Tokens” tab.
- Generate a New Token: Click the “New token” button, provide a name for your token, and choose a role (either read or write).
- Generate and Copy Token: Click the “Generate a token” button. Your new token will appear; click “Show” to view it and copy it for use in your applications.
- Secure Your Token: Ensure you keep your token secure and do not expose it in public code repositories.
Step 13: Create a Python Script
Next, connect your remote GPU server to VS Code and create a test.py file for running the Aya Vision 8B model. Follow these steps:
Install VS Code Extensions
On your local machine, open VS Code and install:
- Remote – SSH extension
- Python extension
Steps:
- Open VS Code.
- Click on Extensions (Ctrl + Shift + X).
- Search for “Remote – SSH” and install it.
- Search for “Python” and install it.
Connect VS Code to Your GPU Remote Server
Steps to Connect via SSH
- Open VS Code.
- Press Ctrl + Shift + P to open the command palette.
- Type “Remote-SSH: Connect to Host…” and select it.
- Enter your GPU server details:
ssh root@<YOUR_GPU_SERVER_IP>
Example:
ssh root@192.168.1.100
- Enter your password (or use your SSH key if set up).
- Now you are inside your remote GPU server via VS Code!
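Typing the full ssh command each time gets tedious. An entry in ~/.ssh/config lets both your terminal and the Remote-SSH extension connect by name; this is a sketch, and the host alias, IP address, and key path below are placeholders you should replace with your own values:

```
# ~/.ssh/config — hypothetical example; substitute your own values.
Host nodeshift-gpu
    HostName 192.168.1.100
    User root
    IdentityFile ~/.ssh/id_ed25519
```

With this in place, `ssh nodeshift-gpu` connects directly, and the alias also appears in the host list of VS Code's “Remote-SSH: Connect to Host…” command.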
Create test.py File in VS Code
Now, in VS Code, inside your remote connection:
- Open the File Explorer in VS Code.
- Navigate to your remote GPU directory (~/aya_env).
- Create a new file named test.py.
- Copy and paste the following test code inside test.py:
from transformers import AutoProcessor, AutoModelForImageTextToText
import torch

# Load the Aya Vision 8B model
model_id = "CohereForAI/aya-vision-8b"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, device_map="auto", torch_dtype=torch.float16
)

# Define an image URL and a question for testing
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://pbs.twimg.com/media/Fx7YvfQWYAIp6rZ?format=jpg&name=medium"},
            {"type": "text", "text": "What is written in the image?"},
        ],
    }
]

# Prepare model inputs with the chat template
inputs = processor.apply_chat_template(
    messages, padding=True, add_generation_prompt=True,
    tokenize=True, return_dict=True, return_tensors="pt"
).to(model.device)

# Generate a response
gen_tokens = model.generate(
    **inputs,
    max_new_tokens=300,
    do_sample=True,
    temperature=0.3,
)

# Decode and print only the newly generated tokens
print(processor.tokenizer.decode(gen_tokens[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
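To query your own images, you can swap the URL in the message for a different image reference; based on transformers' image handling, the "url" field should also accept a local file path, though that is an assumption worth verifying on your setup. The helper below is our own convenience function, not part of the library — it just builds the message structure apply_chat_template expects:

```python
# Hypothetical helper (not part of transformers): builds a single-turn
# message in the structure that apply_chat_template expects.
def build_vision_message(image_ref: str, question: str) -> list:
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_ref},  # URL or local file path
                {"type": "text", "text": question},
            ],
        }
    ]

messages = build_vision_message("my_photo.jpg", "Describe this image in French.")
print(messages[0]["content"][1]["text"])  # Describe this image in French.
```

Since Aya Vision 8B covers 23 languages, the question text itself can be written in (or ask for a response in) any supported language.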
Step 14: Run the Script
Now, run the script in the terminal:
python3 test.py
Output:
Conclusion
Aya Vision 8B is a highly capable vision-language model designed to process and analyze both images and text with precision. With its multilingual support across 23 languages and robust visual reasoning abilities, it excels in tasks such as image captioning, document processing, and question answering. This guide provided a step-by-step approach to setting up the model on a GPU-powered virtual machine, ensuring optimal performance for users working with structured visual data. By following this installation process, researchers, developers, and content creators can seamlessly integrate Aya Vision 8B into their workflows, enhancing automation and efficiency in various vision-language applications.