DeepSeek-VL2 is a powerful vision-language model designed to handle a wide range of visual and text-based tasks, including visual question answering, optical character recognition, document analysis, and object localization. It builds on a Mixture-of-Experts (MoE) architecture, offering efficient processing and improved accuracy.
The model series includes three versions—DeepSeek-VL2-Tiny, DeepSeek-VL2-Small, and DeepSeek-VL2—with varying numbers of activated parameters to suit different use cases. DeepSeek-VL2 is optimized for accuracy while maintaining efficiency, making it a strong choice for complex multimodal tasks. It supports commercial use and is available under the MIT License.
Resources
- Hugging Face: https://huggingface.co/deepseek-ai/deepseek-vl2
- GitHub: https://github.com/deepseek-ai/DeepSeek-VL2
1. GPU Requirements
Model Variant | VRAM Requirement (Inference) | VRAM Requirement (Gradio) | Recommended GPU
---|---|---|---
DeepSeek-VL2-Tiny (1.0B activated params) | 16GB (8-bit quantization) | 24GB | RTX 3090 / 4090 / A5000
DeepSeek-VL2-Small (2.8B activated params) | 40GB (incremental prefilling) | 48GB+ | A100 40GB / A6000
DeepSeek-VL2 (4.5B activated params) | 80GB (full performance) | 80GB+ | RTX A6000 / A100 80GB / H100
- Minimum: 16GB VRAM (for Tiny variant with quantization).
- Recommended: 48GB VRAM for smooth execution of DeepSeek-VL2-Small.
- Optimal: 80GB VRAM for full performance of DeepSeek-VL2 and high-resolution Gradio demos.
- GPU Type: NVIDIA GPUs with Tensor Cores (e.g., RTX 4090, A6000, A100, H100).
For multiple image inference or Gradio-based interactive UI, 48GB+ VRAM is recommended.
2. CPU Requirements
Component | Minimum | Recommended
---|---|---
CPU Cores | 16 cores | 32+ cores
Clock Speed | 2.5 GHz | 3.5 GHz+
Processor Type | AMD EPYC / Intel Xeon | AMD Threadripper / Intel Xeon Platinum
- Multimodal tasks require efficient CPU preprocessing, especially when handling images, charts, and documents.
3. RAM Requirements
Task Type | Minimum RAM | Recommended RAM
---|---|---
Text-only tasks | 16GB | 32GB
Text + Image | 32GB | 64GB
Text + Multiple Images / Gradio UI | 64GB | 128GB
- Minimum: 32GB RAM (for text and single-image processing).
- Recommended: 64GB+ RAM (for multiple images and longer context window).
- Optimal: 128GB RAM (for Gradio UI with multi-image or complex visual grounding tasks).
4. Disk Space & Storage
Component | Minimum | Recommended
---|---|---
Disk Space | 50GB SSD | 200GB NVMe SSD
Disk Type | SATA SSD | NVMe SSD
- Minimum: 50GB free storage for model weights and inference scripts.
- Recommended: 200GB SSD for storing datasets, checkpoints, logs, and other assets.
- Use an NVMe SSD to reduce model load times.
5. Network Requirements
Component | Minimum | Recommended
---|---|---
Internet Speed | 100 Mbps | 1 Gbps+
Cloud VM | Any GPU VM | Cloud GPUs (A100/H100)
- If running DeepSeek-VL2 on a cloud VM, ensure high-speed networking (1 Gbps) for fast model downloads and dataset handling.
6. Optimizations for Gradio UI
- Reduce the batch size for image processing to optimize VRAM usage.
- Use mixed precision (bfloat16) for faster performance (see the loading sketch after this list).
- Enable memory-efficient attention (flash attention) for better scaling.
- Deploy on multiple GPUs for better parallelism.
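Below is a minimal loading sketch illustrating the mixed-precision and memory-efficient attention settings. It is an assumption-based example, not code taken from the DeepSeek-VL2 repository: it presumes the model class accepts the standard Hugging Face from_pretrained keyword arguments (torch_dtype, attn_implementation) and that the flash-attn package is installed; the bundled web_demo.py may expose these options differently.
import torch
from deepseek_vl.models import DeepseekVLV2ForCausalLM  # package may be named deepseek_vl2 in newer repo versions

# Hedged sketch: load in bfloat16 with FlashAttention-2 style attention, then move to GPU.
model = DeepseekVLV2ForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-vl2-small",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,               # mixed precision for lower VRAM and faster math
    attn_implementation="flash_attention_2",  # memory-efficient attention (requires flash-attn)
)
model = model.cuda().eval()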
7. Best Practices for Performance
- Use SSD/NVMe for storage – avoid HDDs for model loading.
- Monitor GPU usage – run nvidia-smi to check VRAM usage (a small Python-based check follows this list).
- Enable Flash Attention – for efficient memory handling.
- Use incremental prefilling – reduces GPU memory usage.
- Multi-GPU scaling – ideal for parallel image processing.
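If you prefer to check memory from inside Python rather than with nvidia-smi, the small torch-based snippet below reports what PyTorch sees on the first visible GPU (it only accounts for memory managed by PyTorch, so driver-level usage may differ slightly):
import torch

# Print total, reserved, and allocated memory for GPU 0 as seen by PyTorch.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    total_gib = props.total_memory / 1024**3
    reserved_gib = torch.cuda.memory_reserved(0) / 1024**3
    allocated_gib = torch.cuda.memory_allocated(0) / 1024**3
    print(f"{props.name}: total {total_gib:.1f} GiB, reserved {reserved_gib:.1f} GiB, allocated {allocated_gib:.1f} GiB")
else:
    print("No CUDA device visible to PyTorch")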
8. Summary: Recommended System Build
Component | Recommended Specification
---|---
GPU | NVIDIA A6000 (48GB) / A100 (80GB) / H100 (80GB)
CPU | AMD EPYC 64-core / Intel Xeon 32-core
RAM | 64GB (image tasks) / 128GB (Gradio UI)
Storage | 200GB NVMe SSD
Network | 1 Gbps Cloud VM (for cloud hosting)
Step-by-Step Process to Install DeepSeek VL2 Small – MoE Vision Model Locally
For the purpose of this tutorial, we will use a GPU-powered Virtual Machine offered by NodeShift; however, you can replicate the same steps with any other cloud provider of your choice. NodeShift provides the most affordable Virtual Machines at a scale that meets GDPR, SOC2, and ISO27001 requirements.
Step 1: Sign Up and Set Up a NodeShift Cloud Account
Visit the NodeShift Platform and create an account. Once you’ve signed up, log into your account.
Follow the account setup process and provide the necessary details and information.
Step 2: Create a GPU Node (Virtual Machine)
GPU Nodes are NodeShift’s GPU Virtual Machines, on-demand resources equipped with diverse GPUs ranging from H100s to A100s. These GPU-powered VMs provide enhanced environmental control, allowing configuration adjustments for GPUs, CPUs, RAM, and Storage based on specific requirements.
Navigate to the menu on the left side. Select the GPU Nodes option, create a GPU Node in the Dashboard, click the Create GPU Node button, and create your first Virtual Machine deployment.
Step 3: Select a Model, Region, and Storage
In the “GPU Nodes” tab, select a GPU Model and Storage according to your needs and the geographical region where you want to launch your model.
We will use 1x RTX A6000 GPU for this tutorial to achieve the fastest performance. However, you can choose a more affordable GPU with less VRAM if that better suits your requirements.
Step 4: Select Authentication Method
There are two authentication methods available: Password and SSH Key. SSH keys are a more secure option. To create them, please refer to our official documentation.
Step 5: Choose an Image
Next, you will need to choose an image for your Virtual Machine. We will deploy DeepSeek VL2 Small – MoE Vision on an NVIDIA CUDA Virtual Machine. This proprietary parallel computing platform allows you to install and run DeepSeek VL2 Small – MoE Vision on your GPU Node.
After choosing the image, click the ‘Create’ button, and your Virtual Machine will be deployed.
Step 6: Virtual Machine Successfully Deployed
You will get visual confirmation that your node is up and running.
Step 7: Connect to GPUs using SSH
NodeShift GPUs can be connected to and controlled through a terminal using the SSH key provided during GPU creation.
Once your GPU Node deployment is successfully created and has reached the ‘RUNNING’ status, you can navigate to the page of your GPU Deployment Instance. Then, click the ‘Connect’ button in the top right corner.
Now open your terminal and paste the proxy SSH IP or direct SSH IP.
Next, if you want to check the GPU details, run the command below:
nvidia-smi
Step 8: Check the Available Python Version and Install a Newer Version
Run the following command to check the Python version currently available on the system:
python3 --version
By default, the system ships with Python 3.8.1. To install a newer version of Python, you’ll need to use the deadsnakes PPA.
Run the following commands to add the deadsnakes PPA:
sudo apt update
sudo apt install -y software-properties-common
sudo add-apt-repository -y ppa:deadsnakes/ppa
sudo apt update
Step 9: Install Python 3.11
Now, run the following command to install Python 3.11 or another desired version:
sudo apt install -y python3.11 python3.11-distutils python3.11-venv
Step 10: Update the Default Python3 Version
Now, run the following commands to register both Python versions and set the new one as the default python3:
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.8 1
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.11 2
sudo update-alternatives --config python3
Then, run the following command to verify that the new Python version is active:
python3 --version
Step 11: Install and Update Pip
Run the following commands to install and update pip:
python3 -m ensurepip --upgrade
python3 -m pip install --upgrade pip
Then, run the following command to check the version of pip:
pip --version
Step 12: Clone the Repository
Run the following command to clone the Deepseek-vl2 repository:
git clone https://github.com/deepseek-ai/deepseek-vl2.git
cd deepseek-vl2
Step 13: Setup Environment
Run the following commands to set up the environment:
python -m venv deepseek_env
source deepseek_env/bin/activate
# On Windows: deepseek_env\Scripts\activate
Step 14: Install Dependencies
Run the following command to install the dependencies:
pip install -e .
Step 15: Install Gradio
Run the following command to install Gradio:
pip install gradio==3.48.0
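As a quick sanity check that Gradio was installed into the active virtual environment, you can print its version from Python; this only verifies the import, while the actual UI is launched later by web_demo.py:
import gradio as gr
print(gr.__version__)  # should print 3.48.0 for the pinned version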
Step 16: Check Model and Commands
The repository provides example commands to run the web demo with the different model variants. Note that you should set the CUDA_VISIBLE_DEVICES environment variable to the GPU you wish to use (GPU 2 in these examples) and specify the appropriate model name, port, and, if needed, the --chunk_size parameter.
1. For the VL2-Tiny Model
- Model Details:
- Total parameters: 3.37B MoE
- Activated parameters: 1B
- Suitable for a single GPU with less than 40GB memory
CUDA_VISIBLE_DEVICES=2 python web_demo.py \
--model_name "deepseek-ai/deepseek-vl2-tiny" \
--port 37914
2. For the VL2-Small Model
- Model Details:
- Total parameters: 16.1B MoE
- Activated parameters: 2.8B
- Memory Note:
- When running on an A100 40GB GPU, you should set --chunk_size 512 to save memory via incremental prefilling (at the expense of speed).
- On GPUs with more than 40GB of memory, you can omit --chunk_size 512 for a faster response.
- Command (for a 40GB GPU):
CUDA_VISIBLE_DEVICES=2 python web_demo.py \
--model_name "deepseek-ai/deepseek-vl2-small" \
--port 37914 \
--chunk_size 512
3. For the VL2 (Full) Model
- Model Details:
- Total parameters: 27.5B MoE
- Activated parameters: 4.5B
- Command:
CUDA_VISIBLE_DEVICES=2 python web_demo.py \
--model_name "deepseek-ai/deepseek-vl2" \
--port 37914
How to Use These Commands
- Set the GPU: The CUDA_VISIBLE_DEVICES=2 part tells the system to use GPU number 2. Adjust this value according to your system’s GPU configuration.
- Run the Demo Script: The python web_demo.py command launches the Gradio-based web demo.
- Specify the Model Variant: Use the --model_name parameter to choose between the different model variants:
  - "deepseek-ai/deepseek-vl2-tiny"
  - "deepseek-ai/deepseek-vl2-small"
  - "deepseek-ai/deepseek-vl2"
- Set the Port: The --port 37914 argument sets the port on which the web server will run. Open your browser and navigate to http://<your_server_ip>:37914 to access the demo.
- Optional Memory Tuning: For the small model on a GPU with 40GB of memory, the additional --chunk_size 512 argument is recommended for memory-saving incremental prefilling.
Step 17: Verify Your GPU Availability
Run the following command in your terminal to see if your GPU is recognized by the system:
nvidia-smi
Step 18: Run Deepseek-vl2-tiny Model
Execute the following command to run the deepseek-vl2-tiny model:
python3 web_demo.py --model_name "deepseek-ai/deepseek-vl2-tiny" --port 37914
Step 19: Access the Application
The application is now accessible at:
Running on local URL: http://0.0.0.0:37914
Running on public URL: https://8df6de5304350b2ecc.gradio.live
Step 20: Play with Deepseek-vl2-tiny Model
Step 21: Run Deepseek-vl2-small Model
Execute the following command to run the deepseek-vl2-small model:
CUDA_VISIBLE_DEVICES=0 python3 web_demo.py --model_name "deepseek-ai/deepseek-vl2-small" --port 37914 --chunk_size 512
Step 22: Access the Application
The application is now accessible at:
Running on local URL: http://0.0.0.0:37914
Running on public URL: https://8df6de5304350b2ecc.gradio.live
Step 23: Play with Deepseek-vl2-small Model
Step 24: Run Deepseek-vl2 Model
Execute the following command to run the deepseek-vl2 model:
CUDA_VISIBLE_DEVICES=0 python web_demo.py --model_name "deepseek-ai/deepseek-vl2" --port 37914
Step 25: Access the Application
The application is now accessible at:
Running on local URL: http://0.0.0.0:37914
Running on public URL: https://8df6de5304350b2ecc.gradio.live
Step 26: Play with Deepseek-vl2 Model
- For Inference Only: DeepSeek-VL2-Tiny can run on a 16GB GPU with quantization, but the full model requires 80GB VRAM.
- For Gradio Deployment: At least 48GB VRAM is required for multi-image handling, and 80GB VRAM is ideal for full-scale applications.
- Optimization Strategies:
- Chunked Inference (for 40GB GPUs).
- Flash Attention (for efficient multi-image processing).
- Quantization (for limited-VRAM GPUs; a hedged loading sketch follows below).
Deploy DeepSeek-VL2 on the right hardware for best performance! 🚀
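For the quantization route on a limited-VRAM GPU, the sketch below shows one common way to load a checkpoint in 8-bit via bitsandbytes and the Hugging Face API. Treat it as an assumption rather than an official DeepSeek-VL2 recipe: it presumes the weights load through AutoModelForCausalLM with trust_remote_code=True and that bitsandbytes is installed (pip install bitsandbytes).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Hedged 8-bit loading sketch for the Tiny variant; verify against the repository before relying on it.
quant_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-vl2-tiny",
    quantization_config=quant_config,
    trust_remote_code=True,
    device_map="auto",  # let accelerate place layers across the available GPU/CPU memory
)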
Note: This is a step-by-step guide for interacting with your models. It covers the first method: installing the Tiny, Small, and full VL2 models and using them through the Gradio interface. If you want to run these models through a standalone inference script instead, follow the steps below:
Step 1: Running a Simple Inference Example
Use the provided sample code to test the model. Create a Python script (for example, inference_example.py) with the following content:
import torch
from transformers import AutoModelForCausalLM
from deepseek_vl.models import DeepseekVLV2Processor, DeepseekVLV2ForCausalLM
from deepseek_vl.utils.io import load_pil_images
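# Note: depending on the repository version, the Python package may be named deepseek_vl2
# rather than deepseek_vl; match these imports to the package directory in your cloned repo.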
# Specify the model path – you can choose among the variants (tiny, small, or full)
model_path = "deepseek-ai/deepseek-vl2-small"
# Load the processor (includes the tokenizer)
vl_chat_processor = DeepseekVLV2Processor.from_pretrained(model_path)
tokenizer = vl_chat_processor.tokenizer
# Load the model; adjust torch precision and device as needed
vl_gpt = DeepseekVLV2ForCausalLM.from_pretrained(model_path, trust_remote_code=True)
vl_gpt = vl_gpt.to(torch.bfloat16).cuda().eval()
## Single image conversation example:
conversation = [
{
"role": "<|User|>",
"content": "<image>\n<|ref|>The giraffe at the back.<|/ref|>.",
"images": ["./images/visual_grounding.jpeg"],
},
{"role": "<|Assistant|>", "content": ""}
]
# Load images and prepare inputs
pil_images = load_pil_images(conversation)
prepare_inputs = vl_chat_processor(
conversations=conversation,
images=pil_images,
force_batchify=True,
system_prompt=""
).to(vl_gpt.device)
# Generate image embeddings using the model’s image encoder
inputs_embeds = vl_gpt.prepare_inputs_embeds(**prepare_inputs)
# Generate the response from the language model
outputs = vl_gpt.language_model.generate(
inputs_embeds=inputs_embeds,
attention_mask=prepare_inputs.attention_mask,
pad_token_id=tokenizer.eos_token_id,
bos_token_id=tokenizer.bos_token_id,
eos_token_id=tokenizer.eos_token_id,
max_new_tokens=512,
do_sample=False,
use_cache=True
)
# Decode and print the answer
answer = tokenizer.decode(outputs[0].cpu().tolist(), skip_special_tokens=True)
print(f"Assistant: {answer}")
Step 2: Run the Inference Example
In your terminal (with the virtual environment activated), run:
python inference_example.py
If everything is set up correctly, the model will process the sample conversation and output the generated answer.
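The requirements above also mention multi-image workloads. The snippet below is a hedged sketch of how such a conversation could look, not the repository's official multi-image example: it assumes the processor pairs each <image> placeholder with the corresponding entry in the images list, and the file names and question are placeholders.
# Hypothetical multi-image conversation; image paths and the question are placeholders.
conversation = [
    {
        "role": "<|User|>",
        "content": "<image>\n<image>\nDescribe the difference between these two images.",
        "images": ["./images/example_a.jpeg", "./images/example_b.jpeg"],
    },
    {"role": "<|Assistant|>", "content": ""},
]
# The rest of the pipeline (load_pil_images, vl_chat_processor, prepare_inputs_embeds, generate) is unchanged.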
Conclusion
In this guide, we explored DeepSeek-VL2, a powerful vision-language model designed for advanced multimodal understanding. We provided a detailed step-by-step tutorial on setting up DeepSeek-VL2 on a GPU-powered virtual machine using NodeShift, covering hardware requirements, installation steps, and optimization strategies. Additionally, we demonstrated how to deploy and interact with the model using the Gradio UI and simple inference scripts. By following this guide, you’ve learned how to install dependencies, configure your environment, and run DeepSeek-VL2 efficiently. Whether for document analysis, visual question answering, or multi-image tasks, DeepSeek-VL2 offers a robust solution for complex vision-language applications.