Dolphin is a powerful tool that reads and understands document images, whether that's a scanned PDF page, a handwritten formula, or a complex layout with tables and figures. It works in two steps: first it analyzes the full structure of the page (following the natural reading order, top to bottom, left to right), then it parses each element (a paragraph, a table, an equation) in parallel. What makes Dolphin stand out is how lightweight and fast it is while still handling the messy, real-world formats we throw at it, which makes it a great fit for researchers, developers, and document-heavy workflows.
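To make that two-stage flow concrete, here is a minimal Python sketch of the analyze-then-parse idea. The helpers analyze_layout and parse_element are hypothetical stand-ins for Dolphin's actual layout and element prompts, not its real API:

from concurrent.futures import ThreadPoolExecutor

# Stage 1 (hypothetical helper): ask the model for the page layout,
# i.e. an ordered list of elements in natural reading order.
def analyze_layout(page_image):
    return [("paragraph", (0, 0, 100, 40)), ("formula", (0, 50, 100, 80))]

# Stage 2 (hypothetical helper): decode one element crop into text or LaTeX.
def parse_element(element):
    kind, bbox = element
    return f"<parsed {kind} at {bbox}>"

def parse_page(page_image):
    elements = analyze_layout(page_image)   # one sequential layout pass
    with ThreadPoolExecutor() as pool:      # then element decoding in parallel
        return list(pool.map(parse_element, elements))

print(parse_page("page_1.jpeg"))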
Resources
- Hugging Face: https://huggingface.co/ByteDance/Dolphin
- GitHub: https://github.com/bytedance/Dolphin
GPU Configuration Table for ByteDance Dolphin
| GPU Model | vCPUs | RAM (GB) | VRAM (GB) | Precision | Recommended Use Case |
| --- | --- | --- | --- | --- | --- |
| RTX A6000 | 48 | 45 | 48 | FP16/BF16 | Full-speed parsing of multi-page PDFs and high-res document images |
| A100 40GB | 96 | 90 | 40 | FP16/BF16 | Batch inference, element-level parsing at scale |
| T4 | 16 | 16 | 16 | INT8/FP16 | Light document parsing, individual element decoding |
| L4 | 24 | 24 | 24 | FP16 | Ideal for demo runs and Hugging Face model inference |
| V100 32GB | 32 | 60 | 32 | FP16 | Balanced performance for PDF + image parsing |
Step-by-Step Process to Install ByteDance Dolphin Locally
For the purpose of this tutorial, we will use a GPU-powered Virtual Machine offered by NodeShift; however, you can replicate the same steps with any other cloud provider of your choice. NodeShift provides the most affordable Virtual Machines at a scale that meets GDPR, SOC2, and ISO27001 requirements.
Step 1: Sign Up and Set Up a NodeShift Cloud Account
Visit the NodeShift Platform and create an account. Once you’ve signed up, log into your account.
Follow the account setup process and provide the necessary details and information.
Step 2: Create a GPU Node (Virtual Machine)
GPU Nodes are NodeShift’s GPU Virtual Machines, on-demand resources equipped with diverse GPUs ranging from H100s to A100s. These GPU-powered VMs provide enhanced environmental control, allowing configuration adjustments for GPUs, CPUs, RAM, and Storage based on specific requirements.
Navigate to the menu on the left side, select the GPU Nodes option in the Dashboard, click the Create GPU Node button, and deploy your first Virtual Machine.
Step 3: Select a Model, Region, and Storage
In the “GPU Nodes” tab, select a GPU Model and Storage according to your needs and the geographical region where you want to launch your model.
We will use 1 x RTXA6000 GPU for this tutorial to achieve the fastest performance. However, you can choose a more affordable GPU with less VRAM if that better suits your requirements.
Step 4: Select Authentication Method
There are two authentication methods available: Password and SSH Key. SSH keys are a more secure option. To create them, please refer to our official documentation.
Step 5: Choose an Image
Next, you will need to choose an image for your Virtual Machine. We will deploy ByteDance Dolphin on an NVIDIA CUDA Virtual Machine. This image ships with NVIDIA's CUDA parallel computing platform and drivers, which will allow you to install and run ByteDance Dolphin on your GPU Node.
After choosing the image, click the ‘Create’ button, and your Virtual Machine will be deployed.
Step 6: Virtual Machine Successfully Deployed
You will get visual confirmation that your node is up and running.
Step 7: Connect to GPUs using SSH
NodeShift GPUs can be connected to and controlled through a terminal using the SSH key provided during GPU creation.
Once your GPU Node deployment is successfully created and has reached the ‘RUNNING’ status, you can navigate to the page of your GPU Deployment Instance. Then, click the ‘Connect’ button in the top right corner.
Now open your terminal and paste the proxy SSH IP or direct SSH IP.
Next, if you want to check the GPU details, run the command below:
nvidia-smi
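For a more compact summary of your GPU, nvidia-smi also supports a query mode:
nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv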
Step 8: Install Miniconda & Packages
After completing the steps above, install Miniconda.
Miniconda is a free minimal installer for conda. It allows the management and installation of Python packages.
Anaconda has over 1,500 pre-installed packages, making it a comprehensive solution for data science projects. On the other hand, Miniconda allows you to install only the packages you need, reducing unnecessary clutter in your environment.
We highly recommend installing Python using Miniconda. Miniconda comes with Python and a small number of essential packages. Additional packages can be installed using the package management systems Mamba or Conda.
For Linux/macOS:
Download the Miniconda installer script:
sudo apt update && sudo apt install wget -y
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
For Windows:
- Download the Windows Miniconda installer from the official website.
- Run the installer and follow the installation prompts
Run the installer script:
bash Miniconda3-latest-Linux-x86_64.sh
After installing Miniconda, you will see a closing message similar to:
Thank you for installing Miniconda3! This means Miniconda is installed in your home directory (by default, ~/miniconda3).
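If you prefer a fully non-interactive install, the installer also supports batch mode (-b) with a custom install prefix (-p):
bash Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda3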
Step 9: Activate Conda and Create an Environment
After the installation process, activate Conda using the following command:
export PATH="/root/miniconda3/bin:$PATH"
conda init
exec "$SHELL"
Create a Conda Environment using the following command:
conda create -n bytedance python=3.11 -y
conda activate bytedance
- conda create: the command to create a new environment.
- -n bytedance: the -n flag specifies the name of the environment you want to create; here it is bytedance, but you can name it anything you like.
- python=3.11: the version of Python to install in the new environment, in this case Python 3.11.
- -y: automatically answers "yes" to all prompts, so the environment is created without asking for further confirmation.
Step 10: Clone the ByteDance Dolphin Repository
Run the following commands to clone the Dolphin repository:
git clone https://github.com/bytedance/Dolphin.git
cd Dolphin
Step 11: Install the Required Python Dependencies
After cloning the repository, install all dependencies listed in the requirements.txt file using:
pip install -r requirements.txt
This command will install essential packages such as:
- numpy
- opencv-python
- opencv-python-headless
- Pillow
- timm
- torch
- torchvision
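To verify that PyTorch installed correctly and can see your GPU, run a quick check:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
This should print the torch version followed by True on a working CUDA setup.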
Step 12: Install HuggingFace Hub
Run the following command to install huggingface_hub:
pip install huggingface_hub
Step 13: Login to Hugging Face
Run the following command to use the CLI to authenticate:
huggingface-cli login
This will ask for your Hugging Face token.
You can generate your token here:
https://huggingface.co/settings/tokens
Use a read access token, copy it, and paste it in the terminal prompt.
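Alternatively, if you want a non-interactive login (for example in scripts), recent versions of the CLI accept the token directly as a flag; check your huggingface_hub version before relying on this:
huggingface-cli login --token $HF_TOKEN
Here HF_TOKEN is assumed to be an environment variable holding your read token.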
Step 14: Download the Pretrained ByteDance Dolphin Model from Hugging Face
Run the following command to download the pretrained ByteDance Dolphin Model from Hugging Face:
huggingface-cli download ByteDance/Dolphin --local-dir ./hf_model
This will save all model files into the ./hf_model directory, which will later be used for inference.
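Equivalently, the same download can be done from Python via huggingface_hub's snapshot_download:
python -c "from huggingface_hub import snapshot_download; snapshot_download(repo_id='ByteDance/Dolphin', local_dir='./hf_model')"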
Step 15: Install OpenCV Dependency (libGL)
Some Dolphin components (like image rendering with OpenCV) require libGL.so.1, which is missing by default in many Ubuntu environments. You can fix this by installing libgl1.
Run the following command:
sudo apt update && sudo apt install -y libgl1
This will install libGL along with its required dependencies, such as libdrm, libx11, libxcb, mesa, and the vulkan drivers.
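To confirm the fix, check that OpenCV now imports cleanly:
python -c "import cv2; print(cv2.__version__)"
If this prints a version number instead of a libGL error, you are ready for inference.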
Step 16: Run Inference with the Dolphin Model
Now that everything is set up, you can run inference on a sample document image using the Dolphin model.
Run the following command:
python demo_page_hf.py --model_path ./hf_model --input_path ./demo/page_imgs/page_1.jpeg --save_dir ./results
What this does:
- Loads the model from ./hf_model
- Takes the input image page_1.jpeg from ./demo/page_imgs/
- Saves the output predictions to the ./results folder
On success, the script prints: Processing completed. Results saved to ./results
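To process several pages in one run, you can loop the same script over a folder of images (a simple shell loop; adjust the glob to match your file extensions):
for img in ./demo/page_imgs/*.jpeg; do
  python demo_page_hf.py --model_path ./hf_model --input_path "$img" --save_dir ./results
done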
Step 17: View Output Results in Markdown Format
Once the Dolphin model has processed the input image, the results are saved as a .md (Markdown) file inside the results/ directory.
In this case, the file page_1.md contains the structured text output extracted from the image.
You can open and preview it with:
cat results/page_1.md
The output includes:
- Markdown headers (#, ##) for sections like abstracts and introductions
- Formatted paragraphs
- Detected titles, authors, and document structure from the input image
This allows you to easily review or post-process the extracted content for documentation, web publishing, or downstream NLP tasks.
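As a small example of downstream post-processing, here is a minimal Python sketch that reads the generated Markdown and lists its section headers (plain Python, no extra dependencies; the path assumes the run above):

with open("results/page_1.md", encoding="utf-8") as f:
    for line in f:
        if line.startswith("#"):  # Markdown headers start with '#'
            level = len(line) - len(line.lstrip("#"))
            print(f"h{level}: {line.strip('# \n')}")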
Step 18 (Example 2): Run Dolphin OCR on Another Input Image (ayush.png)
Now that your Dolphin model is set up, let's run inference on a different image (ayush.png):
Run the Inference Command:
python demo_page_hf.py --model_path ./hf_model --input_path ./demo/page_imgs/ayush.png --save_dir ./results
What Happens:
- The model processes ayush.png from the page_imgs/ directory.
- Output is saved as a structured Markdown file: results/ayush.md
Output Example (as shown in VS Code):
- The file ayush.md includes structured text, mathematical notation (e.g., \frac{}{}), and layout-preserved data.
- Useful for scientific papers, academic documents, and mathematical content.
You can preview or edit the file with:
cat results/ayush.md
With this, you’ve successfully OCR-processed and structured a second document!
Conclusion
ByteDance’s Dolphin makes document understanding feel effortless — whether it’s a clean PDF, a dense research paper, or a messy scanned page full of equations and tables. With its smart two-step approach and support for Hugging Face integration, it’s built for developers and researchers who want powerful results without heavyweight overhead. And when paired with a GPU VM from NodeShift, the whole setup becomes fast, scalable, and production-ready.
Whether you’re building an academic pipeline, archiving historical records, or automating document workflows — Dolphin gives you the precision and performance to get it done right.
Now go ahead, throw your toughest documents at it. Dolphin’s ready.