Dolphin is a powerful tool that reads and understands document images, whether that's a scanned PDF page, a handwritten formula, or a complex layout with tables and figures. It works in two steps: first it analyzes the full structure of the page (following the natural reading order, top to bottom, left to right), then it parses each element (a paragraph, a table, an equation) in parallel. What makes Dolphin stand out is how lightweight and fast it is while still handling the messy, real-world formats we throw at it, which makes it a great fit for researchers, developers, and document-heavy workflows.
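To make that two-stage flow concrete, here is a minimal Python sketch of the analyze-then-parse idea. The helpers analyze_layout and parse_element are hypothetical stand-ins for Dolphin's actual layout and element prompts, not its real API:

from concurrent.futures import ThreadPoolExecutor

# Stage 1 (hypothetical helper): ask the model for the page layout,
# i.e. an ordered list of elements in natural reading order.
def analyze_layout(page_image):
    return [("paragraph", (0, 0, 100, 40)), ("formula", (0, 50, 100, 80))]

# Stage 2 (hypothetical helper): decode one element crop into text or LaTeX.
def parse_element(element):
    kind, bbox = element
    return f"<parsed {kind} at {bbox}>"

def parse_page(page_image):
    elements = analyze_layout(page_image)   # one sequential layout pass
    with ThreadPoolExecutor() as pool:      # then element decoding in parallel
        return list(pool.map(parse_element, elements))

print(parse_page("page_1.jpeg"))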
Resources
- Hugging Face: https://huggingface.co/ByteDance/Dolphin
- GitHub: https://github.com/bytedance/Dolphin
GPU Configuration Table for ByteDance Dolphin
| GPU Model | vCPUs | RAM (GB) | VRAM (GB) | Precision | Recommended Use Case |
| --- | --- | --- | --- | --- | --- |
| RTX A6000 | 48 | 45 | 48 | FP16/BF16 | Full-speed parsing of multi-page PDFs and high-res document images |
| A100 40GB | 96 | 90 | 40 | FP16/BF16 | Batch inference, element-level parsing at scale |
| T4 | 16 | 16 | 16 | INT8/FP16 | Light document parsing, individual element decoding |
| L4 | 24 | 24 | 24 | FP16 | Ideal for demo runs and Hugging Face model inference |
| V100 32GB | 32 | 60 | 32 | FP16 | Balanced performance for PDF + image parsing |
Step-by-Step Process to Install ByteDance Dolphin Locally
For the purpose of this tutorial, we will use a GPU-powered Virtual Machine offered by NodeShift; however, you can replicate the same steps with any other cloud provider of your choice. NodeShift provides the most affordable Virtual Machines at a scale that meets GDPR, SOC2, and ISO27001 requirements.
Step 1: Sign Up and Set Up a NodeShift Cloud Account
Visit the NodeShift Platform and create an account. Once you’ve signed up, log into your account.
Follow the account setup process and provide the necessary details and information.
Step 2: Create a GPU Node (Virtual Machine)
GPU Nodes are NodeShift’s GPU Virtual Machines, on-demand resources equipped with diverse GPUs ranging from H100s to A100s. These GPU-powered VMs provide enhanced environmental control, allowing configuration adjustments for GPUs, CPUs, RAM, and Storage based on specific requirements.
Navigate to the menu on the left side, select the GPU Nodes option in the Dashboard, click the Create GPU Node button, and deploy your first Virtual Machine.
Step 3: Select a Model, Region, and Storage
In the “GPU Nodes” tab, select a GPU Model and Storage according to your needs and the geographical region where you want to launch your model.
We will use 1 x RTXA6000 GPU for this tutorial to achieve the fastest performance. However, you can choose a more affordable GPU with less VRAM if that better suits your requirements.
Step 4: Select Authentication Method
There are two authentication methods available: Password and SSH Key. SSH keys are a more secure option. To create them, please refer to our official documentation.
Step 5: Choose an Image
Next, you will need to choose an image for your Virtual Machine. We will deploy ByteDance Dolphin on an NVIDIA CUDA Virtual Machine. This image ships with NVIDIA's CUDA parallel computing platform and drivers, which will allow you to install and run ByteDance Dolphin on your GPU Node.
After choosing the image, click the ‘Create’ button, and your Virtual Machine will be deployed.
Step 6: Virtual Machine Successfully Deployed
You will get visual confirmation that your node is up and running.
Step 7: Connect to GPUs using SSH
NodeShift GPUs can be connected to and controlled through a terminal using the SSH key provided during GPU creation.
Once your GPU Node deployment is successfully created and has reached the ‘RUNNING’ status, you can navigate to the page of your GPU Deployment Instance. Then, click the ‘Connect’ button in the top right corner.
Now open your terminal and paste the proxy SSH IP or direct SSH IP.
Next, if you want to check the GPU details, run the command below:
nvidia-smi
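For a more compact summary of your GPU, nvidia-smi also supports a query mode:
nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv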
Step 8: Install Miniconda & Packages
After completing the steps above, install Miniconda.
Miniconda is a free minimal installer for conda. It allows the management and installation of Python packages.
Anaconda has over 1,500 pre-installed packages, making it a comprehensive solution for data science projects. On the other hand, Miniconda allows you to install only the packages you need, reducing unnecessary clutter in your environment.
We highly recommend installing Python using Miniconda. Miniconda comes with Python and a small number of essential packages. Additional packages can be installed using the package management systems Mamba or Conda.
For Linux/macOS:
Download the Miniconda installer script:
sudo apt update && sudo apt install wget -y
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
For Windows:
- Download the Windows Miniconda installer from the official website.
- Run the installer and follow the installation prompts
Run the installer script:
bash Miniconda3-latest-Linux-x86_64.sh
After installing Miniconda, you will see a closing message similar to:
Thank you for installing Miniconda3! This means Miniconda is installed in your home directory (by default, ~/miniconda3).
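If you prefer a fully non-interactive install, the installer also supports batch mode (-b) with a custom install prefix (-p):
bash Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda3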
Step 9: Activate Conda and Create an Environment
After the installation process, activate Conda using the following command:
export PATH="/root/miniconda3/bin:$PATH"
conda init
exec "$SHELL"
Create a Conda Environment using the following command:
conda create -n bytedance python=3.11 -y
conda activate bytedance
- conda create: the command to create a new environment.
- -n bytedance: the -n flag specifies the name of the environment you want to create; here it is bytedance, but you can name it anything you like.
- python=3.11: the version of Python to install in the new environment, in this case Python 3.11.
- -y: automatically answers "yes" to all prompts, so the environment is created without asking for further confirmation.
Step 10: Clone the ByteDance Dolphin Repository
Run the following commands to clone the Dolphin repository:
git clone https://github.com/bytedance/Dolphin.git
cd Dolphin
Step 11: Install the Required Python Dependencies
After cloning the repository, install all dependencies listed in the requirements.txt file using:
pip install -r requirements.txt
This command will install essential packages such as:
- numpy
- opencv-python
- opencv-python-headless
- Pillow
- timm
- torch
- torchvision
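To verify that PyTorch installed correctly and can see your GPU, run a quick check:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
This should print the torch version followed by True on a working CUDA setup.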
Step 12: Install HuggingFace Hub
Run the following command to install huggingface_hub:
pip install huggingface_hub
Step 13: Login to Hugging Face
Run the following command to use the CLI to authenticate:
huggingface-cli login
This will ask for your Hugging Face token.
You can generate your token here:
https://huggingface.co/settings/tokens
Use a read access token, copy it, and paste it in the terminal prompt.
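Alternatively, if you want a non-interactive login (for example in scripts), recent versions of the CLI accept the token directly as a flag; check your huggingface_hub version before relying on this:
huggingface-cli login --token $HF_TOKEN
Here HF_TOKEN is assumed to be an environment variable holding your read token.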
Step 14: Download the Pretrained ByteDance Dolphin Model from Hugging Face
Run the following command to download the pretrained ByteDance Dolphin Model from Hugging Face:
huggingface-cli download ByteDance/Dolphin --local-dir ./hf_model
This will save all model files into the ./hf_model directory, which will later be used for inference.
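Equivalently, the same download can be done from Python via huggingface_hub's snapshot_download:
python -c "from huggingface_hub import snapshot_download; snapshot_download(repo_id='ByteDance/Dolphin', local_dir='./hf_model')"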
Step 15: Install OpenCV Dependency (libGL)
Some Dolphin components (like image rendering with OpenCV) require libGL.so.1, which is missing by default in many Ubuntu environments. You can fix this by installing libgl1.
Run the following command:
sudo apt update && sudo apt install -y libgl1
This will install libGL along with its required dependencies, such as libdrm, libx11, libxcb, mesa, and the vulkan drivers.
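To confirm the fix, check that OpenCV now imports cleanly:
python -c "import cv2; print(cv2.__version__)"
If this prints a version number instead of a libGL error, you are ready for inference.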
Step 16: Run Inference with the Dolphin Model
Now that everything is set up, you can run inference on a sample document image using the Dolphin model.
Run the following command:
python demo_page_hf.py --model_path ./hf_model --input_path ./demo/page_imgs/page_1.jpeg --save_dir ./results
What this does:
- Loads the model from ./hf_model
- Takes the input image page_1.jpeg from ./demo/page_imgs/
- Saves the output predictions to the ./results folder
On success, the script prints: Processing completed. Results saved to ./results
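To process several pages in one run, you can loop the same script over a folder of images (a simple shell loop; adjust the glob to match your file extensions):
for img in ./demo/page_imgs/*.jpeg; do
  python demo_page_hf.py --model_path ./hf_model --input_path "$img" --save_dir ./results
done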
Step 17: View Output Results in Markdown Format
Once the Dolphin model has processed the input image, the results are saved as a .md (Markdown) file inside the results/ directory.
In this case, the file page_1.md contains the structured text output extracted from the image.
You can open and preview it with:
cat results/page_1.md
The output includes:
- Markdown headers (#, ##) for sections like abstracts and introductions
- Formatted paragraphs
- Detected titles, authors, and document structure from the input image
This allows you to easily review or post-process the extracted content for documentation, web publishing, or downstream NLP tasks.
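As a small example of downstream post-processing, here is a minimal Python sketch that reads the generated Markdown and lists its section headers (plain Python, no extra dependencies; the path assumes the run above):

with open("results/page_1.md", encoding="utf-8") as f:
    for line in f:
        if line.startswith("#"):  # Markdown headers start with '#'
            level = len(line) - len(line.lstrip("#"))
            print(f"h{level}: {line.strip('# \n')}")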
Step 18 (Example 2): Run Dolphin OCR on Another Input Image (ayush.png)
Now that your Dolphin model is set up, let's run inference on a different image (ayush.png):
Run the Inference Command:
python demo_page_hf.py --model_path ./hf_model --input_path ./demo/page_imgs/ayush.png --save_dir ./results
What Happens:
- The model processes ayush.png from the page_imgs/ directory.
- Output is saved as a structured Markdown file: results/ayush.md
Output Example (as shown in VS Code):
- The file ayush.md includes structured text, mathematical notation (e.g., \frac{}{}), and layout-preserved data.
- Useful for scientific papers, academic documents, and mathematical content.
You can preview or edit the file with:
cat results/ayush.md
With this, you’ve successfully OCR-processed and structured a second document!
Conclusion
ByteDance’s Dolphin makes document understanding feel effortless — whether it’s a clean PDF, a dense research paper, or a messy scanned page full of equations and tables. With its smart two-step approach and support for Hugging Face integration, it’s built for developers and researchers who want powerful results without heavyweight overhead. And when paired with a GPU VM from NodeShift, the whole setup becomes fast, scalable, and production-ready.
Whether you’re building an academic pipeline, archiving historical records, or automating document workflows — Dolphin gives you the precision and performance to get it done right.
Now go ahead, throw your toughest documents at it. Dolphin’s ready.