In the world of search engines and information retrieval, precision matters. That's where zerank-1-small comes in: a compact yet powerful reranker model developed by ZeroEntropy. Designed to boost the accuracy of search results, this 1.7B-parameter model is a lighter sibling of the flagship zerank-1, delivering impressive performance at less than half the size.
What sets zerank-1-small apart is its ability to consistently outperform many well-known rerankers and deliver significant accuracy improvements over traditional vector search methods. Whether applied to fields like finance, legal, STEM, code, or medical queries, the model enhances the ranking of retrieved documents to ensure users get the most relevant answers.
Released under the open-source Apache 2.0 license, zerank-1-small is part of ZeroEntropy’s commitment to advancing open-source tools and empowering developers, researchers, and organizations to build better retrieval systems without proprietary restrictions.
Evaluations
The table below compares NDCG@10 scores for zerank-1-small and competing closed-source proprietary rerankers. Since we are evaluating rerankers, OpenAI's text-embedding-3-small is used as the initial retriever to fetch the Top 100 candidate documents.
Task | Embedding | cohere-rerank-v3.5 | Salesforce/Llama-rank-v1 | zerank-1-small | zerank-1 |
---|---|---|---|---|---|
Code | 0.678 | 0.724 | 0.694 | 0.730 | 0.754 |
Conversational | 0.250 | 0.571 | 0.484 | 0.556 | 0.596 |
Finance | 0.839 | 0.824 | 0.828 | 0.861 | 0.894 |
Legal | 0.703 | 0.804 | 0.767 | 0.817 | 0.821 |
Medical | 0.619 | 0.750 | 0.719 | 0.773 | 0.796 |
STEM | 0.401 | 0.510 | 0.595 | 0.680 | 0.694 |
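For context, NDCG@10 rewards rankings that place the most relevant documents near the top of the list, discounting relevance by rank position. A minimal sketch of the metric (using the standard log2 discount; this is illustrative code, not the benchmark's evaluation script):

```python
import math

def dcg_at_k(relevances, k=10):
    # Discounted cumulative gain: each relevance grade is discounted
    # by log2 of its (1-based) rank position plus one.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    # Normalize by the DCG of the ideal (descending) ordering,
    # so a perfect ranking scores exactly 1.0.
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0

print(ndcg_at_k([3, 2, 1]))  # perfect ordering -> 1.0
print(round(ndcg_at_k([1, 2, 3]), 3))  # reversed ordering scores lower
```

A reranker's job is exactly to push the NDCG of the retrieved list closer to 1.0 by reordering the top candidates.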
Recommended GPU VM configuration
Component | Minimum | Recommended |
---|---|---|
GPU | 1x NVIDIA A10 / A100 / RTX A6000 (24–40GB VRAM) | 1x NVIDIA A100 (40–80GB VRAM) |
CPU | 4–8 vCPU | 8–16 vCPU |
RAM | 16–32 GB | 32–64 GB |
Disk | 50+ GB SSD | 100+ GB SSD |
OS | Ubuntu 20.04 / 22.04 | Ubuntu 22.04 |
CUDA | 11.8 or higher | 12.1 or higher |
Resources
Link: https://huggingface.co/zeroentropy/zerank-1-small
Step-by-Step Process to Install & Run ZeroEntropy Zerank 1 Small Locally
For the purpose of this tutorial, we will use a GPU-powered Virtual Machine offered by NodeShift; however, you can replicate the same steps with any other cloud provider of your choice. NodeShift provides the most affordable Virtual Machines at a scale that meets GDPR, SOC2, and ISO27001 requirements.
Step 1: Sign Up and Set Up a NodeShift Cloud Account
Visit the NodeShift Platform and create an account. Once you’ve signed up, log into your account.
Follow the account setup process and provide the necessary details and information.
Step 2: Create a GPU Node (Virtual Machine)
GPU Nodes are NodeShift’s GPU Virtual Machines, on-demand resources equipped with diverse GPUs ranging from H100s to A100s. These GPU-powered VMs provide enhanced environmental control, allowing configuration adjustments for GPUs, CPUs, RAM, and Storage based on specific requirements.
Navigate to the menu on the left side, select the GPU Nodes option, click the Create GPU Node button, and deploy your first Virtual Machine.
Step 3: Select a Model, Region, and Storage
In the “GPU Nodes” tab, select a GPU Model and Storage according to your needs and the geographical region where you want to launch your model.
We will use 1 x RTXA6000 GPU for this tutorial to achieve the fastest performance. However, you can choose a more affordable GPU with less VRAM if that better suits your requirements.
Step 4: Select Authentication Method
There are two authentication methods available: Password and SSH Key. SSH keys are a more secure option. To create them, please refer to our official documentation.
Step 5: Choose an Image
In our previous blogs, we used pre-built images from the Templates tab when creating a Virtual Machine. However, for running Zerank 1 Small, we need a more customized environment with full CUDA development capabilities. That’s why, in this case, we switched to the Custom Image tab and selected a specific Docker image that meets all runtime and compatibility requirements.
We chose the following image:
nvidia/cuda:12.1.1-devel-ubuntu22.04
This image is essential because it includes:
- Full CUDA toolkit (including nvcc)
- Proper support for building and running GPU-based applications like Zerank 1 Small
- Compatibility with CUDA 12.1.1, required by certain model operations
Launch Mode
We selected:
Interactive shell server
This gives us SSH access and full control over terminal operations — perfect for installing dependencies, running benchmarks, and launching tools like Zerank 1 Small.
Docker Repository Authentication
We left all fields empty here.
Since the Docker image is publicly available on Docker Hub, no login credentials are required.
Identification
nvidia/cuda:12.1.1-devel-ubuntu22.04
CUDA and cuDNN images from gitlab.com/nvidia/cuda. Devel version contains full cuda toolkit with nvcc.
This setup ensures that Zerank 1 Small runs in a GPU-enabled environment with proper CUDA access and high compute performance.
After choosing the image, click the ‘Create’ button, and your Virtual Machine will be deployed.
Step 6: Virtual Machine Successfully Deployed
You will get visual confirmation that your node is up and running.
Step 7: Connect to GPUs using SSH
NodeShift GPUs can be connected to and controlled through a terminal using the SSH key provided during GPU creation.
Once your GPU Node deployment is successfully created and has reached the ‘RUNNING’ status, you can navigate to the page of your GPU Deployment Instance. Then, click the ‘Connect’ button in the top right corner.
Now open your terminal and paste the proxy SSH IP or direct SSH IP.
Next, if you want to check the GPU details, run the command below:
nvidia-smi
Step 8: Check the Available Python Version and Install a Newer Version
Run the following command to check the available Python version:
python3 --version
The system has Python 3.8.1 available by default. To install a higher version of Python, you'll need to use the deadsnakes PPA.
Run the following commands to add the deadsnakes PPA:
sudo apt update
sudo apt install -y software-properties-common
sudo add-apt-repository -y ppa:deadsnakes/ppa
sudo apt update
Step 9: Install Python 3.11
Now, run the following command to install Python 3.11 or another desired version:
sudo apt install -y python3.11 python3.11-venv python3.11-dev
Step 10: Update the Default Python3 Version
Now, run the following command to link the new Python version as the default python3:
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.8 1
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.11 2
sudo update-alternatives --config python3
Then, run the following command to verify that the new Python version is active:
python3 --version
Step 11: Install and Update Pip
Run the following command to install and update pip:
curl -O https://bootstrap.pypa.io/get-pip.py
python3.11 get-pip.py
Then, run the following command to check the version of pip:
pip --version
Step 12: Set up Python environment
Run the following command to set up the Python environment:
python3 -m venv zerank-env
source zerank-env/bin/activate
Step 13: Install PyTorch with GPU support
Run the following command to install torch:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
Step 14: Install sentence-transformers and accelerate
Run the following commands to install sentence-transformers and accelerate:
pip install sentence-transformers
pip install accelerate
Step 15: Connect to your GPU VM using Remote SSH
- Open VS Code on your Mac.
- Press Cmd + Shift + P, then choose Remote-SSH: Connect to Host.
- Select your configured host.
- Once connected, you'll see SSH: 209.137.198.14 (your VM IP) in the bottom-left status bar (as in the image).
Step 16: Run zerank-1-small model
Create a Python script (e.g., run_zerank.py) and add the following code:
from sentence_transformers import CrossEncoder
# Load the model
model = CrossEncoder("zeroentropy/zerank-1-small", trust_remote_code=True)
# Example query-doc pairs
query_documents = [
("What is 2+2?", "4"),
("What is 2+2?", "The answer is definitely 1 million"),
]
# Get scores
scores = model.predict(query_documents)
print("Rerank scores:", scores)
run_zerank.py
This script is a simple Python file that loads the zeroentropy/zerank-1-small reranker model using the sentence-transformers library and runs it on predefined query-document pairs. We used it to directly test model inference on the GPU VM, without any UI or API, to ensure the model downloads, initializes, and produces relevance scores between queries and documents. It outputs the scores to the terminal, letting us check that relevant pairs (like "What is 2+2?" and "4") get high scores, while irrelevant ones get low scores. It's the foundation script to make sure the model itself works.
Step 17: Run the script
python3 run_zerank.py
You will see:
Model downloads:
model.safetensors: 100%|█████████████████| 3.44G/3.44G [...]
tokenizer.json: 100%|████████████████████| 11.4M/11.4M [...]
...
And final rerank scores:
Rerank scores: [0.6470849025528733, 0.28521265886319286]
In the same file, paste the following code:
from sentence_transformers import CrossEncoder
model = CrossEncoder("zeroentropy/zerank-1-small", trust_remote_code=True)
# Replace here with the test prompts
query_documents = [
("Who wrote Hamlet?", "Shakespeare wrote Hamlet."),
("Who wrote Hamlet?", "Einstein was the author of Hamlet."),
]
scores = model.predict(query_documents)
print("Rerank scores:", scores)
Then, again run the script:
python3 run_zerank.py
You're testing two pairs:
- "Who wrote Hamlet?" → "Shakespeare wrote Hamlet." → high score (~0.85)
- "Who wrote Hamlet?" → "Einstein was the author of Hamlet." → lower score (~0.56)
The model is ranking their relevance.
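One practical way to use these scores is to drop candidates below a cutoff before returning results. A tiny sketch, where the 0.6 threshold is purely illustrative (not a value from ZeroEntropy's documentation; tune it on your own data):

```python
# Hypothetical cutoff for filtering weak matches; an assumption
# for illustration, not a recommended value.
THRESHOLD = 0.6

scored_docs = [
    ("Shakespeare wrote Hamlet.", 0.85),
    ("Einstein was the author of Hamlet.", 0.56),
]
kept = [doc for doc, score in scored_docs if score >= THRESHOLD]
print(kept)  # ['Shakespeare wrote Hamlet.']
```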
Step 18: CLI Interactive Script (type in terminal)
Create a Python script (e.g., cli_rerank.py) and add the following code:
from sentence_transformers import CrossEncoder
model = CrossEncoder("zeroentropy/zerank-1-small", trust_remote_code=True)
print("💬 ZeroEntropy Zerank-1-Small CLI Reranker")
print("Type 'exit' anytime to quit.\n")
while True:
    query = input("Enter query: ")
    if query.lower() == "exit":
        break
    doc = input("Enter document: ")
    if doc.lower() == "exit":
        break
    score = model.predict([(query, doc)])[0]
    print(f"🔹 Relevance Score: {score:.4f}\n")
cli_rerank.py
This is an interactive command-line interface (CLI) script where you can type in query and document pairs live in the terminal. We used it to manually explore different inputs and see their rerank scores on the fly, without modifying code or restarting scripts. This was useful for quick, hands-on testing and exploration, like an interactive playground in the terminal. It keeps running in a loop, letting you test as many pairs as you want, and exits gracefully when you type exit.
Step 19: Run the script
python3 cli_rerank.py
Use the CLI interactively. When prompted with Enter query:, type:
Who discovered gravity?
When prompted with Enter document:, type:
Isaac Newton discovered gravity after observing a falling apple.
You will see:
🔹 Relevance Score: 0.8991
Step 20: Install Pandas
Run the following command to install pandas:
pip install pandas
Step 21: Batch Reranking from CSV
Create a Python script (e.g., batch_rerank.py) and add the following code:
import pandas as pd
from sentence_transformers import CrossEncoder
model = CrossEncoder("zeroentropy/zerank-1-small", trust_remote_code=True)
# Example CSV: input.csv with columns: query, document
df = pd.read_csv("input.csv")
pairs = list(zip(df['query'], df['document']))
scores = model.predict(pairs)
df['score'] = scores
df.to_csv("output_with_scores.csv", index=False)
print("✅ Saved reranked results to 'output_with_scores.csv'")
Prepare your input CSV file
Create a file named:
input.csv
Paste this sample content:
query,document
Who wrote Hamlet?,Shakespeare wrote Hamlet.
Who discovered gravity?,Isaac Newton discovered gravity after observing a falling apple.
What is the capital of France?,Paris is the capital of France.
batch_rerank.py
This batch script reads multiple query-document pairs from an input CSV file (input.csv), runs them all through the reranker model, and saves the results with relevance scores into a new CSV file (output_with_scores.csv). We used this script to automate reranking over larger datasets or lists of pairs, making it ideal when you have dozens or hundreds of examples to process at once. It's practical for offline experiments, dataset evaluations, or preparing reranked outputs for further analysis or reporting.
Step 22: Run the batch script
python3 batch_rerank.py
You should see:
✅ Saved reranked results to 'output_with_scores.csv'
Check the output file
You will see something like:
query,document,score
Who wrote Hamlet?,Shakespeare wrote Hamlet.,0.8487
Who discovered gravity?,Isaac Newton discovered gravity after observing a falling apple.,0.9168
What is the capital of France?,Paris is the capital of France.,0.8921
Step 23: Install Gradio
Run the following command to install gradio:
pip install gradio
Step 24: Gradio UI (browser interface)
Create a Python script (e.g., gradio_rerank.py) and add the following code:
import gradio as gr
from sentence_transformers import CrossEncoder
model = CrossEncoder("zeroentropy/zerank-1-small", trust_remote_code=True)
def rerank(query, document):
    score = model.predict([(query, document)])[0]
    return f"Relevance Score: {score:.4f}"

iface = gr.Interface(
    fn=rerank,
    inputs=["text", "text"],
    outputs="text",
    title="ZeroEntropy Zerank-1-Small Reranker",
    description="Enter a query and a document to get their relevance score."
)

iface.launch()
gradio_rerank.py
This script runs a Gradio-based web interface on the VM, providing a browser-based GUI (Graphical User Interface) where you can enter queries and documents, click submit, and instantly see the relevance score. We used this to make the reranker more user-friendly and accessible, especially for non-technical users or demos, where people can interact via the browser without writing any code. With SSH port forwarding, we could even access it securely on a local machine from the VM.
Step 25: Run the script
Run your Gradio script:
python3 gradio_rerank.py
You will see:
* Running on local URL: http://127.0.0.1:7860
* To create a public link, set `share=True` in `launch()`.
Set up SSH port forwarding from your local machine
On your local machine (Mac/Windows/Linux), open a terminal and run:
ssh -L 7860:localhost:7860 -p 32153 root@209.137.198.14
This forwards your local localhost:7860 to 127.0.0.1:7860 on the remote VM.
Step 26: Open the Gradio Web Interface
After you’ve forwarded the port and launched the script, open your browser and go to:
http://localhost:7860
You should see the Gradio web UI titled:
ZeroEntropy Zerank-1-Small Reranker
This is your interactive playground to chat with the Zerank-1-Small model.
Step 27: Enter test data and check output
Example:
- Query: Who discovered gravity?
- Document: Isaac Newton discovered gravity after observing a falling apple.
Click Submit, and you should see:
Relevance Score: 0.9139
Step 28: Install FastAPI
Run the following command to install FastAPI:
pip install fastapi uvicorn
Step 29: FastAPI Service (REST API)
Create a Python script (e.g., fastapi_rerank.py) and add the following code:
from fastapi import FastAPI
from pydantic import BaseModel
from sentence_transformers import CrossEncoder
import uvicorn
app = FastAPI()
model = CrossEncoder("zeroentropy/zerank-1-small", trust_remote_code=True)
class RerankRequest(BaseModel):
    query: str
    document: str

@app.post("/rerank")
def rerank(request: RerankRequest):
    score = model.predict([(request.query, request.document)])[0]
    # Cast to a plain Python float so the score serializes cleanly to JSON
    return {"score": float(score)}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
fastapi_rerank.py
This script sets up a FastAPI-based REST API that exposes the reranker as an HTTP service with a /rerank endpoint. We used this script to turn the reranker into an API service that can be called programmatically by other applications, scripts, or tools. It's the ideal setup for integrating the reranker into pipelines, web services, or larger systems, and it comes with automatic Swagger documentation at /docs for easy testing and exploration.
Step 30: Run FastAPI server on the VM
On your VM, run the following command:
python3 fastapi_rerank.py
You should see:
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
This means your API server is live inside the VM on port 8000.
Set up SSH port forwarding from your local machine
On your local machine (Mac/Windows/Linux), open a terminal and run:
ssh -L 8000:localhost:8000 -p 32153 root@209.137.198.14
This command:
- Forwards your local port 8000 → VM port 8000
- Allows you to access the FastAPI server running in the VM from your local machine
Access the FastAPI API from your local machine
Open a browser or use curl:
http://localhost:8000/docs
This opens the FastAPI Swagger UI, where you can:
- Test the /rerank endpoint
- Send POST requests
- See live JSON responses
Test using Python or curl
Example curl:
curl -X POST "http://localhost:8000/rerank" \
-H "Content-Type: application/json" \
-d '{"query": "Who discovered gravity?", "document": "Isaac Newton discovered gravity after observing a falling apple."}'
The scripts progressed from basic testing to interactive use, batch processing, a GUI, and finally an API, giving you a complete, flexible stack to use the reranker however you need.
Conclusion
In this guide, we walked through the complete process of installing, configuring, and running the ZeroEntropy Zerank-1-Small reranker model on a GPU virtual machine, using a variety of interfaces — from simple Python scripts and command-line tools to browser-based Gradio apps and full-fledged FastAPI services. Each script served a clear purpose: whether it was for quick testing, hands-on exploration, batch reranking, or integrating the model into larger systems, we covered it all.
By the end, you don’t just have a model running — you have a practical, flexible reranking toolkit that can be adapted for developers, researchers, and even non-technical users. With open-source access and the freedom to scale across industries like finance, legal, STEM, and medical, Zerank-1-Small puts high-quality, transparent reranking power right at your fingertips — no black boxes, no vendor lock-in, just straightforward, efficient search improvement.