Kokoro-82M is a state-of-the-art text-to-speech (TTS) model that stands out for its lightweight yet powerful architecture. With 82 million parameters, it delivers audio quality comparable to much larger models while being faster and more cost-efficient to run. What makes Kokoro even more appealing is that its weights are openly available, empowering developers to deploy it in diverse environments, from production systems to experimental personal projects. This flexibility makes it a valuable tool for anyone looking to implement efficient and scalable TTS solutions.
In this guide, we’ll walk you through the step-by-step process of running the Kokoro-FastAPI package locally. This package is a Dockerized wrapper of Kokoro-82M, enabling you to leverage its capabilities for your own projects seamlessly through an intuitive interface.
Prerequisites
- A virtual machine (GPU or CPU, such as the ones provided by NodeShift) with at least:
- 2 vCPUs
- 8 GB RAM
- 100 GB SSD
- Ubuntu 22.04
- Docker installed
Note: These prerequisites vary widely across use cases; a higher-end configuration may be warranted for a large-scale deployment.
Step-by-step process to install and run Kokoro-82M Locally
For this tutorial, we’ll use a CPU-powered Virtual Machine by NodeShift, which provides high-compute Virtual Machines at a very affordable cost on a scale that meets GDPR, SOC2, and ISO27001 requirements. It also offers an intuitive and user-friendly interface, making it easier for beginners to get started with Cloud deployments. However, feel free to use any cloud provider you choose and follow the same steps for the rest of the tutorial.
Step 1: Setting up a NodeShift Account
Visit app.nodeshift.com and create an account by filling in basic details, or continue signing up with your Google/GitHub account.
If you already have an account, login straight to your dashboard.
Step 2: Create a Compute Node (CPU Virtual Machine)
After accessing your account, you should see a dashboard (see image). Now:
- Navigate to the menu on the left side.
- Click on the Compute Nodes option.
- Click on Start to begin creating your very first compute node.
These Compute nodes are CPU-powered virtual machines provided by NodeShift. They are highly customizable and let you control different configuration options, such as vCPUs, RAM, and storage, according to your needs.
Step 3: Select configuration for VM
- The first option you see is the Reliability dropdown. This option lets you choose the uptime guarantee level you seek for your VM (e.g., 99.9%).
- Next, select a geographical region from the Region dropdown where you want to launch your VM (e.g., United States).
- Most importantly, select the correct specifications for your VM according to your workload requirements by sliding the bars for each option.
Step 4: Choose VM Configuration and Image
- After selecting your required configuration options, you’ll see the available VMs in your region and as per (or very close to) your configuration. In our case, we’ll choose a ‘4vCPUs/8GB/160GB SSD’ Compute node.
- Next, you’ll need to choose an image for your Virtual Machine. For the scope of this tutorial, we’ll select Ubuntu.
Step 5: Choose the Billing cycle and Authentication Method
- Two billing cycle options are available: Hourly, ideal for short-term usage, offering pay-as-you-go flexibility, and Monthly for long-term projects with a consistent usage rate and potentially lower cost.
- Next, you’ll need to select an authentication method. Two methods are available: Password and SSH Key. We recommend using SSH keys, as they are a more secure option. To create one, head over to our official documentation.
Step 6: Finalize Details and Create Deployment
Finally, you can also add a VPC (Virtual Private Cloud), which provides an isolated section to launch your cloud resources (Virtual machine, storage, etc.) in a secure, private environment. We’re keeping this option as the default for now, but feel free to create a VPC according to your needs.
Also, you can deploy multiple nodes at once using the Quantity option.
That’s it! You are now ready to deploy the node. Finalize the configuration summary; if it looks good, go ahead and click Create to deploy the node.
Step 7: Connect to active Compute Node using SSH
As soon as you create the node, it will be deployed within a few seconds to a minute. Once deployed, you will see the status Running in green, meaning your Compute node is ready to use!
Once your node shows this status, follow the below steps to connect to the running VM via SSH:
1. Open your terminal and run the SSH command below (replace root with your username and ip with your VM's IP address, copied from the dashboard):
ssh root@ip
2. In some cases, your terminal may ask you to confirm the connection. Enter 'yes'.
3. A prompt will request a password. Type the SSH password, and you should be connected.
Output:
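If you chose SSH key authentication in Step 5, you can point ssh at your private key instead of typing a password. A minimal sketch, assuming the key lives at ~/.ssh/id_ed25519 and 203.0.113.10 is a placeholder for your VM's IP:
# Connect with a private key instead of a password (replace the key path, username, and IP with your own values)
ssh -i ~/.ssh/id_ed25519 root@203.0.113.10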
Step 8: Run Kokoro-82M FastAPI Package with Docker
Before running the model, ensure Docker is installed on the system; a quick check is shown below.
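If you're not sure whether Docker is present, you can verify it and, if needed, install it with Docker's official convenience script (one common approach on Ubuntu; adapt to your own setup):
# Check whether Docker is already installed
docker --version
# If the command is not found, install Docker using the official convenience script
curl -fsSL https://get.docker.com | sh
# Confirm the Docker daemon is running
sudo systemctl status docker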
- Start Kokoro’s FastAPI wrapper with docker run.
Run the following command if you’re using a GPU node:
docker run --gpus all -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-gpu:v0.2.0post4
Run the following command if you’re using a CPU node:
docker run -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-cpu:v0.2.0post4
Output:
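Before moving on, you can confirm from inside the VM that the server is up. Since the wrapper is a FastAPI application, it typically serves interactive API docs at /docs; the check below assumes the container is listening on port 8880, as mapped in the commands above:
# Confirm the container is running and the port mapping is in place
docker ps
# A 200 response here means the API server is reachable
curl -I http://localhost:8880/docs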
Step 9: Access the web interface
1. Once the container has started, you can access the Kokoro interface at the following URL (replace <YOUR_SERVER_IP> with your remote server's IP address, or use localhost if you are on your local machine):
http://<YOUR_SERVER_IP>:8880/web
This is what the interface looks like:
2. Write a piece of text in the input box to convert to audio.
3. After that, select a speaker voice, e.g. "bf_isabella", which stands for a British Female voice named Isabella.
4. Click on Generate Speech to get an audio output on the right side.
You can play and pause the audio, generate hours of speech free of cost, and download the generated clips if you access the application through a different interface, such as Gradio.
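If you prefer to script speech generation instead of using the web page, the package also exposes an OpenAI-style speech endpoint. The route and payload fields below (/v1/audio/speech with model, input, voice, and response_format) are assumptions based on that convention, so cross-check them against the container's /docs page:
# Generate speech from the command line and save it as an MP3 (replace <YOUR_SERVER_IP> as before)
curl -X POST http://<YOUR_SERVER_IP>:8880/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"model": "kokoro", "input": "Hello from Kokoro running in Docker!", "voice": "bf_isabella", "response_format": "mp3"}' \
  --output kokoro_sample.mp3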
Conclusion
By following this guide, you’ve learned how to run the Kokoro-FastAPI package locally. It provides a seamless interface to harness the power of the Kokoro-82M TTS model. Its lightweight yet robust architecture ensures high-quality audio outputs while maintaining efficiency. NodeShift’s cloud infrastructure further enhances this experience by offering a reliable and scalable environment, making it easier for developers to deploy and manage AI-driven applications effortlessly.