
How to Run Kokoro TTS on a Basic Computer – No GPU Needed!


Hey there! Have you ever dreamed of creating natural-sounding voices for your projects without relying on big-name APIs like Google or ElevenLabs? Say hello to Kokoro TTS, a lightweight yet powerful text-to-speech (TTS) solution that runs locally on your computer—even without a fancy GPU! Let’s dive in and see what makes Kokoro TTS a fantastic choice for developers and tech enthusiasts alike.

What is Kokoro TTS?

Kokoro TTS is a small but mighty TTS model that turns written text into realistic speech. It's available on platforms like Hugging Face and GitHub. Don't let its size fool you: with just 82 million parameters, it competes with, and in some comparisons beats, much larger models. Plus, you can run it offline, which means no subscription fees, no internet dependency, and full privacy for your projects.

About Kokoro TTS

  • Small but mighty: With just 82 million parameters, it holds its own against many larger models.
  • Open license: The model is Apache 2.0 licensed, meaning you can use it freely in most projects.
  • Built for everyone: You can run it on basic computers—no GPU needed!
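To see why an 82-million-parameter model fits comfortably on an ordinary machine, a back-of-the-envelope estimate helps: raw weight size is roughly parameters × bytes per parameter. The sketch below assumes 4 bytes per weight for float32 and 2 for float16. It ignores activations, the phonemizer, and framework overhead, so treat the result as a lower bound, not a measured footprint.

```python
def weight_footprint_mb(num_params: int, bytes_per_param: int) -> float:
    """Approximate size of the raw weights in megabytes (1 MB = 10**6 bytes).

    Counts weights only; activations, the phonemizer, and Python/PyTorch
    overhead are not included, so real memory use will be higher.
    """
    return num_params * bytes_per_param / 1e6


KOKORO_PARAMS = 82_000_000  # "82 million parameters", as stated in the article

for label, nbytes in [("float32", 4), ("float16", 2)]:
    print(f"{label}: ~{weight_footprint_mb(KOKORO_PARAMS, nbytes):.0f} MB")
# float32: ~328 MB
# float16: ~164 MB
```

A few hundred megabytes of weights is well within the RAM of a typical laptop, which is consistent with the claim that no GPU is needed.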

Uses for Kokoro TTS

So, what can you do with Kokoro TTS? Here are just a few ideas:

  • Build local voice assistants that don’t send your data to the cloud.
  • Add narration to your YouTube videos or podcasts.
  • Create custom voices for your games or storytelling projects.
  • Pair it with speech recognition tools to make offline conversational agents.
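For narration in videos or podcasts you will usually want the audio saved to disk. The Kokoro generate() call in the setup code later in this tutorial produces 24 kHz audio; the helper below is a minimal sketch that writes such samples to a 16-bit mono WAV file using only Python's standard library. It assumes the samples are floats between -1.0 and 1.0, and save_wav is a hypothetical name, not part of Kokoro.

```python
import sys
import wave
from array import array


def save_wav(path, samples, sample_rate=24000):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    # Clip each sample to the valid range, then scale to signed 16-bit integers.
    pcm = array("h", (int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)           # mono
        wf.setsampwidth(2)           # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())
```

In the notebook, calling save_wav("narration.wav", audio) after generate(...) should give you a file you can drop into a video editor. Alternatively, SciPy (installed by the setup cell) provides scipy.io.wavfile.write("narration.wav", 24000, audio), which writes a floating-point WAV when given float samples.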

Why Choose Kokoro TTS?

Kokoro TTS is a great fit if you're looking for a budget-friendly, private, and flexible TTS solution. It's lightweight and easy to use, and its output quality holds up well against much larger models. It's also under active development: the release used in this tutorial is version 0.19, and new voicepacks continue to be added.

Why Kokoro TTS is Worth Your Time

1. Voices in Multiple Languages

Whether you need American or British English, French, Japanese, Korean, or Chinese, Kokoro TTS has you covered. You can pick from a variety of voices or even combine them to create a custom sound that’s just right for you.
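The setup code later in this tutorial passes lang=voice_name[0] to generate(), so the first letter of a voice name selects the accent. Judging from the v0.19 voice names it loads (af_*, am_*, bf_*, bm_*), the first letter appears to mean American ('a') or British ('b') English, and the second letter female ('f') or male ('m'). The sketch below decodes that naming convention. It covers only these two English prefixes; prefixes for other languages' voicepacks are not assumed here, and describe_voice is a hypothetical helper.

```python
# Naming convention inferred from the voice names in the setup code
# (af_*, am_*, bf_*, bm_*); other languages' prefixes are deliberately not covered.
ACCENTS = {"a": "American English", "b": "British English"}
GENDERS = {"f": "female", "m": "male"}


def describe_voice(voice_name: str) -> str:
    """Return a readable description such as 'British English, male'."""
    if len(voice_name) < 2:
        raise ValueError(f"voice name too short: {voice_name!r}")
    accent = ACCENTS.get(voice_name[0])
    gender = GENDERS.get(voice_name[1])
    if accent is None or gender is None:
        raise ValueError(f"unrecognised voice prefix: {voice_name[:2]!r}")
    return f"{accent}, {gender}"


for name in ["af", "am_adam", "bf_emma", "bm_lewis"]:
    # The first letter is also what the tutorial passes to generate() as lang.
    print(name, "->", describe_voice(name))
# af -> American English, female
# am_adam -> American English, male
# bf_emma -> British English, female
# bm_lewis -> British English, male
```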

2. Make Your Own Unique Voice

With Kokoro, you’re not stuck with the default voices. You can mix and match voice “embeddings” (think of them as building blocks for voices) to create personalized voices. Whether you want a calm narrator or an energetic presenter, you’re in control.
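Because the setup code loads each voicepack as a PyTorch tensor, one simple way to mix voices is a weighted average. This is an assumption about how mixing works rather than a documented Kokoro API, though it matches the setup code's comment that the default af voice is a 50-50 mix of Bella and Sarah. The hypothetical helper blend_voicepacks below normalizes the weights so the result stays on the same scale as its inputs. It only uses * and +, so it works on tensors of identical shape and, for a quick check, on plain numbers.

```python
def blend_voicepacks(packs, weights):
    """Weighted average of same-shaped voicepacks (torch tensors, NumPy arrays, or numbers)."""
    if not packs or len(packs) != len(weights):
        raise ValueError("need at least one voicepack and one weight per voicepack")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Normalize weights to sum to 1 so the blend stays on the inputs' scale.
    blended = packs[0] * (weights[0] / total)
    for pack, w in zip(packs[1:], weights[1:]):
        blended = blended + pack * (w / total)
    return blended


# Quick check with scalars standing in for tensors: weights 1:3 become 0.25/0.75.
print(blend_voicepacks([1.0, 3.0], [1, 3]))  # 2.5
```

In the notebook you might try custom = blend_voicepacks([VOICEPACKS['af_bella'], VOICEPACKS['af_sarah']], [0.7, 0.3]) followed by generate(MODEL, text, custom, lang='a'). A blended voice has no name, so you pass the lang letter yourself; mixing American and British voices still means choosing a single accent.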

3. Open Source and Developer-Friendly

Kokoro TTS is open source, meaning it’s free to use and has a growing community of developers. Tools like Kokoro Onnx and Kokoro FastAPI TTS make it easier to integrate Kokoro into your apps or replace external APIs.
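Projects like Kokoro FastAPI aim to let you swap a cloud TTS API for a local server. Assuming your server exposes an OpenAI-style /v1/audio/speech route (check the project's README for the actual path, port, model name, and supported formats), a request needs nothing beyond Python's standard library. BASE_URL, MODEL_NAME, and the default voice below are placeholders, not confirmed defaults, and both functions are hypothetical helpers.

```python
import json
import urllib.request

# Assumptions: a local Kokoro FastAPI server with an OpenAI-style speech route.
# Adjust both values to match your own server.
BASE_URL = "http://localhost:8880/v1"  # placeholder host and port
MODEL_NAME = "kokoro"                  # placeholder model identifier


def build_speech_request(text, voice="af_bella", response_format="wav"):
    """Build (but do not send) a POST request shaped like OpenAI's /audio/speech call."""
    body = {
        "model": MODEL_NAME,
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }
    return urllib.request.Request(
        f"{BASE_URL}/audio/speech",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text, path, voice="af_bella"):
    """Send the request to the running server and save the returned audio bytes."""
    with urllib.request.urlopen(build_speech_request(text, voice)) as resp:
        with open(path, "wb") as f:
            f.write(resp.read())
```

Keeping request construction separate from sending makes it easy to inspect the payload before pointing your app at the local server instead of a paid API.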

4. Easy Setup

Setting up Kokoro TTS is straightforward. Whether you’re using Google Colab, a local computer, or a virtual environment, there are guides to walk you through every step. No headaches, no frustration.

How to Run Kokoro TTS on a Basic Computer

  1. Go to https://colab.research.google.com.
  2. Choose File > New notebook in Drive.

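Paste the following code into the first cell. It runs on Colab's default CPU runtime; if a GPU runtime is available, the code uses it automatically.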

import os

# --- Setup ---
# Use an absolute path so re-running the cell (after %cd) does not clone into a nested folder.
if not os.path.exists("/content/Kokoro-82M"):
    !git lfs install
    !git clone https://huggingface.co/hexgrad/Kokoro-82M /content/Kokoro-82M
%cd /content/Kokoro-82M
!apt-get -qq -y install espeak-ng > /dev/null 2>&1
!pip install -q phonemizer torch transformers scipy munch
!pip install -q gradio  # Gradio provides the web UI

import gradio as gr
import torch
from models import build_model  # models.py ships with the cloned repo
from kokoro import generate     # so does kokoro.py
import numpy as np

# Falls back to the CPU automatically when no GPU is available.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f'loaded device: {device}')
MODEL = build_model('kokoro-v0_19.pth', device)

VOICE_NAMES = [
    'af',  # Default voice is a 50-50 mix of Bella & Sarah
    'af_bella', 'af_sarah', 'am_adam', 'am_michael',
    'bf_emma', 'bf_isabella', 'bm_george', 'bm_lewis',
    'af_nicole', 'af_sky',
]
VOICEPACKS = {name: torch.load(f'voices/{name}.pt', weights_only=True).to(device) for name in VOICE_NAMES}

def generate_audio(text, voice_name):
    """Synthesize `text` with the chosen voice; returns (audio, phonemes, error)."""
    if not text:
        return None, None, "Please enter some text."
    if voice_name not in VOICEPACKS:
        return None, None, "Invalid voice selected."
    voicepack = VOICEPACKS[voice_name]
    try:
        # The first letter of the voice name selects the accent ('a' = American, 'b' = British).
        audio, out_ps = generate(MODEL, text, voicepack, lang=voice_name[0])
        # Kokoro produces 24 kHz audio; gr.Audio accepts a (sample_rate, samples) tuple.
        return (24000, audio), out_ps, None
    except Exception as e:
        return None, None, str(e)

# --- Gradio Interface ---
with gr.Blocks() as demo:
    gr.Markdown("## Kokoro Text-to-Speech")
    with gr.Row():
        with gr.Column():
            text_input = gr.Textbox(label="Enter text to synthesize:", lines=5, placeholder="Enter text here...")
            voice_dropdown = gr.Dropdown(choices=VOICE_NAMES, label="Select Voice", value=VOICE_NAMES[0])
            generate_button = gr.Button("Generate Audio")
            error_output = gr.Textbox(label="Error Message", interactive=False)
        with gr.Column():
            audio_output = gr.Audio(label="Generated Audio", interactive=False)
            phoneme_output = gr.Textbox(label="Phonemes", interactive=False)

    # generate_audio already returns a playable (rate, samples) tuple,
    # so no extra post-processing step is needed.
    generate_button.click(
        generate_audio,
        inputs=[text_input, voice_dropdown],
        outputs=[audio_output, phoneme_output, error_output],
    )

# --- Launch the Web UI ---
if __name__ == "__main__":
    # share=True creates a temporary public link you can open in a new tab.
    demo.launch(share=True)

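Run the cell. The first run takes a little while because it installs dependencies and downloads the model. When it finishes, Gradio prints a public link: click it, enter some text, choose a voice, and click Generate Audio to hear the Kokoro TTS model speak.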
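Since the point is running without a GPU, it helps to measure how fast synthesis actually is on your machine. A common yardstick is the real-time factor (RTF): processing time divided by the duration of the audio produced, where a value below 1.0 means faster than real time. The sketch below computes it from the 24 kHz sample count; the timing pattern in the comments assumes the MODEL, VOICEPACKS, and generate names from the setup cell, and real_time_factor is a hypothetical helper.

```python
SAMPLE_RATE = 24000  # Kokoro returns 24 kHz audio


def real_time_factor(synthesis_seconds: float, num_samples: int,
                     sample_rate: int = SAMPLE_RATE) -> float:
    """Synthesis time divided by audio duration; below 1.0 is faster than real time."""
    audio_seconds = num_samples / sample_rate
    if audio_seconds == 0:
        raise ValueError("empty audio")
    return synthesis_seconds / audio_seconds


# Worked example: 3 s of compute for 240,000 samples (10 s of audio).
print(real_time_factor(3.0, 240_000))  # 0.3

# In the notebook (uses names from the setup cell):
#   import time
#   start = time.perf_counter()
#   audio, _ = generate(MODEL, "Hello from Kokoro!", VOICEPACKS['af'], lang='a')
#   print(real_time_factor(time.perf_counter() - start, len(audio)))
```

Try a few sentences of different lengths; the first call may be slower than later ones because of one-time warm-up work.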
