Hey there! Have you ever dreamed of creating natural-sounding voices for your projects without relying on big-name APIs like Google or ElevenLabs? Say hello to Kokoro TTS, a lightweight yet powerful text-to-speech (TTS) solution that runs locally on your computer—even without a fancy GPU! Let’s dive in and see what makes Kokoro TTS a fantastic choice for developers and tech enthusiasts alike.
What is Kokoro TTS?
Kokoro TTS is a small but mighty TTS model designed to turn written words into realistic audio. It’s available on platforms like Hugging Face and GitHub. Don’t let its size fool you—this model has just 82 million parameters, yet it competes with (and often beats) much bigger models. Plus, you can run it offline, which means no subscription fees, no internet dependency, and total privacy for your projects.
About Kokoro TTS
- Small but mighty: With just 82 million parameters, it outperforms many larger models.
- Open license: The model is Apache 2.0 licensed, meaning you can use it freely in most projects.
- Built for everyone: You can run it on basic computers—no GPU needed!
Uses for Kokoro TTS
So, what can you do with Kokoro TTS? Here are just a few ideas:
- Build local voice assistants that don’t send your data to the cloud.
- Add narration to your YouTube videos or podcasts.
- Create custom voices for your games or storytelling projects.
- Pair it with speech recognition tools to make offline conversational agents.
Why Choose Kokoro TTS?
Kokoro TTS is perfect if you’re looking for a budget-friendly, private, and flexible TTS solution. It’s lightweight, easy to use, and doesn’t compromise on quality. Plus, it’s constantly improving. The creators recently released version 0.19, and new voicepacks are being added all the time.
Why Kokoro TTS is Worth Your Time
1. Voices in Multiple Languages
Whether you need American or British English, French, Japanese, Korean, or Chinese, Kokoro TTS has you covered. You can pick from a variety of voices or even combine them to create a custom sound that’s just right for you.
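The voice identifiers used in the setup code later in this post (for example `af_bella` and `bm_george`) appear to follow a simple pattern: the first letter marks the accent, the second the speaker's gender, and the rest is the speaker's name. The helper below is a small sketch of that reading. `describe_voice` and its lookup tables are hypothetical names made up for illustration and are not part of Kokoro, and only the American and British prefixes seen in that code are covered.

```python
# Illustrative helper (not part of Kokoro): decode a voice identifier such as
# "af_bella" under the assumed convention <accent><gender>_<name>.
ACCENTS = {"a": "American English", "b": "British English"}
GENDERS = {"f": "female", "m": "male"}


def describe_voice(voice_name: str) -> str:
    """Return a human-readable description of a voice identifier."""
    prefix, _, speaker = voice_name.partition("_")
    if len(prefix) != 2 or prefix[0] not in ACCENTS or prefix[1] not in GENDERS:
        raise ValueError(f"Unrecognized voice name: {voice_name!r}")
    accent = ACCENTS[prefix[0]]
    gender = GENDERS[prefix[1]]
    # A bare prefix such as "af" has no speaker part; the setup code treats it
    # as a default blend of two voices.
    label = speaker.capitalize() if speaker else "default blend"
    return f"{label}: {accent}, {gender}"


for name in ["af_bella", "bm_george", "af"]:
    print(describe_voice(name))
```

The first letter is also what the notebook code passes to the model as the language code (`lang=voice_name[0]`), so keeping to this naming pattern matters if you add voices of your own.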
2. Make Your Own Unique Voice
With Kokoro, you’re not stuck with the default voices. You can mix and match voice “embeddings” (think of them as building blocks for voices) to create personalized voices. Whether you want a calm narrator or an energetic presenter, you’re in control.
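One plausible way to mix voices is a weighted average of their embeddings. The notebook code later in this post describes the default `af` voice as a 50-50 mix of Bella and Sarah, which suggests averaging is a reasonable starting point. The sketch below assumes embeddings support addition and multiplication by a number, as PyTorch tensors do. `blend_voices` is a hypothetical helper name, and plain floats stand in for real embeddings so the example runs without any model files.

```python
# Sketch only: blend voice embeddings by weighted average.
# Works on any objects that support + and multiplication by a number
# (PyTorch tensors, NumPy arrays, or plain floats as used below).
def blend_voices(voices, weights):
    """Return the weighted average of the named voices.

    voices  -- dict mapping a voice name to its embedding
    weights -- dict mapping a voice name to a non-negative weight
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("Weights must sum to a positive number.")
    blended = None
    for name, weight in weights.items():
        term = voices[name] * (weight / total)  # normalize so weights sum to 1
        blended = term if blended is None else blended + term
    return blended


# Plain floats stand in for embeddings so this runs without model files.
demo_voices = {"af_bella": 2.0, "af_sarah": 4.0}
print(blend_voices(demo_voices, {"af_bella": 1, "af_sarah": 1}))  # 3.0
print(blend_voices(demo_voices, {"af_bella": 3, "af_sarah": 1}))  # 2.5
```

With the voicepacks loaded as in the notebook, a call such as `blend_voices(VOICEPACKS, {"af_bella": 0.7, "af_sarah": 0.3})` should produce a tensor that can be passed wherever a single voicepack is expected. Whether a particular mix sounds good is something to judge by ear.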
3. Open Source and Developer-Friendly
Kokoro TTS is open source, meaning it’s free to use and has a growing community of developers. Tools like Kokoro Onnx and Kokoro FastAPI TTS make it easier to integrate Kokoro into your apps or replace external APIs.
4. Easy Setup
Setting up Kokoro TTS is straightforward. Whether you’re using Google Colab, a local computer, or a virtual environment, there are guides to walk you through every step. No headaches, no frustration.
How to Run Kokoro TTS on a Basic Computer
- Go to https://colab.research.google.com
- Choose File > New notebook in Drive.
Add the following code to a new cell:
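import os

# --- Setup ---
# Skip the clone and directory change when the cell is re-run from inside the repo.
if os.path.basename(os.getcwd()) != "Kokoro-82M":
    # Clone the model repository only if it is not already present.
    if not os.path.exists("Kokoro-82M"):
        !git lfs install
        !git clone https://huggingface.co/hexgrad/Kokoro-82M
    %cd Kokoro-82M
!apt-get -qq -y install espeak-ng > /dev/null 2>&1
!pip install -q phonemizer torch transformers scipy munch
!pip install -q gradio  # Install gradio before it is imported below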
import gradio as gr
import torch
from models import build_model
from kokoro import generate
from IPython.display import Audio, display
import numpy as np

# Use the GPU when one is available; otherwise fall back to the CPU.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f'loaded device: {device}')
MODEL = build_model('kokoro-v0_19.pth', device)

VOICE_NAMES = [
    'af',  # Default voice is a 50-50 mix of Bella & Sarah
    'af_bella', 'af_sarah', 'am_adam', 'am_michael',
    'bf_emma', 'bf_isabella', 'bm_george', 'bm_lewis',
    'af_nicole', 'af_sky',
]
# Load every voicepack once up front so each request can reuse it.
VOICEPACKS = {name: torch.load(f'voices/{name}.pt', weights_only=True).to(device) for name in VOICE_NAMES}

def generate_audio(text, voice_name):
    """Synthesize text and return (audio, phonemes, error message)."""
    if not text:
        return None, None, "Please enter some text."
    if voice_name not in VOICEPACKS:
        return None, None, "Invalid voice selected."
    voicepack = VOICEPACKS[voice_name]
    try:
        # The first character of the voice name is passed as the language code.
        audio, out_ps = generate(MODEL, text, voicepack, lang=voice_name[0])
        # 24000 is the sample rate (in Hz) reported to the audio player.
        return (24000, audio), out_ps, None
    except Exception as e:
        return None, None, str(e)


def display_audio(audio_tuple):
    """Return the (rate, samples) pair with the samples as a NumPy array."""
    if audio_tuple:
        rate, data = audio_tuple
        return (rate, np.array(data))
    else:
        return None

# --- Gradio Interface ---
with gr.Blocks() as demo:
    gr.Markdown("## Kokoro Text-to-Speech")
    with gr.Row():
        with gr.Column():
            text_input = gr.Textbox(label="Enter text to synthesize:", lines=5, placeholder="Enter text here...")
            voice_dropdown = gr.Dropdown(choices=VOICE_NAMES, label="Select Voice", value=VOICE_NAMES[0])
            generate_button = gr.Button("Generate Audio")
            error_output = gr.Textbox(label="Error Message", interactive=False)
        with gr.Column():
            audio_output = gr.Audio(label="Generated Audio", interactive=False)
            phoneme_output = gr.Textbox(label="Phonemes", interactive=False)
    generate_button.click(
        generate_audio,
        inputs=[text_input, voice_dropdown],
        outputs=[audio_output, phoneme_output, error_output],
    ).then(
        display_audio,
        inputs=audio_output,
        outputs=audio_output
    )

# --- Launch the Web UI ---
if __name__ == "__main__":
    demo.launch(share=True)
Run the cell. Once it finishes, a public link is generated. Click it to open the web UI, where you can generate audio with the Kokoro TTS model.
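If you would rather keep the result as a file than play it in the browser, the samples can be written to a WAV file using only Python's standard library. The sketch below assumes the audio is a sequence of floats between -1.0 and 1.0 at the 24000 Hz rate used in the code above. `save_wav` is a hypothetical helper name, and a generated test tone stands in for real model output.

```python
import math
import struct
import wave


def save_wav(path, samples, sample_rate=24000):
    """Write float samples (assumed to lie in [-1.0, 1.0]) as 16-bit mono PCM."""
    frames = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))  # guard against overshoot
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))


# Stand-in signal: one second of a 440 Hz tone instead of real model output.
tone = [0.3 * math.sin(2 * math.pi * 440 * n / 24000) for n in range(24000)]
save_wav("kokoro_demo.wav", tone)
```

In the notebook, the `audio` value returned by `generate` could be passed as `samples` in place of the test tone, provided it really is in that floating-point range.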