Kokoro is a text-to-speech model released in 2025. You can run it locally at no cost, or use paid APIs that charge about $0.60 to $1 per million characters as of March 2026.
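To put that pricing in context, here is a rough cost estimate in Python. The function name and the 500,000-character book length are made up for illustration; only the price range comes from the figures quoted above.

```python
def estimate_cost_usd(num_chars, price_per_million_usd):
    """Cost of synthesizing num_chars characters at a per-million-character price."""
    return num_chars / 1_000_000 * price_per_million_usd


# Hypothetical workload: a book of about 500,000 characters.
book_chars = 500_000

low = estimate_cost_usd(book_chars, 0.60)   # cheaper end of the quoted range
high = estimate_cost_usd(book_chars, 1.00)  # pricier end of the quoted range

print(f"Estimated API cost: ${low:.2f} to ${high:.2f}")
```

At these rates a whole book costs well under a dollar through an API, so the choice between paid and local use is more likely to come down to privacy and latency than to price.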

It’s a small model, with only 82 million parameters, so it runs fast and doesn’t need much compute. Even so, its speech sounds close to that of much bigger systems.

The model was made by Hexgrad, with the training work credited to a developer called rzvzn. It builds on earlier work like StyleTTS 2 instead of starting from scratch.

Its main idea is simple: keep things efficient. The small size lets it run on local machines and even in browsers, where bigger models would struggle. It uses a decoder-only setup and ISTFTNet to generate audio, which helps keep it quick and light.

For training, it sticks to safe data sources. That includes synthetic audio and permissively licensed datasets. This lowers the risk of copyright issues, which makes it easier to use in real products.

Kokoro isn’t the top model in raw audio quality. But it strikes a strong balance between size and quality. That makes it useful for apps that need low cost or local use.

It comes with a range of features. These include text-to-speech in multiple languages, over 50 voice options, speed control, and phoneme handling through a G2P (grapheme-to-phoneme) tool called misaki. It also supports streaming and can output audio in formats like WAV, MP3, and FLAC. The audio is usually sampled at 24 kHz.
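A minimal sketch of how these features fit together in Python, assuming the `kokoro` package is installed (`pip install kokoro`). The `KPipeline` call, the `a` language code (American English), and the `af_heart` voice follow the project's README as I understand it; the helper names `float_to_pcm16`, `write_wav`, and `synthesize_to_wav` are my own, and the WAV writing uses only the standard library.

```python
import struct
import wave

SAMPLE_RATE = 24_000  # Kokoro's usual output rate, per the text above


def float_to_pcm16(samples):
    """Clamp floats to [-1, 1] and pack them as little-endian 16-bit PCM."""
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))
        frames += struct.pack("<h", int(round(s * 32767)))
    return bytes(frames)


def write_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono 16-bit audio to a WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(float_to_pcm16(samples))


def synthesize_to_wav(text, path, voice="af_heart", speed=1.0, lang_code="a"):
    """Run Kokoro on `text` and save the result (assumes `kokoro` is installed)."""
    from kokoro import KPipeline  # imported lazily so the helpers above work without it

    pipeline = KPipeline(lang_code=lang_code)
    samples = []
    # The pipeline yields one (graphemes, phonemes, audio) result per text chunk,
    # so audio can be consumed piece by piece instead of all at once.
    for _graphemes, _phonemes, audio in pipeline(text, voice=voice, speed=speed):
        if audio is not None:
            samples.extend(audio.tolist())
    write_wav(path, samples)
```

Calling `synthesize_to_wav("Hello from Kokoro.", "hello.wav", speed=1.2)` would then produce a slightly faster reading. Other output formats such as MP3 or FLAC need an extra encoding library or a server wrapper, which this sketch leaves out.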

There are also ONNX versions of Kokoro now. These take the same model and convert it into a format that runs more easily across systems. Some builds also shrink it further using 8-bit or 4-bit quantization.
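As a back-of-the-envelope check on what those bit widths mean, the weight storage can be estimated from the 82 million parameter count. This is only an approximation: it assumes every weight is stored at the same precision and ignores file overhead, so real ONNX files will differ, especially when a build quantizes only some layers.

```python
PARAMS = 82_000_000  # parameter count quoted for Kokoro

# Bits used to store each weight at common precisions.
BITS_PER_WEIGHT = {"fp32": 32, "fp16": 16, "int8": 8, "int4": 4}


def weight_size_mb(params, bits):
    """Approximate weight storage in megabytes (1 MB = 1,000,000 bytes)."""
    return params * bits / 8 / 1_000_000


for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name}: ~{weight_size_mb(PARAMS, bits):.0f} MB")
```

By this estimate the weights go from roughly 328 MB at 32-bit to about 82 MB at 8-bit and about 41 MB at 4-bit, which is what makes browser and small-device use practical.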

This means it can run on CPUs, in browsers using WebGPU or WASM, and on smaller devices. You don’t need a full PyTorch install anymore. There is a small drop in audio quality depending on how heavily the model is quantized. But in return you get a smaller download, faster inference, and lower latency. And because it can run on your own device, you also get more privacy.
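For a CPU-only setup without PyTorch, one option is the community `kokoro-onnx` Python package. The sketch below assumes that package is installed and that the model and voice files have already been downloaded; the file names `kokoro-v1.0.onnx` and `voices-v1.0.bin` and the `af_sarah` voice are assumptions based on that project's published examples, and `synthesize_onnx` is a hypothetical helper name.

```python
from pathlib import Path


def synthesize_onnx(
    text,
    model_path="kokoro-v1.0.onnx",   # assumed file name, download separately
    voices_path="voices-v1.0.bin",   # assumed file name, download separately
    voice="af_sarah",
    speed=1.0,
    lang="en-us",
):
    """Synthesize speech on the local machine using ONNX Runtime."""
    # Fail early with a clear message if the model files are not in place.
    for path in (model_path, voices_path):
        if not Path(path).is_file():
            raise FileNotFoundError(f"Model file not found: {path}")

    from kokoro_onnx import Kokoro  # imported lazily; requires `pip install kokoro-onnx`

    kokoro = Kokoro(str(model_path), str(voices_path))
    samples, sample_rate = kokoro.create(text, voice=voice, speed=speed, lang=lang)
    return samples, sample_rate
```

Everything here runs on the local machine, so the text never leaves the device. Swapping in a quantized model file should need no code changes, only a different `model_path`.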
