Pocket TTS audio model

Name: pocket-tts
Licence: MIT License
Creator: Kyutai Labs

pocket-tts ("TTS that fits in your CPU") is an open-source text-to-speech model released by Kyutai Labs in January 2026. It is designed to generate natural-sounding speech locally and efficiently on ordinary laptops and desktops, with no GPU required.

  • Lightweight, CPU-focused TTS engine that runs in real time on standard hardware (e.g., laptop CPUs).
  • Voice cloning support — you can make it imitate a voice from a short sample.
  • Targeted at developers who want TTS without cloud APIs or GPU requirements.

Model & Performance

  • ~100 million parameters, making it very small for a modern speech model.
  • Low latency: ~200 ms to first audio chunk and usually faster-than-real-time on CPUs.
  • Needs only 2 CPU cores and does not require a GPU-enabled PyTorch build.
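
To check the "faster-than-real-time" figure on your own machine, compare the length of the generated audio with the wall-clock time it took to produce. The sketch below is a minimal illustration, not part of pocket-tts: the helper names are hypothetical, `synthesize` stands for whatever callable you wrap around the model, and the sample rate must be read from your actual output rather than assumed.

```python
import time
from typing import Callable, Sequence, Tuple


def audio_duration_seconds(num_samples: int, sample_rate: int) -> float:
    """Length of an audio clip in seconds, given its sample count and rate."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return num_samples / sample_rate


def realtime_speedup(audio_seconds: float, wall_seconds: float) -> float:
    """Seconds of audio produced per second of compute.

    A value above 1.0 means generation is faster than real time.
    """
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


def measure(
    synthesize: Callable[[str], Sequence[float]],  # hypothetical wrapper around the model
    text: str,
    sample_rate: int,
) -> Tuple[float, float]:
    """Time one synthesis call and return (audio_seconds, speedup)."""
    start = time.perf_counter()
    samples = synthesize(text)
    wall = time.perf_counter() - start
    audio = audio_duration_seconds(len(samples), sample_rate)
    return audio, realtime_speedup(audio, wall)
```

A speedup of 4.0, for example, means two seconds of speech were generated in half a second. This measures total throughput only; the time to the first audio chunk would need a separate timer around the first streamed chunk.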

APIs & Interfaces

  • Command-line interface (CLI) — for quick text-to-speech generation.
  • Python library API — integrate into Python apps.
  • Serve mode — run a local HTTP service to generate speech via REST calls.
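
As a rough sketch of how the CLI might be driven from a script, the helper below only builds the argument list. The `generate` subcommand and the `--text` and `--voice` flags are assumptions based on the project's documentation as I recall it, so confirm them with `pocket-tts --help` before relying on them. The function name `build_generate_command` is hypothetical.

```python
import shlex
from typing import List, Optional


def build_generate_command(text: str, voice: Optional[str] = None) -> List[str]:
    """Build an argument list for a pocket-tts CLI call.

    Assumption: the CLI exposes `generate` with `--text` and `--voice`;
    verify against `pocket-tts --help` for the installed version.
    """
    cmd = ["pocket-tts", "generate", "--text", text]
    if voice is not None:
        # Assumed flag for choosing a voice or pointing at a voice sample.
        cmd += ["--voice", voice]
    return cmd


# Print the shell-quoted command for inspection without running anything.
print(shlex.join(build_generate_command("Hello from a laptop CPU.")))
```

Once pocket-tts is installed, the list can be passed to `subprocess.run(cmd, check=True)`. Passing a list rather than a single string avoids shell-quoting problems when the text contains quotes or spaces.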

Voice & Language

  • Includes a small catalog of built-in voices.
  • Voice cloning from a WAV sample that you provide.
  • Primarily English support in the core project (some third-party tools supply additional voices).
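
Because voice cloning starts from a WAV sample, it can help to inspect the file before handing it to the model. The sketch below uses only Python's standard `wave` module, which reads uncompressed PCM WAV files. The `WavInfo` and `inspect_wav` names are hypothetical, and the page does not state what sample length or format pocket-tts expects, so no thresholds are enforced here.

```python
import wave
from dataclasses import dataclass


@dataclass
class WavInfo:
    channels: int
    sample_rate: int
    sample_width_bytes: int
    duration_seconds: float


def inspect_wav(path: str) -> WavInfo:
    """Read basic properties of a PCM WAV file without loading the audio data."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return WavInfo(
            channels=wav.getnchannels(),
            sample_rate=rate,
            sample_width_bytes=wav.getsampwidth(),
            duration_seconds=frames / rate,
        )
```

Calling `inspect_wav("my_voice.wav")` on a hypothetical sample returns its channel count, sample rate, and duration, which makes it easy to spot an empty or truncated recording before attempting to clone it.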

Usage Scenarios

  • Local TTS engines for accessibility tools, desktop assistants, and embedded applications.
  • On-demand voice synthesis (e.g., reading text aloud).
  • Prototyping speech apps without cloud dependency.

Limitations

  • Current build only supports CPU (no browser or GPU builds yet).
  • Primarily English — limited other language support out of the box.
  • Does not yet support some advanced features like silence control or quantized int8 models.

Key Features

  • Supported languages: English
