Meet OmniVoice: The AI That Speaks Your Language
Imagine creating natural, human-like voiceovers in over 600 languages, without recording a single syllable.
That’s OmniVoice, a next-generation zero-shot text-to-speech model built on a cutting-edge diffusion architecture.
Whether you’re localizing content for global audiences, prototyping voice assistants, or designing custom voices for accessibility tools, OmniVoice delivers studio-quality speech at remarkable speed. With built-in voice cloning and intuitive voice design, it empowers creators, developers, and businesses to break language barriers faster, cheaper, and more inclusively. No massive datasets. No complex pipelines.
Just type, choose a voice, and listen: any supported language, any accent, any time. Ready to make your content heard everywhere?
Features
- 600+ Languages Supported: The broadest language coverage among zero-shot TTS models (full list).
- Voice Cloning: State-of-the-art voice cloning quality.
- Voice Design: Control voices via assigned speaker attributes (gender, age, pitch, dialect/accent, whisper, etc.).
- Fine-grained Control: Non-verbal symbols (e.g., [laughter]) and pronunciation correction via pinyin or phonemes.
- Fast Inference: RTF as low as 0.025 (40x faster than real-time).
- Diffusion Language Model-style Architecture: A clean, streamlined, and scalable design that delivers both quality and speed.
- Hardware Support: Runs on NVIDIA GPUs and Apple Silicon.
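To make the Fast Inference figure concrete: the real-time factor (RTF) is the time spent synthesizing divided by the duration of the audio produced, so an RTF of 0.025 corresponds to 1 / 0.025 = 40 seconds of audio per second of compute. The sketch below only shows that arithmetic. The function names and the timing numbers are illustrative assumptions, not part of OmniVoice or measured results.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def speedup_over_real_time(rtf: float) -> float:
    """Seconds of audio produced per second of compute (the inverse of RTF)."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return 1.0 / rtf


# Hypothetical timing, chosen to match the quoted RTF:
# 10 s of audio synthesized in 0.25 s of compute.
rtf = real_time_factor(synthesis_seconds=0.25, audio_seconds=10.0)
print(f"RTF = {rtf:.3f}, speedup = {speedup_over_real_time(rtf):.0f}x")
```

An RTF below 1.0 means speech is generated faster than it plays back. The actual value will depend on hardware, text length, and settings.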
Installation and requirements
To install OmniVoice, first install PyTorch. Then install OmniVoice from PyPI, directly from the latest source on GitHub, or from a local clone for development:
# From PyPI (stable release)
pip install omnivoice
# From the latest source on GitHub (no need to clone)
pip install git+https://github.com/k2-fsa/OmniVoice.git
# For development (clone first, editable install)
git clone https://github.com/k2-fsa/OmniVoice.git
cd OmniVoice
pip install -e .
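After installing, a quick environment check can confirm that both packages are visible to Python and show which accelerator PyTorch can reach. This is a minimal sketch that uses only the standard library and PyTorch's own availability checks; it does not call any OmniVoice API. The helper names are hypothetical, and the CPU fallback is only a placeholder, since this document does not state whether CPU inference is supported.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def pick_device() -> str:
    """Prefer CUDA (NVIDIA), then MPS (Apple Silicon), else fall back to CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is missing, so no accelerator can be used
    if torch.cuda.is_available():
        return "cuda"
    # Older PyTorch builds may lack the MPS backend, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"  # assumption: CPU support is not confirmed by this document


for name in ("torch", "omnivoice"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
print("device:", pick_device())
```

If `torch` shows as not installed, install PyTorch first and then rerun one of the install commands above.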
License
- Apache 2.0 License.




