
1. Overview
coqui-ai-TTS is a continuation of the original Coqui TTS project, which is no longer maintained. Thanks to the Idiap Research Institute, the features of Coqui TTS remain available through this actively maintained fork.
Key Features:
- Pretrained models in 1100+ languages
- Tools for training new models and fine-tuning existing ones
- Utilities for dataset analysis and curation
- Support for voice conversion (with OpenVoice integration)
It runs on Windows, macOS, and Linux; the setup below uses a Conda environment.
2. Installation
Step 1: Create a Conda Environment
(Install Conda or Miniconda first; see the Anaconda documentation.)
conda create -n coqui-ai-TTS python=3.10
conda activate coqui-ai-TTS
Step 2: Install coqui-ai-TTS
- If you only want to use pretrained models:
pip install coqui-tts
- If you want to train or modify models:
git clone https://github.com/idiap/coqui-ai-TTS
cd coqui-ai-TTS
pip install -e .
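Either way, a quick import check confirms that the installation works and whether a GPU will be used (a minimal sketch; torch is installed as a dependency):
import torch
from TTS.api import TTS  # fails here if the installation is broken

print("coqui-tts import OK")
print("CUDA available:", torch.cuda.is_available())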
3. Managing Models
List available models:
tts --list_models
Default download location (Linux):
~/.local/share/tts
(e.g. /root/.local/share/tts when running as root)
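Models are downloaded automatically the first time they are loaded, so this directory is populated as soon as you instantiate a model (a minimal sketch using the XTTS model from the cloning examples below):
from TTS.api import TTS

# The first load downloads the model files into the storage location above
# (or into the custom location configured below).
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")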
Change model storage location:
Temporary environment variables:
export XDG_DATA_HOME="/www/coqui/models"
export TTS_HOME="/www/coqui/models"
echo $XDG_DATA_HOME
Permanent setup (Linux/macOS):
- Edit ~/.bashrc (for bash) or ~/.zshrc (for zsh, the macOS default).
- Add the environment variables above.
- Reload the config: source ~/.bashrc or . ~/.zshrc
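The same variables can also be set from inside a Python script, as long as this happens before the first model is loaded (a sketch; as far as I can tell, TTS_HOME takes precedence over XDG_DATA_HOME when both are set):
import os

# Set before any model is downloaded or loaded.
os.environ["TTS_HOME"] = "/www/coqui/models"

from TTS.api import TTS  # imported after the variable is set

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")  # stored under /www/coqui/models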
4. Voice Cloning
Supported Models
- YourTTS (and other d-vector models)
- XTTS
- Tortoise
- Bark
Two Modes: Python API & Command-line
Python API Example:
import torch
from TTS.api import TTS

device = "cuda" if torch.cuda.is_available() else "cpu"
api = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)

# Step 1: Clone the voice from one or more reference recordings
api.tts_to_file(
    text="Hello world",
    speaker_wav=["my/cloning/audio.wav", "my/cloning/audio2.wav"],
    speaker="MySpeaker1",
    language="en",
    file_path="output1.wav",
)

# Step 2: Reuse the cloned voice by its speaker name
api.tts_to_file(
    text="Hello world",
    speaker="MySpeaker1",
    language="en",
    file_path="output2.wav",
)
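The same call pattern applies to the other supported models listed above. A minimal sketch for YourTTS (model name as reported by tts --list_models):
import torch
from TTS.api import TTS

device = "cuda" if torch.cuda.is_available() else "cpu"

# YourTTS computes a d-vector from the reference audio for cloning.
api = TTS("tts_models/multilingual/multi-dataset/your_tts").to(device)
api.tts_to_file(
    text="Hello world",
    speaker_wav="my/cloning/audio.wav",
    language="en",
    file_path="output_your_tts.wav",
)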
CLI Example:
# Step 1: Clone voice (the text "你好世界" is Chinese for "Hello world")
tts --model_name "tts_models/multilingual/multi-dataset/xtts_v2" \
--text "你好世界" \
--language_idx "zh" \
--speaker_wav "my/cloning/audio.wav" "my/cloning/audio2.wav" \
--speaker_idx "MySpeaker1"
# Step 2: Reuse cloned voice
tts --model_name "tts_models/multilingual/multi-dataset/xtts_v2" \
--text "你好世界" \
--language_idx "zh" \
--speaker_idx "MySpeaker1"
⚠️ For Chinese voice cloning, install:
pip install pypinyin
Otherwise, you’ll see:
ImportError: Chinese requires: pypinyin
5. Voice Conversion
Voice conversion re-renders a source recording in the voice of a target speaker while keeping the spoken content.
Python API Example:
import torch
from TTS.api import TTS

device = "cuda" if torch.cuda.is_available() else "cpu"

tts = TTS("voice_conversion_models/multilingual/vctk/freevc24").to(device)
tts.voice_conversion_to_file(
    source_wav="my/source.wav",
    target_wav="my/target.wav",
    file_path="output.wav",
)
CLI Example:
tts --model_name "voice_conversion_models/multilingual/multi-dataset/openvoice_v2" \
--source_wav "source.wav" \
--target_wav "target1.wav" "target2.wav" \
--out_path "output.wav"
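Synthesis and conversion can also be chained in a single call. A minimal sketch, assuming your installed version exposes TTS.tts_with_vc_to_file (present in recent releases) and using tts_models/en/ljspeech/tacotron2-DDC as an arbitrary single-speaker base model:
from TTS.api import TTS

# Synthesize with a single-speaker English model, then convert the result
# toward the voice in the target recording before writing the file.
tts = TTS("tts_models/en/ljspeech/tacotron2-DDC")
tts.tts_with_vc_to_file(
    text="Hello world",
    speaker_wav="my/target.wav",
    file_path="output_converted.wav",
)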