coqui-ai-TTS Beginner’s Guide | A Deep Learning Toolkit for Text-to-Speech

terry 04/09/2025

1. Overview

coqui-ai-TTS is a maintained fork of the original Coqui TTS project, which is no longer developed upstream. Thanks to the Idiap Research Institute, the features of Coqui TTS remain available through this fork.

Key Features:

  • Pretrained models in 1100+ languages
  • Tools for training new models and fine-tuning existing ones
  • Utilities for dataset analysis and curation
  • Support for voice conversion (with OpenVoice integration)

It runs on Windows, macOS, and Linux; this guide uses a Conda environment throughout.
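
Before setting anything up, here is the shortest possible usage sketch (the model name tts_models/en/ljspeech/tacotron2-DDC is one of the pretrained English models; any name from tts --list_models in section 3 works):

from TTS.api import TTS

# Downloads the model on first use, then loads it
tts = TTS("tts_models/en/ljspeech/tacotron2-DDC")

# Synthesize speech and write it to a WAV file
tts.tts_to_file(text="Hello from coqui-ai-TTS!", file_path="hello.wav")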


2. Installation

Step 1: Create a Conda Environment

(Install Conda or Miniconda first; see the Anaconda documentation.)

conda create -n coqui-ai-TTS python=3.10
conda activate coqui-ai-TTS

Step 2: Install coqui-ai-TTS

  • If you only want to use pretrained models:
pip install coqui-tts
  • If you want to train or modify models:
git clone https://github.com/idiap/coqui-ai-TTS
cd coqui-ai-TTS
pip install -e .
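
A quick sanity check that both the Python package and the command-line tool are available:

# The package should import without errors
python -c "import TTS"

# The tts CLI is installed alongside the package
tts --help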

3. Managing Models

List available models:

tts --list_models
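
The list is long; on Linux/macOS you can narrow it with ordinary shell tools (plain grep, nothing toolkit-specific):

# Show only the XTTS family of models
tts --list_models | grep -i xtts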

Default download location:

~/.local/share/tts (e.g. /root/.local/share/tts when running as root)

Change model storage location:

Temporary environment variables (apply to the current shell session only; TTS_HOME takes precedence if both are set):

export XDG_DATA_HOME="/www/coqui/models"
export TTS_HOME="/www/coqui/models"
echo $XDG_DATA_HOME
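
To confirm the override took effect inside Python, you can call the helper the toolkit itself uses to resolve this directory (get_user_data_dir is an internal utility in TTS.utils.generic_utils; treat this as a diagnostic sketch, since internal helpers can move between releases):

from TTS.utils.generic_utils import get_user_data_dir

# Prints the directory used for model downloads, honoring
# TTS_HOME / XDG_DATA_HOME if they are set
print(get_user_data_dir("tts"))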

Permanent setup (Linux/macOS):

  1. Edit ~/.bashrc (for bash) or ~/.zshrc (for zsh, the macOS default).
  2. Add the export lines (see the snippet after this list).
  3. Reload the config: source ~/.bashrc or . ~/.zshrc
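
For example, the lines to append to ~/.bashrc (same variables as above; adjust the path to your setup):

# Store coqui-ai-TTS model downloads under /www/coqui/models
export XDG_DATA_HOME="/www/coqui/models"
export TTS_HOME="/www/coqui/models"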

4. Voice Cloning

Supported Models

  • YourTTS (and other d-vector models)
  • XTTS
  • Tortoise
  • Bark

Two Modes: Python API & Command-line

Python API Example:

import torch
from TTS.api import TTS

device = "cuda" if torch.cuda.is_available() else "cpu"
api = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)

# Step 1: Clone a voice from reference audio and register it as "MySpeaker1"
api.tts_to_file(
  text="Hello world",
  speaker_wav=["my/cloning/audio.wav", "my/cloning/audio2.wav"],
  speaker="MySpeaker1",
  language="en",
  file_path="step1.wav",
)

# Step 2: Reuse the cloned voice by name, with no reference audio needed
api.tts_to_file(
  text="Hello world",
  speaker="MySpeaker1",
  language="en",
  file_path="step2.wav",
)
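
If you want the raw audio in memory instead of a file (for playback or post-processing), tts() returns the waveform samples rather than writing them out:

# Returns the synthesized waveform as a list of samples
wav = api.tts(text="Hello world", speaker="MySpeaker1", language="en")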

CLI Example:

# Step 1: Clone voice (the sample text "你好世界" means "Hello world")
tts --model_name "tts_models/multilingual/multi-dataset/xtts_v2" \
    --text "你好世界" \
    --language_idx "zh" \
    --speaker_wav "my/cloning/audio.wav" "my/cloning/audio2.wav" \
    --speaker_idx "MySpeaker1"

# Step 2: Reuse cloned voice (output goes to tts_output.wav by default; override with --out_path)
tts --model_name "tts_models/multilingual/multi-dataset/xtts_v2" \
    --text "你好世界" \
    --language_idx "zh" \
    --speaker_idx "MySpeaker1"

⚠️ For Chinese voice cloning, install:

pip install pypinyin

Otherwise, you’ll see:

ImportError: Chinese requires: pypinyin

5. Voice Conversion

Convert a source voice into the style of a target voice.

Python API Example:

import torch
from TTS.api import TTS

# Fall back to CPU when no GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"

tts = TTS("voice_conversion_models/multilingual/vctk/freevc24").to(device)
tts.voice_conversion_to_file(
  source_wav="my/source.wav",   # speech content to keep
  target_wav="my/target.wav",   # voice to apply
  file_path="output.wav",
)
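
The API can also chain synthesis and conversion in one call: tts_with_vc_to_file() synthesizes the text with a TTS model and then converts the result toward the target voice. A sketch based on the upstream example (model name and file paths are placeholders):

from TTS.api import TTS

# Synthesize with an ordinary single-speaker model, then convert the
# result so it sounds like the speaker in target/speaker.wav
tts = TTS("tts_models/en/ljspeech/tacotron2-DDC")
tts.tts_with_vc_to_file(
  "This sentence comes out in the target speaker's voice.",
  speaker_wav="target/speaker.wav",
  file_path="output_vc.wav",
)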

CLI Example:

tts --model_name "voice_conversion_models/multilingual/multi-dataset/openvoice_v2" \
    --source_wav "source.wav" \
    --target_wav "target1.wav" "target2.wav" \
    --out_path "output.wav"

Project repository: https://github.com/idiap/coqui-ai-TTS