myshell-ai / OpenVoice Use Case and Practical Effects

terry 07/09/2025

1. What is OpenVoice?

OpenVoice is an open-source real-time voice cloning technology developed by MyShell AI in collaboration with MIT. It has the following features:

  • Tone Color Cloning: Accurately captures the tone, emotion, and rhythm characteristics of the reference audio.
  • Flexible Style Control: Allows control over emotion, accent, rhythm, pauses, pitch, etc., beyond simple replication.
  • Zero-shot Cross-Lingual Cloning: Can clone voices into new languages without requiring corresponding language training samples.
  • Efficient and Open-Source: Its computational cost is far lower than that of commercial APIs, and OpenVoice V2 has been open-source under the MIT license since April 2024, free for commercial use.

2. Quick Online Experience (No-Code Solution)

If you want to quickly get started without installation, you can use the online widget provided by MyShell:

  • Log in to MyShell.ai's Workshop or Maker platform.
  • Choose the “Voice Clone” tool.
  • Upload your reference audio (ideally clean, free of background noise, and between 20 seconds and 5 minutes long).
  • Input the text you want the model to speak, and set language or style parameters.
  • The system will generate the cloned audio, which you can save and manage.

This approach is ideal for users who want to experience the tool or generate audio quickly without any coding.

3. Local Usage (For Developers/Researchers)

If you’re familiar with Python, Linux, and ML development, you can deploy OpenVoice locally:

Installation Steps:

  • Clone the repository:
    git clone https://github.com/myshell-ai/OpenVoice.git
    cd OpenVoice
  • Create a virtual environment and install dependencies:
    conda create -n openvoice python=3.9
    conda activate openvoice
    pip install -e .
  • Download the model checkpoint:
    • For V1: Download checkpoints_1226.zip and extract it to the checkpoints/ directory.
    • For V2: Download checkpoints_v2_0417.zip, extract it to the checkpoints_v2/ directory, and install MeloTTS:
      pip install git+https://github.com/myshell-ai/MeloTTS.git
      python -m unidic download
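
Before running the demos, a quick sanity check confirms that the checkpoints are in place and which device will be used. A minimal sketch (the paths follow the V2 layout described above):

import os
import torch

# Checkpoint files the converter expects (V2 layout from the steps above)
required = [
    "checkpoints_v2/converter/config.json",
    "checkpoints_v2/converter/checkpoint.pth",
]

for path in required:
    status = "OK" if os.path.exists(path) else "MISSING"
    print(f"{status}: {path}")

# OpenVoice runs on CPU, but a GPU is considerably faster
print("Device:", "cuda" if torch.cuda.is_available() else "cpu")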

Demo Usage:

  • OpenVoice V1:
    • demo_part1.ipynb: Demonstrates style control.
    • demo_part2.ipynb: Demonstrates cross-lingual cloning.
  • OpenVoice V2:
    • demo_part3.ipynb: Shows how to use V2, supporting multiple languages (English, Chinese, French, Japanese, Korean, Spanish).
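
For a sense of what the notebooks do, here is a condensed sketch of the V1 flow from demo_part1.ipynb (the reference file my_voice.wav and the output paths are illustrative):

import torch
from openvoice import se_extractor
from openvoice.api import BaseSpeakerTTS, ToneColorConverter

device = "cuda" if torch.cuda.is_available() else "cpu"

# Base speaker TTS and tone color converter (V1 checkpoint layout)
base_tts = BaseSpeakerTTS("checkpoints/base_speakers/EN/config.json", device=device)
base_tts.load_ckpt("checkpoints/base_speakers/EN/checkpoint.pth")
converter = ToneColorConverter("checkpoints/converter/config.json", device=device)
converter.load_ckpt("checkpoints/converter/checkpoint.pth")

# Tone color embeddings: the base speaker's, and one extracted from your audio
source_se = torch.load("checkpoints/base_speakers/EN/en_default_se.pth").to(device)
target_se, _ = se_extractor.get_se("my_voice.wav", converter, target_dir="processed", vad=True)

# Synthesize base speech, then convert its tone color to match the reference
base_tts.tts("This is a voice cloning test.", "tmp.wav", speaker="default", language="English", speed=1.0)
converter.convert(audio_src_path="tmp.wav", src_se=source_se, tgt_se=target_se,
                  output_path="output.wav", message="@MyShell")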

4. Practical Examples

Example 1: Quick Online Test

  • Upload a 40-second audio clip and enter text such as “Hello, this is a test of voice cloning.”
  • Choose a language (Chinese or English) and set emotion or rhythm parameters.
  • Click “Generate”, and the system will output newly synthesized audio that mimics your voice in the chosen style.

Example 2: Local Script Clone

# your_openvoice_module / VoiceService is a placeholder for your own wrapper
# around the OpenVoice pipeline (tone color extraction + base TTS + conversion)
from your_openvoice_module import VoiceService

service = VoiceService()

service.clone_voice(
    reference_speaker='my_voice.wav',      # reference audio to clone
    text='你好,这是一段测试语音。',          # "Hello, this is a test speech."
    output_filename='cloned_output.wav',
    language='ZH',                         # Chinese
    base_speaker_key='ZH-Default'
)
  • Following the demo_part*.ipynb notebooks, you can generate speech that matches your voice in the same way.
  • You can adjust the speed (e.g., speed=0.9) for a more natural result; see the sketch below.
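
With MeloTTS as the base TTS (the V2 path), speed is passed straight to tts_to_file. A minimal sketch:

from melo.api import TTS

# Chinese base TTS; speed < 1.0 slows the speech slightly
tts = TTS(language="ZH", device="cpu")
speaker_ids = tts.hps.data.spk2id

tts.tts_to_file(
    "你好,这是一段测试语音。",
    speaker_ids["ZH"],
    "tmp_tts.wav",
    speed=0.9,  # slightly slower delivery often sounds more natural
)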

OpenVoice V2 Simple Gradio WebUI

  1. Create the openvoice_webui.py script:
import gradio as gr
import torch
import os
from openvoice import se_extractor
from openvoice.api import ToneColorConverter
from melo.api import TTS

# Converter checkpoint path (V2 layout)
ckpt_converter = "checkpoints_v2/converter"

# Initialize the tone color converter once at startup
device = "cuda" if torch.cuda.is_available() else "cpu"
converter = ToneColorConverter(os.path.join(ckpt_converter, "config.json"), device=device)
converter.load_ckpt(os.path.join(ckpt_converter, "checkpoint.pth"))

def clone_voice(ref_audio, text, language="ZH"):
    # Extract the target tone color from the reference audio, synthesize
    # base speech with MeloTTS, then convert its tone color to the target
    if ref_audio is None:
        return None
    target_se, _ = se_extractor.get_se(ref_audio, converter, target_dir="se_cache", vad=True)

    # Base TTS in the requested language (loaded per request for simplicity;
    # cache the models if you switch languages often)
    tts_model = TTS(language=language, device=device)
    speaker_ids = tts_model.hps.data.spk2id
    speaker_key = list(speaker_ids.keys())[0]
    tts_output_path = "tmp_tts.wav"
    tts_model.tts_to_file(text, speaker_ids[speaker_key], tts_output_path)

    # Tone color embedding of the base speaker, shipped with the V2 checkpoints
    source_se = torch.load(
        f"checkpoints_v2/base_speakers/ses/{speaker_key.lower().replace('_', '-')}.pth",
        map_location=device,
    )

    out_path = "output_cloned.wav"
    converter.convert(
        audio_src_path=tts_output_path,
        src_se=source_se,
        tgt_se=target_se,
        output_path=out_path,
        message="@MyShell",
    )
    return out_path

# Gradio interface
with gr.Blocks() as demo:
    gr.Markdown("# 🎙️ OpenVoice V2 WebUI")
    ref_audio = gr.Audio(label="Upload Reference Audio", type="filepath")
    text_input = gr.Textbox(label="Text to Speak", value="你好,这是一段语音克隆测试。")
    lang_choice = gr.Dropdown(["ZH", "EN", "JP", "FR", "ES", "KR"], label="Language", value="ZH")
    btn = gr.Button("Generate")
    output_audio = gr.Audio(label="Cloned Output", type="filepath")

    btn.click(fn=clone_voice, inputs=[ref_audio, text_input, lang_choice], outputs=output_audio)

demo.launch(share=True)
  2. Usage:
  • Make sure to download the OpenVoice V2 checkpoints and place them under checkpoints_v2/.
  • Install dependencies:
    pip install gradio
    pip install git+https://github.com/myshell-ai/MeloTTS.git
    python -m unidic download
  • Run: python openvoice_webui.py
  • Open a browser, enter text, upload audio, and get cloned speech.

Real-World Effectiveness:

I tested two methods:

  1. Generating speech with MeloTTS and cloning it into a different tone color.
  2. Generating a voice file with other software and cloning it against my own recording.

The overall results were unsatisfactory. I tried increasing the length of the target audio file and matching the sample rates of the source and target files (a snippet for this is below), but the results remained poor. MeloTTS also imposes limits on input byte size, which can be frustrating.
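
For reference, matching sample rates takes only a few lines of Python. A minimal sketch, assuming librosa and soundfile are installed (22050 Hz is illustrative; check the sampling rate your converter's config.json expects):

import librosa
import soundfile as sf

TARGET_SR = 22050  # illustrative; match the converter's configured sampling rate

def resample_to(path_in, path_out, sr=TARGET_SR):
    # Load, resample to the target rate, and write a mono WAV
    audio, _ = librosa.load(path_in, sr=sr, mono=True)
    sf.write(path_out, audio, sr)

resample_to("my_voice.wav", "my_voice_22k.wav")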