
Larynx

A fast, local neural text to speech system.

echo 'Welcome to the world of speech synthesis!' | \
  ./larynx --model en-us-blizzard_lessac-medium.onnx --output_file welcome.wav

Voices

Download voices from the releases page. Each voice archive contains a .onnx model file and a matching .onnx.json configuration file.

Supported languages:

  • Catalan (ca)
  • Danish (da)
  • Dutch (nl)
  • French (fr)
  • German (de)
  • Italian (it)
  • Kazakh (kk)
  • Nepali (ne)
  • Norwegian (no)
  • Spanish (es)
  • Ukrainian (uk)
  • U.S. English (en-us)
  • Vietnamese (vi)

Purpose

Larynx is meant to sound good and run reasonably fast on the Raspberry Pi 4.

Voices are trained with VITS and exported to ONNX for inference with onnxruntime.

Installation

Download a release for your platform.

If you want to build from source, see the Makefile and C++ source. Last tested with onnxruntime 1.13.1.

Usage

  1. Download a voice and extract the .onnx and .onnx.json files
  2. Run the larynx binary with text on standard input, --model /path/to/your-voice.onnx, and --output_file output.wav

For example:

echo 'Welcome to the world of speech synthesis!' | \
  ./larynx --model en-us-blizzard_lessac-medium.onnx --output_file welcome.wav

For multi-speaker models, use --speaker <number> to change speakers (default: 0).

See larynx --help for more options.
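The command line above can also be driven from a script. A minimal Python sketch of assembling and running the same invocation (the binary path and model name are placeholders; the flags are the ones documented above):

```python
import subprocess

def build_larynx_cmd(model, output_file, speaker=None, binary="./larynx"):
    """Assemble the larynx argument list; speaker is only added when given."""
    cmd = [binary, "--model", model, "--output_file", output_file]
    if speaker is not None:
        cmd += ["--speaker", str(speaker)]
    return cmd

def speak(text, model, output_file, speaker=None):
    """Pipe text to larynx on standard input, as in the shell example."""
    subprocess.run(build_larynx_cmd(model, output_file, speaker),
                   input=text.encode("utf-8"), check=True)

# speak("Welcome to the world of speech synthesis!",
#       "en-us-blizzard_lessac-medium.onnx", "welcome.wav", speaker=0)
```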

Training

See src/python

Start by creating a virtual environment:

cd larynx2/src/python
python3 -m venv .venv
source .venv/bin/activate
pip3 install --upgrade pip
pip3 install --upgrade wheel setuptools
pip3 install -r requirements.txt

Run the build_monotonic_align.sh script in the src/python directory to build the monotonic alignment extension.

Ensure you have espeak-ng installed (sudo apt-get install espeak-ng).

Next, preprocess your dataset:

python3 -m larynx_train.preprocess \
  --language en-us \
  --input-dir /path/to/ljspeech/ \
  --output-dir /path/to/training_dir/ \
  --dataset-format ljspeech \
  --sample-rate 22050

Datasets must either be in the LJSpeech format or from Mimic Recording Studio (--dataset-format mycroft).
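In the LJSpeech format, the dataset directory holds a metadata.csv of pipe-separated id|text rows next to a wavs/ directory of matching audio files. A sketch that lays out a minimal skeleton in that shape (the utterance ids and text are made up; the empty .wav placeholders stand in for real recordings):

```python
from pathlib import Path

def make_ljspeech_skeleton(root, utterances):
    """Create metadata.csv (id|transcription lines) and a wavs/ directory
    with one placeholder file per utterance id."""
    root = Path(root)
    wavs = root / "wavs"
    wavs.mkdir(parents=True, exist_ok=True)
    with open(root / "metadata.csv", "w", encoding="utf-8") as f:
        for utt_id, text in utterances:
            f.write(f"{utt_id}|{text}\n")
            (wavs / f"{utt_id}.wav").touch()  # real audio goes here
    return root

# make_ljspeech_skeleton("/path/to/ljspeech", [("utt_0001", "Hello world.")])
```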

Finally, you can train:

python3 -m larynx_train \
    --dataset-dir /path/to/training_dir/ \
    --accelerator 'gpu' \
    --devices 1 \
    --batch-size 32 \
    --validation-split 0.05 \
    --num-test-examples 5 \
    --max_epochs 10000 \
    --precision 32

Training uses PyTorch Lightning. Run tensorboard --logdir /path/to/training_dir/lightning_logs to monitor. See python3 -m larynx_train --help for many additional options.

It is highly recommended to train with the following Dockerfile:

FROM nvcr.io/nvidia/pytorch:22.03-py3

RUN pip3 install \
    'pytorch-lightning'

ENV NUMBA_CACHE_DIR=.numba_cache

See the various infer_* and export_* scripts in src/python/larynx_train to test and export your voice from the checkpoint in lightning_logs. The dataset.jsonl file in your training directory can be used with python3 -m larynx_train.infer for quick testing:

head -n5 /path/to/training_dir/dataset.jsonl | \
  python3 -m larynx_train.infer \
    --checkpoint lightning_logs/path/to/checkpoint.ckpt \
    --sample-rate 22050 \
    --output-dir wavs
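Because dataset.jsonl holds one JSON object per line, it is easy to inspect or subsample from Python without the head pipeline; a small helper (no assumptions are made here about the field names inside each record):

```python
import json

def head_jsonl(path, n=5):
    """Return the first n records of a JSON-lines file as Python dicts."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            records.append(json.loads(line))
            if len(records) >= n:
                break
    return records
```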

Running in Python

See src/python_run

Run scripts/setup.sh to create a virtual environment and install the requirements. Then run:

echo 'Welcome to the world of speech synthesis!' | scripts/larynx \
  --model /path/to/voice.onnx \
  --output_file welcome.wav

If you'd like to use a GPU, install the onnxruntime-gpu package:

.venv/bin/pip3 install onnxruntime-gpu

and then run scripts/larynx with the --cuda argument. You will need to have a functioning CUDA environment, such as what's available in NVIDIA's PyTorch containers.
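Under the hood, onnxruntime picks an execution provider, and a --cuda style flag typically translates into a provider preference list like the one below. The provider names are onnxruntime's own; the helper itself is just a sketch, not the project's actual code:

```python
def onnx_providers(use_cuda: bool):
    """Provider preference list for onnxruntime.InferenceSession:
    try CUDAExecutionProvider first when CUDA is requested, and
    always keep CPUExecutionProvider as the fallback."""
    if use_cuda:
        return ["CUDAExecutionProvider", "CPUExecutionProvider"]
    return ["CPUExecutionProvider"]

# import onnxruntime
# session = onnxruntime.InferenceSession("voice.onnx",
#                                        providers=onnx_providers(use_cuda=True))
```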
