Merge branch 'rhasspy:master' into master

This commit is contained in:
Mateo Cedillo
2023-07-27 22:04:26 -05:00
committed by GitHub
24 changed files with 4513 additions and 218 deletions
+1
View File
@@ -1,4 +1,5 @@
*
!VERSION
!Makefile
!src/cpp/
!local/en-us/lessac/low/en-us-lessac-low.onnx
+1 -1
View File
@@ -39,7 +39,7 @@ RUN mkdir -p "lib/Linux-$(uname -m)/piper_phonemize" && \
tar -C "lib/Linux-$(uname -m)/piper_phonemize" -xzvf -
# Build piper binary
COPY Makefile ./
COPY VERSION Makefile ./
COPY src/cpp/ ./src/cpp/
RUN make
+3 -1
View File
@@ -1,6 +1,8 @@
.PHONY: piper clean
LIB_DIR := lib/Linux-$(shell uname -m)
VERSION := $(cat VERSION)
DOCKER_PLATFORM ?= linux/amd64,linux/arm64,linux/arm/v7
piper:
mkdir -p build
@@ -11,4 +13,4 @@ clean:
rm -rf build/ dist/
docker:
docker buildx build . --platform 'linux/amd64,linux/arm64,linux/arm/v7' --output 'type=local,dest=dist'
docker buildx build . --platform '$(DOCKER_PLATFORM)' --output 'type=local,dest=dist'
+76 -34
View File
@@ -5,7 +5,7 @@ Piper is used in a [variety of projects](#people-using-piper).
``` sh
echo 'Welcome to the world of speech synthesis!' | \
./piper --model en-us-blizzard_lessac-medium.onnx --output_file welcome.wav
./piper --model en_US-lessac-medium.onnx --output_file welcome.wav
```
[Listen to voice samples](https://rhasspy.github.io/piper-samples) and check out a [video tutorial by Thorsten Müller](https://youtu.be/rjq5eZoWWSo)
@@ -18,39 +18,47 @@ Voices are trained with [VITS](https://github.com/jaywalnut310/vits/) and export
Our goal is to support Home Assistant and the [Year of Voice](https://www.home-assistant.io/blog/2022/12/20/year-of-voice/).
[Download voices](https://github.com/rhasspy/piper/releases/tag/v0.0.2) for the supported languages:
[Download voices](https://huggingface.co/rhasspy/piper-voices/tree/v1.0.0) for the supported languages:
* Catalan (ca)
* Danish (da)
* German (de)
* British English (en-gb)
* U.S. English (en-us)
* Spanish (es)
* Finnish (fi)
* French (fr)
* Greek (el-gr)
* Icelandic (is)
* Italian (it)
* Kazakh (kk)
* Nepali (ne)
* Dutch (nl)
* Norwegian (no)
* Polish (pl)
* Brazilian Portuguese (pt-br)
* Russian (ru)
* Swedish (sv-se)
* Ukrainian (uk)
* Vietnamese (vi)
* Chinese (zh-cn)
* Catalan (ca_ES)
* Danish (da_DK)
* German (de_DE)
* English (en_GB, en_US)
* Spanish (es_ES, es_MX)
* Finnish (fi_FI)
* French (fr_FR)
* Greek (el_GR)
* Icelandic (is_IS)
* Italian (it_IT)
* Georgian (ka_GE)
* Kazakh (kk_KZ)
* Nepali (ne_NP)
* Dutch (nl_BE, nl_NL)
* Norwegian (no_NO)
* Polish (pl_PL)
* Portuguese (pt_BR)
* Russian (ru_RU)
* Swedish (sv_SE)
* Swahili (sw_CD)
* Ukrainian (uk_UA)
* Vietnamese (vi_VN)
* Chinese (zh_CN)
You will need two files per voice:
1. A `.onnx` model file, such as [`en_US-lessac-medium.onnx`](https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/lessac/medium/en_US-lessac-medium.onnx)
2. A `.onnx.json` config file, such as [`en_US-lessac-medium.onnx.json`](https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/lessac/medium/en_US-lessac-medium.onnx.json)
The `MODEL_CARD` file for each voice contains important licensing information. Piper is intended for text to speech research, and does not impose any additional restrictions on voice models. Some voices may have restrictive licenses, however, so please review them carefully!
## Installation
Download a release:
You can [run Piper with Python](#running-in-python) or download a binary release:
* [amd64](https://github.com/rhasspy/piper/releases/download/v1.0.0/piper_amd64.tar.gz) (64-bit desktop Linux)
* [arm64](https://github.com/rhasspy/piper/releases/download/v1.0.0/piper_arm64.tar.gz) (64-bit Raspberry Pi 4)
* [armv7](https://github.com/rhasspy/piper/releases/download/v1.0.0/piper_armv7.tar.gz) (32-bit Raspberry Pi 3/4)
* [amd64](https://github.com/rhasspy/piper/releases/download/v1.1.0/piper_amd64.tar.gz) (64-bit desktop Linux)
* [arm64](https://github.com/rhasspy/piper/releases/download/v1.1.0/piper_arm64.tar.gz) (64-bit Raspberry Pi 4)
* [armv7](https://github.com/rhasspy/piper/releases/download/v1.1.0/piper_armv7.tar.gz) (32-bit Raspberry Pi 3/4)
If you want to build from source, see the [Makefile](Makefile) and [C++ source](src/cpp).
You must download and extract [piper-phonemize](https://github.com/rhasspy/piper-phonemize) to `lib/Linux-$(uname -m)/piper_phonemize` before building.
@@ -66,7 +74,7 @@ For example:
``` sh
echo 'Welcome to the world of speech synthesis!' | \
./piper --model en-us-lessac-medium.onnx --output_file welcome.wav
./piper --model en_US-lessac-medium.onnx --output_file welcome.wav
```
For multi-speaker models, use `--speaker <number>` to change speakers (default: 0).
@@ -74,6 +82,32 @@ For multi-speaker models, use `--speaker <number>` to change speakers (default:
See `piper --help` for more options.
### JSON Input
The `piper` executable can accept JSON input when using the `--json-input` flag. Each line of input must be a JSON object with `text` field. For example:
``` json
{ "text": "First sentence to speak." }
{ "text": "Second sentence to speak." }
```
Optional fields include:
* `speaker` - string
* Name of the speaker to use from `speaker_id_map` in config (multi-speaker voices only)
* `speaker_id` - number
* Id of speaker to use from 0 to number of speakers - 1 (multi-speaker voices only, overrides "speaker")
* `output_file` - string
* Path to output WAV file
The following example writes two sentences with different speakers to different files:
``` json
{ "text": "First speaker.", "speaker_id": 0, "output_file": "/tmp/speaker_0.wav" }
{ "text": "Second speaker.", "speaker_id": 1, "output_file": "/tmp/speaker_1.wav" }
```
## People using Piper
Piper has been used in the following projects/papers:
@@ -84,7 +118,7 @@ Piper has been used in the following projects/papers:
* [Image Captioning for the Visually Impaired and Blind: A Recipe for Low-Resource Languages](https://www.techrxiv.org/articles/preprint/Image_Captioning_for_the_Visually_Impaired_and_Blind_A_Recipe_for_Low-Resource_Languages/22133894)
* [Open Voice Operating System](https://github.com/OpenVoiceOS/ovos-tts-plugin-piper)
* [JetsonGPT](https://github.com/shahizat/jetsonGPT)
* [LocalAI](https://github.com/go-skynet/LocalAI)
## Training
@@ -97,14 +131,22 @@ Pretrained checkpoints are available on [Hugging Face](https://huggingface.co/da
See [src/python_run](src/python_run)
Run `scripts/setup.sh` to create a virtual environment and install the requirements. Then run:
Install with `pip`:
``` sh
echo 'Welcome to the world of speech synthesis!' | scripts/piper \
--model /path/to/voice.onnx \
pip install piper-tts
```
and then run:
``` sh
echo 'Welcome to the world of speech synthesis!' | piper \
--model en_US-lessac-medium \
--output_file welcome.wav
```
This will automatically download [voice files](https://huggingface.co/rhasspy/piper-voices/tree/v1.0.0) the first time they're used. Use `--data-dir` and `--download-dir` to adjust where voices are found/downloaded.
If you'd like to use a GPU, install the `onnxruntime-gpu` package:
@@ -112,5 +154,5 @@ If you'd like to use a GPU, install the `onnxruntime-gpu` package:
.venv/bin/pip3 install onnxruntime-gpu
```
and then run `scripts/piper` with the `--cuda` argument. You will need to have a functioning CUDA environment, such as what's available in [NVIDIA's PyTorch containers](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch).
and then run `piper` with the `--cuda` argument. You will need to have a functioning CUDA environment, such as what's available in [NVIDIA's PyTorch containers](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch).
+1 -1
View File
@@ -1 +1 @@
1.0.0
1.1.0
+5
View File
@@ -0,0 +1,5 @@
{"phoneme_ids":[1,0,64,0,45,0,23,0,122,0,33,0,96,0,14,0,122,0,120,0,79,0,8,0,64,0,37,0,26,0,120,0,61,0,96,0,3,0,79,0,96,0,79,0,26,0,75,0,14,0,92,0,79,0,26,0,120,0,79,0,26,0,3,0,22,0,14,0,122,0,25,0,120,0,100,0,30,0,3,0,17,0,14,0,25,0,75,0,14,0,75,0,14,0,92,0,79,0,26,0,17,0,120,0,14,0,3,0,34,0,18,0,22,0,121,0,14,0,3,0,31,0,120,0,74,0,31,0,3,0,15,0,33,0,75,0,100,0,32,0,75,0,14,0,92,0,79,0,26,0,17,0,120,0,14,0,3,0,22,0,14,0,26,0,31,0,79,0,25,0,14,0,31,0,120,0,79,0,3,0,34,0,61,0,3,0,23,0,79,0,92,0,79,0,75,0,25,0,14,0,31,0,120,0,79,0,22,0,75,0,14,0,3,0,25,0,61,0,22,0,17,0,14,0,26,0,120,0,14,0,3,0,64,0,18,0,24,0,120,0,39,0,26,0,3,0,34,0,61,0,3,0,79,0,96,0,120,0,79,0,23,0,3,0,32,0,14,0,22,0,19,0,120,0,79,0,3,0,30,0,39,0,26,0,23,0,24,0,18,0,92,0,21,0,26,0,120,0,74,0,26,0,3,0,15,0,74,0,30,0,3,0,22,0,120,0,14,0,22,0,3,0,96,0,61,0,23,0,24,0,74,0,26,0,17,0,120,0,61,0,3,0,64,0,45,0,92,0,42,0,26,0,17,0,120,0,42,0,122,0,3,0,25,0,18,0,32,0,18,0,27,0,92,0,27,0,75,0,27,0,108,0,120,0,74,0,23,0,3,0,15,0,74,0,30,0,3,0,27,0,75,0,120,0,14,0,22,0,17,0,79,0,30,0,10,0,2],"phonemes":["ɟ","œ","k","ː","u","ʃ","a","ː","ˈ","ɯ",",","ɟ","y","n","ˈ","ɛ","ʃ"," ","ɯ","ʃ","ɯ","n","ɫ","a","ɾ","ɯ","n","ˈ","ɯ","n"," ","j","a","ː","m","ˈ","ʊ","r"," ","d","a","m","ɫ","a","ɫ","a","ɾ","ɯ","n","d","ˈ","a"," ","v","e","j","ˌ","a"," ","s","ˈ","ɪ","s"," ","b","u","ɫ","ʊ","t","ɫ","a","ɾ","ɯ","n","d","ˈ","a"," ","j","a","n","s","ɯ","m","a","s","ˈ","ɯ"," ","v","ɛ"," ","k","ɯ","ɾ","ɯ","ɫ","m","a","s","ˈ","ɯ","j","ɫ","a"," ","m","ɛ","j","d","a","n","ˈ","a"," ","ɟ","e","l","ˈ","æ","n"," ","v","ɛ"," ","ɯ","ʃ","ˈ","ɯ","k"," ","t","a","j","f","ˈ","ɯ"," ","r","æ","n","k","l","e","ɾ","i","n","ˈ","ɪ","n"," ","b","ɪ","r"," ","j","ˈ","a","j"," ","ʃ","ɛ","k","l","ɪ","n","d","ˈ","ɛ"," ","ɟ","œ","ɾ","ø","n","d","ˈ","ø","ː"," ","m","e","t","e","o","ɾ","o","ɫ","o","ʒ","ˈ","ɪ","k"," ","b","ɪ","r"," ","o","ɫ","ˈ","a","j","d","ɯ","r","."],"processed_text":"Gökkuşağı, güneş ışınlarının yağmur damlalarında veya sis bulutlarında yansıması ve kırılmasıyla meydana gelen ve ışık tayfı renklerinin bir yay şeklinde göründüğü meteorolojik bir olaydır.","text":"Gökkuşağı, güneş ışınlarının yağmur damlalarında veya sis bulutlarında yansıması ve kırılmasıyla meydana gelen ve ışık tayfı renklerinin bir yay şeklinde göründüğü meteorolojik bir olaydır."}
{"phoneme_ids":[1,0,64,0,45,0,23,0,122,0,33,0,96,0,14,0,122,0,79,0,26,0,17,0,14,0,23,0,120,0,74,0,3,0,30,0,39,0,26,0,23,0,24,0,120,0,61,0,30,0,3,0,15,0,74,0,30,0,3,0,31,0,28,0,120,0,61,0,23,0,32,0,30,0,100,0,25,0,3,0,27,0,75,0,100,0,96,0,32,0,33,0,92,0,120,0,100,0,30,0,10,0,2],"phonemes":["ɟ","œ","k","ː","u","ʃ","a","ː","ɯ","n","d","a","k","ˈ","ɪ"," ","r","æ","n","k","l","ˈ","ɛ","r"," ","b","ɪ","r"," ","s","p","ˈ","ɛ","k","t","r","ʊ","m"," ","o","ɫ","ʊ","ʃ","t","u","ɾ","ˈ","ʊ","r","."],"processed_text":"Gökkuşağındaki renkler bir spektrum oluşturur.","text":"Gökkuşağındaki renkler bir spektrum oluşturur."}
{"phoneme_ids":[1,0,32,0,21,0,28,0,120,0,74,0,23,0,3,0,15,0,74,0,30,0,3,0,64,0,45,0,23,0,122,0,33,0,96,0,14,0,122,0,120,0,79,0,3,0,23,0,120,0,79,0,30,0,25,0,79,0,38,0,79,0,8,0,32,0,33,0,92,0,100,0,26,0,17,0,108,0,120,0,100,0,8,0,31,0,14,0,92,0,120,0,79,0,8,0,22,0,18,0,96,0,120,0,74,0,24,0,8,0,25,0,14,0,122,0,34,0,120,0,74,0,8,0,75,0,14,0,17,0,108,0,21,0,34,0,120,0,61,0,30,0,32,0,3,0,34,0,61,0,3,0,25,0,120,0,54,0,30,0,3,0,30,0,39,0,26,0,23,0,24,0,18,0,92,0,74,0,26,0,17,0,120,0,39,0,26,0,3,0,25,0,61,0,22,0,17,0,14,0,26,0,120,0,14,0,3,0,64,0,18,0,24,0,120,0,39,0,26,0,3,0,15,0,74,0,30,0,3,0,30,0,120,0,39,0,26,0,23,0,3,0,31,0,79,0,92,0,14,0,31,0,79,0,26,0,120,0,14,0,3,0,31,0,14,0,20,0,120,0,74,0,28,0,3,0,15,0,74,0,30,0,3,0,34,0,18,0,22,0,121,0,14,0,3,0,17,0,14,0,20,0,120,0,14,0,3,0,19,0,120,0,14,0,38,0,75,0,14,0,3,0,14,0,22,0,26,0,120,0,79,0,3,0,25,0,61,0,30,0,23,0,61,0,38,0,24,0,120,0,74,0,3,0,14,0,30,0,23,0,75,0,14,0,30,0,17,0,120,0,14,0,26,0,3,0,21,0,15,0,14,0,92,0,18,0,32,0,122,0,120,0,74,0,30,0,10,0,2],"phonemes":["t","i","p","ˈ","ɪ","k"," ","b","ɪ","r"," ","ɟ","œ","k","ː","u","ʃ","a","ː","ˈ","ɯ"," ","k","ˈ","ɯ","r","m","ɯ","z","ɯ",",","t","u","ɾ","ʊ","n","d","ʒ","ˈ","ʊ",",","s","a","ɾ","ˈ","ɯ",",","j","e","ʃ","ˈ","ɪ","l",",","m","a","ː","v","ˈ","ɪ",",","ɫ","a","d","ʒ","i","v","ˈ","ɛ","r","t"," ","v","ɛ"," ","m","ˈ","ɔ","r"," ","r","æ","n","k","l","e","ɾ","ɪ","n","d","ˈ","æ","n"," ","m","ɛ","j","d","a","n","ˈ","a"," ","ɟ","e","l","ˈ","æ","n"," ","b","ɪ","r"," ","r","ˈ","æ","n","k"," ","s","ɯ","ɾ","a","s","ɯ","n","ˈ","a"," ","s","a","h","ˈ","ɪ","p"," ","b","ɪ","r"," ","v","e","j","ˌ","a"," ","d","a","h","ˈ","a"," ","f","ˈ","a","z","ɫ","a"," ","a","j","n","ˈ","ɯ"," ","m","ɛ","r","k","ɛ","z","l","ˈ","ɪ"," ","a","r","k","ɫ","a","r","d","ˈ","a","n"," ","i","b","a","ɾ","e","t","ː","ˈ","ɪ","r","."],"processed_text":"Tipik bir gökkuşağı kırmızı, turuncu, sarı, yeşil, mavi, lacivert ve mor renklerinden meydana gelen bir renk sırasına sahip bir veya daha fazla aynı merkezli arklardan ibarettir.","text":"Tipik bir gökkuşağı kırmızı, turuncu, sarı, yeşil, mavi, lacivert ve mor renklerinden meydana gelen bir renk sırasına sahip bir veya daha fazla aynı merkezli arklardan ibarettir."}
{"phoneme_ids":[1,0,28,0,21,0,108,0,120,0,14,0,25,0,14,0,75,0,79,0,3,0,20,0,14,0,31,0,32,0,120,0,14,0,3,0,22,0,120,0,14,0,122,0,79,0,38,0,3,0,96,0,27,0,19,0,45,0,92,0,120,0,61,0,3,0,32,0,96,0,14,0,15,0,33,0,17,0,108,0,120,0,14,0,23,0,3,0,64,0,37,0,34,0,39,0,26,0,17,0,120,0,74,0,10,0,2],"phonemes":["p","i","ʒ","ˈ","a","m","a","ɫ","ɯ"," ","h","a","s","t","ˈ","a"," ","j","ˈ","a","ː","ɯ","z"," ","ʃ","o","f","œ","ɾ","ˈ","ɛ"," ","t","ʃ","a","b","u","d","ʒ","ˈ","a","k"," ","ɟ","y","v","æ","n","d","ˈ","ɪ","."],"processed_text":"Pijamalı hasta yağız şoföre çabucak güvendi.","text":"Pijamalı hasta yağız şoföre çabucak güvendi."}
{"phoneme_ids":[1,0,120,0,45,0,23,0,42,0,38,0,3,0,14,0,108,0,120,0,14,0,26,0,3,0,20,0,120,0,14,0,28,0,31,0,61,0,3,0,17,0,42,0,96,0,32,0,120,0,42,0,3,0,22,0,120,0,14,0,34,0,30,0,100,0,25,0,8,0,27,0,17,0,108,0,14,0,122,0,120,0,79,0,3,0,19,0,120,0,39,0,24,0,32,0,96,0,3,0,64,0,21,0,15,0,120,0,74,0,10,0,2],"phonemes":["ˈ","œ","k","ø","z"," ","a","ʒ","ˈ","a","n"," ","h","ˈ","a","p","s","ɛ"," ","d","ø","ʃ","t","ˈ","ø"," ","j","ˈ","a","v","r","ʊ","m",",","o","d","ʒ","a","ː","ˈ","ɯ"," ","f","ˈ","æ","l","t","ʃ"," ","ɟ","i","b","ˈ","ɪ","."],"processed_text":"Öküz ajan hapse düştü yavrum, ocağı felç gibi.","text":"Öküz ajan hapse düştü yavrum, ocağı felç gibi."}
+5
View File
@@ -0,0 +1,5 @@
Gökkuşağı, güneş ışınlarının yağmur damlalarında veya sis bulutlarında yansıması ve kırılmasıyla meydana gelen ve ışık tayfı renklerinin bir yay şeklinde göründüğü meteorolojik bir olaydır.
Gökkuşağındaki renkler bir spektrum oluşturur.
Tipik bir gökkuşağı kırmızı, turuncu, sarı, yeşil, mavi, lacivert ve mor renklerinden meydana gelen bir renk sırasına sahip bir veya daha fazla aynı merkezli arklardan ibarettir.
Pijamalı hasta yağız şoföre çabucak güvendi.
Öküz ajan hapse düştü yavrum, ocağı felç gibi.
+4
View File
@@ -5,6 +5,8 @@ project(piper C CXX)
find_package(PkgConfig)
pkg_check_modules(SPDLOG REQUIRED spdlog)
file(READ "${CMAKE_CURRENT_LIST_DIR}/../../VERSION" piper_version)
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
@@ -35,3 +37,5 @@ target_include_directories(piper PUBLIC
target_compile_options(piper PUBLIC
${SPDLOG_CFLAGS_OTHER})
target_compile_definitions(piper PUBLIC _PIPER_VERSION=${piper_version})
+31 -11
View File
@@ -70,9 +70,10 @@ struct RunConfig {
// stdin input is lines of JSON instead of text with format:
// {
// "text": "...", (required)
// "text": str, (required)
// "speaker_id": int, (optional)
// "output_file": "...", (optional)
// "speaker": str, (optional)
// "output_file": str, (optional)
// }
bool jsonInput = false;
};
@@ -194,7 +195,7 @@ int main(int argc, char *argv[]) {
while (getline(cin, line)) {
auto outputType = runConfig.outputType;
auto speakerId = voice.synthesisConfig.speakerId;
std::optional<filesystem::path> outputPath;
std::optional<filesystem::path> maybeOutputPath = runConfig.outputPath;
if (runConfig.jsonInput) {
// Each line is a JSON object
@@ -206,7 +207,7 @@ int main(int argc, char *argv[]) {
if (lineRoot.contains("output_file")) {
// Override output WAV file path
outputType = OUTPUT_FILE;
outputPath =
maybeOutputPath =
filesystem::path(lineRoot["output_file"].get<std::string>());
}
@@ -214,6 +215,16 @@ int main(int argc, char *argv[]) {
// Override speaker id
voice.synthesisConfig.speakerId =
lineRoot["speaker_id"].get<piper::SpeakerId>();
} else if (lineRoot.contains("speaker")) {
// Resolve to id using speaker id map
auto speakerName = lineRoot["speaker"].get<std::string>();
if ((voice.modelConfig.speakerIdMap) &&
(voice.modelConfig.speakerIdMap->count(speakerName) > 0)) {
voice.synthesisConfig.speakerId =
(*voice.modelConfig.speakerIdMap)[speakerName];
} else {
spdlog::warn("No speaker named: {}", speakerName);
}
}
}
@@ -227,14 +238,20 @@ int main(int argc, char *argv[]) {
// Generate path using timestamp
stringstream outputName;
outputName << timestamp << ".wav";
outputPath = runConfig.outputPath.value();
outputPath->append(outputName.str());
filesystem::path outputPath = runConfig.outputPath.value();
outputPath.append(outputName.str());
// Output audio to automatically-named WAV file in a directory
ofstream audioFile(outputPath->string(), ios::binary);
ofstream audioFile(outputPath.string(), ios::binary);
piper::textToWavFile(piperConfig, voice, line, audioFile, result);
cout << outputPath->string() << endl;
cout << outputPath.string() << endl;
} else if (outputType == OUTPUT_FILE) {
if (!maybeOutputPath || maybeOutputPath->empty()) {
throw runtime_error("No output path provided");
}
filesystem::path outputPath = maybeOutputPath.value();
if (!runConfig.jsonInput) {
// Read all of standard input before synthesizing.
// Otherwise, we would overwrite the output file for each line.
@@ -248,9 +265,9 @@ int main(int argc, char *argv[]) {
}
// Output audio to WAV file
ofstream audioFile(outputPath->string(), ios::binary);
ofstream audioFile(outputPath.string(), ios::binary);
piper::textToWavFile(piperConfig, voice, line, audioFile, result);
cout << outputPath->string() << endl;
cout << outputPath.string() << endl;
} else if (outputType == OUTPUT_STDOUT) {
// Output WAV to stdout
piper::textToWavFile(piperConfig, voice, line, cout, result);
@@ -368,7 +385,7 @@ void printUsage(char *argv[]) {
<< endl;
cerr << " --noise_w NUM phoneme width noise (default: 0.8)"
<< endl;
cerr << " --silence_seconds NUM seconds of silence after each "
cerr << " --sentence_silence NUM seconds of silence after each "
"sentence (default: 0.2)"
<< endl;
cerr << " --espeak_data DIR path to espeak-ng data directory"
@@ -444,6 +461,9 @@ void parseArgs(int argc, char *argv[], RunConfig &runConfig) {
runConfig.tashkeelModelPath = filesystem::path(argv[++i]);
} else if (arg == "--json_input" || arg == "--json-input") {
runConfig.jsonInput = true;
} else if (arg == "--version") {
std::cout << piper::getVersion() << std::endl;
exit(0);
} else if (arg == "--debug") {
// Set DEBUG logging
spdlog::set_level(spdlog::level::debug);
+26
View File
@@ -16,11 +16,24 @@
namespace piper {
#ifdef _PIPER_VERSION
// https://stackoverflow.com/questions/47346133/how-to-use-a-define-inside-a-format-string
#define _STR(x) #x
#define STR(x) _STR(x)
const std::string VERSION = STR(_PIPER_VERSION);
#else
const std::string VERSION = "";
#endif
// Maximum value for 16-bit signed WAV sample
const float MAX_WAV_VALUE = 32767.0f;
const std::string instanceName{"piper"};
std::string getVersion() {
return VERSION;
}
// True if the string is a single UTF-8 codepoint
bool isSingleCodepoint(std::string s) {
return utf8::distance(s.begin(), s.end()) == 1;
@@ -163,6 +176,19 @@ void parseModelConfig(json &configRoot, ModelConfig &modelConfig) {
modelConfig.numSpeakers = configRoot["num_speakers"].get<SpeakerId>();
if (configRoot.contains("speaker_id_map")) {
if (!modelConfig.speakerIdMap) {
modelConfig.speakerIdMap.emplace();
}
auto speakerIdMapValue = configRoot["speaker_id_map"];
for (auto &speakerItem : speakerIdMapValue.items()) {
std::string speakerName = speakerItem.key();
(*modelConfig.speakerIdMap)[speakerName] =
speakerItem.value().get<SpeakerId>();
}
}
} /* parseModelConfig */
void initialize(PiperConfig &config) {
+6
View File
@@ -61,6 +61,9 @@ struct SynthesisConfig {
struct ModelConfig {
int numSpeakers;
// speaker name -> id
std::optional<std::map<std::string, SpeakerId>> speakerIdMap;
};
struct ModelSession {
@@ -86,6 +89,9 @@ struct Voice {
ModelSession session;
};
// Get version of Piper
std::string getVersion();
// Must be called before using textTo* functions
void initialize(PiperConfig &config);
+3
View File
@@ -0,0 +1,3 @@
build/
dist/
*.egg-info/
+2
View File
@@ -0,0 +1,2 @@
include requirements.txt
include piper/voices.json
+4 -146
View File
@@ -1,147 +1,5 @@
import io
import json
import logging
import wave
from dataclasses import dataclass
from pathlib import Path
from typing import List, Mapping, Optional, Sequence, Union
from .voice import PiperVoice
import numpy as np
import onnxruntime
from espeak_phonemizer import Phonemizer
_LOGGER = logging.getLogger(__name__)
_BOS = "^"
_EOS = "$"
_PAD = "_"
@dataclass
class PiperConfig:
num_symbols: int
num_speakers: int
sample_rate: int
espeak_voice: str
length_scale: float
noise_scale: float
noise_w: float
phoneme_id_map: Mapping[str, Sequence[int]]
class Piper:
def __init__(
self,
model_path: Union[str, Path],
config_path: Optional[Union[str, Path]] = None,
use_cuda: bool = False,
):
if config_path is None:
config_path = f"{model_path}.json"
self.config = load_config(config_path)
self.phonemizer = Phonemizer(self.config.espeak_voice)
self.model = onnxruntime.InferenceSession(
str(model_path),
sess_options=onnxruntime.SessionOptions(),
providers=["CPUExecutionProvider"]
if not use_cuda
else ["CUDAExecutionProvider"],
)
def synthesize(
self,
text: str,
speaker_id: Optional[int] = None,
length_scale: Optional[float] = None,
noise_scale: Optional[float] = None,
noise_w: Optional[float] = None,
) -> bytes:
"""Synthesize WAV audio from text."""
if length_scale is None:
length_scale = self.config.length_scale
if noise_scale is None:
noise_scale = self.config.noise_scale
if noise_w is None:
noise_w = self.config.noise_w
phonemes_str = self.phonemizer.phonemize(text)
phonemes = [_BOS] + list(phonemes_str)
phoneme_ids: List[int] = []
for phoneme in phonemes:
if phoneme in self.config.phoneme_id_map:
phoneme_ids.extend(self.config.phoneme_id_map[phoneme])
phoneme_ids.extend(self.config.phoneme_id_map[_PAD])
else:
_LOGGER.warning("No id for phoneme: %s", phoneme)
phoneme_ids.extend(self.config.phoneme_id_map[_EOS])
phoneme_ids_array = np.expand_dims(np.array(phoneme_ids, dtype=np.int64), 0)
phoneme_ids_lengths = np.array([phoneme_ids_array.shape[1]], dtype=np.int64)
scales = np.array(
[noise_scale, length_scale, noise_w],
dtype=np.float32,
)
if (self.config.num_speakers > 1) and (speaker_id is None):
# Default speaker
speaker_id = 0
sid = None
if speaker_id is not None:
sid = np.array([speaker_id], dtype=np.int64)
# Synthesize through Onnx
audio = self.model.run(
None,
{
"input": phoneme_ids_array,
"input_lengths": phoneme_ids_lengths,
"scales": scales,
"sid": sid,
},
)[0].squeeze((0, 1))
audio = audio_float_to_int16(audio.squeeze())
# Convert to WAV
with io.BytesIO() as wav_io:
wav_file: wave.Wave_write = wave.open(wav_io, "wb")
with wav_file:
wav_file.setframerate(self.config.sample_rate)
wav_file.setsampwidth(2)
wav_file.setnchannels(1)
wav_file.writeframes(audio.tobytes())
return wav_io.getvalue()
def load_config(config_path: Union[str, Path]) -> PiperConfig:
with open(config_path, "r", encoding="utf-8") as config_file:
config_dict = json.load(config_file)
inference = config_dict.get("inference", {})
return PiperConfig(
num_symbols=config_dict["num_symbols"],
num_speakers=config_dict["num_speakers"],
sample_rate=config_dict["audio"]["sample_rate"],
espeak_voice=config_dict["espeak"]["voice"],
noise_scale=inference.get("noise_scale", 0.667),
length_scale=inference.get("length_scale", 1.0),
noise_w=inference.get("noise_w", 0.8),
phoneme_id_map=config_dict["phoneme_id_map"],
)
def audio_float_to_int16(
audio: np.ndarray, max_wav_value: float = 32767.0
) -> np.ndarray:
"""Normalize audio and convert to int16 range"""
audio_norm = audio * (max_wav_value / max(0.01, np.max(np.abs(audio))))
audio_norm = np.clip(audio_norm, -max_wav_value, max_wav_value)
audio_norm = audio_norm.astype("int16")
return audio_norm
__all__ = [
"PiperVoice",
]
+101 -22
View File
@@ -2,10 +2,12 @@ import argparse
import logging
import sys
import time
from functools import partial
import wave
from pathlib import Path
from typing import Any, Dict
from . import Piper
from . import PiperVoice
from .download import ensure_voice_exists, find_voice, get_voices
_FILE = Path(__file__)
_DIR = _FILE.parent
@@ -17,33 +19,108 @@ def main() -> None:
parser.add_argument("-m", "--model", required=True, help="Path to Onnx model file")
parser.add_argument("-c", "--config", help="Path to model config file")
parser.add_argument(
"-f", "--output_file", help="Path to output WAV file (default: stdout)"
"-f",
"--output-file",
"--output_file",
help="Path to output WAV file (default: stdout)",
)
parser.add_argument(
"-d", "--output_dir", help="Path to output directory (default: cwd)"
"-d",
"--output-dir",
"--output_dir",
help="Path to output directory (default: cwd)",
)
parser.add_argument(
"--output-raw",
"--output_raw",
action="store_true",
help="Stream raw audio to stdout",
)
#
parser.add_argument("-s", "--speaker", type=int, help="Id of speaker (default: 0)")
parser.add_argument("--noise-scale", type=float, help="Generator noise")
parser.add_argument("--length-scale", type=float, help="Phoneme length")
parser.add_argument("--noise-w", type=float, help="Phoneme width noise")
parser.add_argument(
"--length-scale", "--length_scale", type=float, help="Phoneme length"
)
parser.add_argument(
"--noise-scale", "--noise_scale", type=float, help="Generator noise"
)
parser.add_argument(
"--noise-w", "--noise_w", type=float, help="Phoneme width noise"
)
#
parser.add_argument("--cuda", action="store_true", help="Use GPU")
#
parser.add_argument(
"--sentence-silence",
"--sentence_silence",
type=float,
default=0.0,
help="Seconds of silence after each sentence",
)
#
parser.add_argument(
"--data-dir",
"--data_dir",
action="append",
default=[str(Path.cwd())],
help="Data directory to check for downloaded models (default: current directory)",
)
parser.add_argument(
"--download-dir",
"--download_dir",
help="Directory to download voices into (default: first data dir)",
)
#
parser.add_argument(
"--debug", action="store_true", help="Print DEBUG messages to console"
)
args = parser.parse_args()
logging.basicConfig(level=logging.DEBUG if args.debug else logging.INFO)
_LOGGER.debug(args)
voice = Piper(args.model, config_path=args.config, use_cuda=args.cuda)
synthesize = partial(
voice.synthesize,
speaker_id=args.speaker,
length_scale=args.length_scale,
noise_scale=args.noise_scale,
noise_w=args.noise_w,
)
if not args.download_dir:
# Download to first data directory by default
args.download_dir = args.data_dir[0]
if args.output_dir:
# Download voice if file doesn't exist
model_path = Path(args.model)
if not model_path.exists():
# Load voice info
voices_info = get_voices()
# Resolve aliases for backwards compatibility with old voice names
aliases_info: Dict[str, Any] = {}
for voice_info in voices_info.values():
for voice_alias in voice_info.get("aliases", []):
aliases_info[voice_alias] = {"_is_alias": True, **voice_info}
voices_info.update(aliases_info)
ensure_voice_exists(args.model, args.data_dir, args.download_dir, voices_info)
args.model, args.config = find_voice(args.model, args.data_dir)
# Load voice
voice = PiperVoice.load(args.model, config_path=args.config, use_cuda=args.cuda)
synthesize_args = {
"speaker_id": args.speaker,
"length_scale": args.length_scale,
"noise_scale": args.noise_scale,
"noise_w": args.noise_w,
"sentence_silence": args.sentence_silence,
}
if args.output_raw:
# Read line-by-line
for line in sys.stdin:
line = line.strip()
if not line:
continue
# Write raw audio to stdout as its produced
audio_stream = voice.synthesize_stream_raw(line, **synthesize_args)
for audio_bytes in audio_stream:
sys.stdout.buffer.write(audio_bytes)
sys.stdout.buffer.flush()
elif args.output_dir:
output_dir = Path(args.output_dir)
output_dir.mkdir(parents=True, exist_ok=True)
@@ -53,21 +130,23 @@ def main() -> None:
if not line:
continue
wav_bytes = synthesize(line)
wav_path = output_dir / f"{time.monotonic_ns()}.wav"
wav_path.write_bytes(wav_bytes)
with wave.open(str(wav_path), "wb") as wav_file:
voice.synthesize(line, wav_file, **synthesize_args)
_LOGGER.info("Wrote %s", wav_path)
else:
# Read entire input
text = sys.stdin.read()
wav_bytes = synthesize(text)
if (not args.output_file) or (args.output_file == "-"):
# Write to stdout
sys.stdout.buffer.write(wav_bytes)
with wave.open(sys.stdout.buffer, "wb") as wav_file:
voice.synthesize(text, wav_file, **synthesize_args)
else:
with open(args.output_file, "wb") as output_file:
output_file.write(wav_bytes)
# Write to file
with wave.open(args.output_file, "wb") as wav_file:
voice.synthesize(text, wav_file, **synthesize_args)
if __name__ == "__main__":
+53
View File
@@ -0,0 +1,53 @@
"""Piper configuration"""
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, Mapping, Sequence
class PhonemeType(str, Enum):
ESPEAK = "espeak"
TEXT = "text"
@dataclass
class PiperConfig:
"""Piper configuration"""
num_symbols: int
"""Number of phonemes"""
num_speakers: int
"""Number of speakers"""
sample_rate: int
"""Sample rate of output audio"""
espeak_voice: str
"""Name of espeak-ng voice or alphabet"""
length_scale: float
noise_scale: float
noise_w: float
phoneme_id_map: Mapping[str, Sequence[int]]
"""Phoneme -> [id,]"""
phoneme_type: PhonemeType
"""espeak or text"""
@staticmethod
def from_dict(config: Dict[str, Any]) -> "PiperConfig":
inference = config.get("inference", {})
return PiperConfig(
num_symbols=config["num_symbols"],
num_speakers=config["num_speakers"],
sample_rate=config["audio"]["sample_rate"],
noise_scale=inference.get("noise_scale", 0.667),
length_scale=inference.get("length_scale", 1.0),
noise_w=inference.get("noise_w", 0.8),
#
espeak_voice=config["espeak"]["voice"],
phoneme_id_map=config["phoneme_id_map"],
phoneme_type=PhonemeType(config.get("phoneme_type", PhonemeType.ESPEAK)),
)
+5
View File
@@ -0,0 +1,5 @@
"""Constants"""
PAD = "_" # padding (0)
BOS = "^" # beginning of sentence
EOS = "$" # end of sentence
+120
View File
@@ -0,0 +1,120 @@
"""Utility for downloading Piper voices."""
import json
import logging
import shutil
from pathlib import Path
from typing import Any, Dict, Iterable, Set, Tuple, Union
from urllib.request import urlopen
from .file_hash import get_file_hash
URL_FORMAT = "https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/{file}"
_DIR = Path(__file__).parent
_LOGGER = logging.getLogger(__name__)
_SKIP_FILES = {"MODEL_CARD"}
class VoiceNotFoundError(Exception):
pass
def get_voices() -> Dict[str, Any]:
"""Loads available voices from embedded JSON file."""
with open(_DIR / "voices.json", "r", encoding="utf-8") as voices_file:
return json.load(voices_file)
def ensure_voice_exists(
name: str,
data_dirs: Iterable[Union[str, Path]],
download_dir: Union[str, Path],
voices_info: Dict[str, Any],
):
assert data_dirs, "No data dirs"
if name not in voices_info:
raise VoiceNotFoundError(name)
voice_info = voices_info[name]
voice_files = voice_info["files"]
files_to_download: Set[str] = set()
for data_dir in data_dirs:
data_dir = Path(data_dir)
# Check sizes/hashes
for file_path, file_info in voice_files.items():
if file_path in files_to_download:
# Already planning to download
continue
file_name = Path(file_path).name
if file_name in _SKIP_FILES:
continue
data_file_path = data_dir / file_name
_LOGGER.debug("Checking %s", data_file_path)
if not data_file_path.exists():
_LOGGER.debug("Missing %s", data_file_path)
files_to_download.add(file_path)
continue
expected_size = file_info["size_bytes"]
actual_size = data_file_path.stat().st_size
if expected_size != actual_size:
_LOGGER.warning(
"Wrong size (expected=%s, actual=%s) for %s",
expected_size,
actual_size,
data_file_path,
)
files_to_download.add(file_path)
continue
expected_hash = file_info["md5_digest"]
actual_hash = get_file_hash(data_file_path)
if expected_hash != actual_hash:
_LOGGER.warning(
"Wrong hash (expected=%s, actual=%s) for %s",
expected_hash,
actual_hash,
data_file_path,
)
files_to_download.add(file_path)
continue
if (not voice_files) and (not files_to_download):
raise ValueError(f"Unable to find or download voice: {name}")
# Download missing files
download_dir = Path(download_dir)
for file_path in files_to_download:
file_name = Path(file_path).name
if file_name in _SKIP_FILES:
continue
file_url = URL_FORMAT.format(file=file_path)
download_file_path = download_dir / file_name
download_file_path.parent.mkdir(parents=True, exist_ok=True)
_LOGGER.debug("Downloading %s to %s", file_url, download_file_path)
with urlopen(file_url) as response, open(
download_file_path, "wb"
) as download_file:
shutil.copyfileobj(response, download_file)
_LOGGER.info("Downloaded %s (%s)", download_file_path, file_url)
def find_voice(name: str, data_dirs: Iterable[Union[str, Path]]) -> Tuple[Path, Path]:
for data_dir in data_dirs:
data_dir = Path(data_dir)
onnx_path = data_dir / f"{name}.onnx"
config_path = data_dir / f"{name}.onnx.json"
if onnx_path.exists() and config_path.exists():
return onnx_path, config_path
raise ValueError(f"Missing files for voice {name}")
+46
View File
@@ -0,0 +1,46 @@
import argparse
import hashlib
import json
import sys
from pathlib import Path
from typing import Union
def get_file_hash(path: Union[str, Path], bytes_per_chunk: int = 8192) -> str:
"""Hash a file in chunks using md5."""
path_hash = hashlib.md5()
with open(path, "rb") as path_file:
chunk = path_file.read(bytes_per_chunk)
while chunk:
path_hash.update(chunk)
chunk = path_file.read(bytes_per_chunk)
return path_hash.hexdigest()
# -----------------------------------------------------------------------------
def main():
parser = argparse.ArgumentParser()
parser.add_argument("file", nargs="+")
parser.add_argument("--dir", help="Parent directory")
args = parser.parse_args()
if args.dir:
args.dir = Path(args.dir)
hashes = {}
for path_str in args.file:
path = Path(path_str)
path_hash = get_file_hash(path)
if args.dir:
path = path.relative_to(args.dir)
hashes[str(path)] = path_hash
json.dump(hashes, sys.stdout)
if __name__ == "__main__":
main()
+12
View File
@@ -0,0 +1,12 @@
"""Utilities"""
import numpy as np
def audio_float_to_int16(
audio: np.ndarray, max_wav_value: float = 32767.0
) -> np.ndarray:
"""Normalize audio and convert to int16 range"""
audio_norm = audio * (max_wav_value / max(0.01, np.max(np.abs(audio))))
audio_norm = np.clip(audio_norm, -max_wav_value, max_wav_value)
audio_norm = audio_norm.astype("int16")
return audio_norm
+177
View File
@@ -0,0 +1,177 @@
import json
import logging
import wave
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, List, Optional, Union
import numpy as np
import onnxruntime
from piper_phonemize import phonemize_codepoints, phonemize_espeak, tashkeel_run
from .config import PhonemeType, PiperConfig
from .const import BOS, EOS, PAD
from .util import audio_float_to_int16
_LOGGER = logging.getLogger(__name__)
@dataclass
class PiperVoice:
session: onnxruntime.InferenceSession
config: PiperConfig
@staticmethod
def load(
model_path: Union[str, Path],
config_path: Optional[Union[str, Path]] = None,
use_cuda: bool = False,
) -> "PiperVoice":
"""Load an ONNX model and config."""
if config_path is None:
config_path = f"{model_path}.json"
with open(config_path, "r", encoding="utf-8") as config_file:
config_dict = json.load(config_file)
return PiperVoice(
config=PiperConfig.from_dict(config_dict),
session=onnxruntime.InferenceSession(
str(model_path),
sess_options=onnxruntime.SessionOptions(),
providers=["CPUExecutionProvider"]
if not use_cuda
else ["CUDAExecutionProvider"],
),
)
def phonemize(self, text: str) -> List[List[str]]:
"""Text to phonemes grouped by sentence."""
if self.config.phoneme_type == PhonemeType.ESPEAK:
if self.config.espeak_voice == "ar":
# Arabic diacritization
# https://github.com/mush42/libtashkeel/
text = tashkeel_run(text)
return phonemize_espeak(text, self.config.espeak_voice)
if self.config.phoneme_type == PhonemeType.TEXT:
return phonemize_codepoints(text)
raise ValueError(f"Unexpected phoneme type: {self.config.phoneme_type}")
def phonemes_to_ids(self, phonemes: List[str]) -> List[int]:
"""Phonemes to ids."""
id_map = self.config.phoneme_id_map
ids: List[int] = list(id_map[BOS])
for phoneme in phonemes:
if phoneme not in id_map:
_LOGGER.warning("Missing phoneme from id map: %s", phoneme)
continue
ids.extend(id_map[phoneme])
ids.extend(id_map[PAD])
ids.extend(id_map[EOS])
return ids
def synthesize(
self,
text: str,
wav_file: wave.Wave_write,
speaker_id: Optional[int] = None,
length_scale: Optional[float] = None,
noise_scale: Optional[float] = None,
noise_w: Optional[float] = None,
sentence_silence: float = 0.0,
):
"""Synthesize WAV audio from text."""
wav_file.setframerate(self.config.sample_rate)
wav_file.setsampwidth(2) # 16-bit
wav_file.setnchannels(1) # mono
for audio_bytes in self.synthesize_stream_raw(
text,
speaker_id=speaker_id,
length_scale=length_scale,
noise_scale=noise_scale,
noise_w=noise_w,
sentence_silence=sentence_silence,
):
wav_file.writeframes(audio_bytes)
def synthesize_stream_raw(
self,
text: str,
speaker_id: Optional[int] = None,
length_scale: Optional[float] = None,
noise_scale: Optional[float] = None,
noise_w: Optional[float] = None,
sentence_silence: float = 0.0,
) -> Iterable[bytes]:
"""Synthesize raw audio per sentence from text."""
sentence_phonemes = self.phonemize(text)
# 16-bit mono
num_silence_samples = int(sentence_silence * self.config.sample_rate)
silence_bytes = bytes(num_silence_samples * 2)
for phonemes in sentence_phonemes:
phoneme_ids = self.phonemes_to_ids(phonemes)
yield self.synthesize_ids_to_raw(
phoneme_ids,
speaker_id=speaker_id,
length_scale=length_scale,
noise_scale=noise_scale,
noise_w=noise_w,
) + silence_bytes
def synthesize_ids_to_raw(
self,
phoneme_ids: List[int],
speaker_id: Optional[int] = None,
length_scale: Optional[float] = None,
noise_scale: Optional[float] = None,
noise_w: Optional[float] = None,
) -> bytes:
"""Synthesize raw audio from phoneme ids."""
if length_scale is None:
length_scale = self.config.length_scale
if noise_scale is None:
noise_scale = self.config.noise_scale
if noise_w is None:
noise_w = self.config.noise_w
phoneme_ids_array = np.expand_dims(np.array(phoneme_ids, dtype=np.int64), 0)
phoneme_ids_lengths = np.array([phoneme_ids_array.shape[1]], dtype=np.int64)
scales = np.array(
[noise_scale, length_scale, noise_w],
dtype=np.float32,
)
if (self.config.num_speakers > 1) and (speaker_id is None):
# Default speaker
speaker_id = 0
sid = None
if speaker_id is not None:
sid = np.array([speaker_id], dtype=np.int64)
# Synthesize through Onnx
audio = self.session.run(
None,
{
"input": phoneme_ids_array,
"input_lengths": phoneme_ids_lengths,
"scales": scales,
"sid": sid,
},
)[0].squeeze((0, 1))
audio = audio_float_to_int16(audio.squeeze())
return audio.tobytes()
File diff suppressed because it is too large Load Diff
+2 -2
View File
@@ -1,2 +1,2 @@
espeak-phonemizer>=1.1.0,<2
onnxruntime~=1.11.0
piper-phonemize~=1.0.0
onnxruntime>=1.11.0,<2
+47
View File
@@ -0,0 +1,47 @@
#!/usr/bin/env python3
from pathlib import Path
import setuptools
from setuptools import setup
this_dir = Path(__file__).parent
module_dir = this_dir / "piper"
requirements = []
requirements_path = this_dir / "requirements.txt"
if requirements_path.is_file():
with open(requirements_path, "r", encoding="utf-8") as requirements_file:
requirements = requirements_file.read().splitlines()
data_files = [module_dir / "voices.json"]
# -----------------------------------------------------------------------------
setup(
name="piper-tts",
version="1.1.0",
description="A fast, local neural text to speech system that sounds great and is optimized for the Raspberry Pi 4.",
url="http://github.com/rhasspy/piper",
author="Michael Hansen",
author_email="mike@rhasspy.org",
license="MIT",
packages=setuptools.find_packages(),
package_data={"piper": [str(p.relative_to(module_dir)) for p in data_files]},
entry_points={
"console_scripts": [
"piper = piper.__main__:main",
]
},
install_requires=requirements,
classifiers=[
"Development Status :: 3 - Alpha",
"Intended Audience :: Developers",
"Topic :: Text Processing :: Linguistic",
"License :: OSI Approved :: MIT License",
"Programming Language :: Python :: 3.7",
"Programming Language :: Python :: 3.8",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
],
keywords="rhasspy piper tts",
)