{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "provenance": [], "gpuType": "T4", "include_colab_link": true }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "language_info": { "name": "python" }, "accelerator": "GPU", "gpuClass": "standard" }, "cells": [ { "cell_type": "markdown", "metadata": { "id": "view-in-github", "colab_type": "text" }, "source": [ "\"Open" ] }, { "cell_type": "markdown", "source": [ "# **[Piper](https://github.com/rhasspy/piper) training notebook.**\n", "## ![Piper logo](https://contribute.rhasspy.org/img/logo.png)\n", "\n", "---\n", "\n", "- Notebook made by [rmcpantoja](http://github.com/rmcpantoja)\n", "- Collaborator: [Xx_Nessu_xX](https://fakeyou.com/profile/Xx_Nessu_xX)" ], "metadata": { "id": "eK3nmYDB6C1a" } }, { "cell_type": "markdown", "source": [ "# 🔧 ***First steps.*** 🔧" ], "metadata": { "id": "AICh6p5OJybj" } }, { "cell_type": "code", "source": [ "#@markdown ## **Google Colab Anti-Disconnect.** 🔌\n", "#@markdown ---\n", "#@markdown #### Helps avoid automatic disconnection. Even so, the session will disconnect after **6 to 12 hours**.\n", "\n", "import IPython\n", "js_code = '''\n", "function ClickConnect(){\n", "console.log(\"Working\");\n", "document.querySelector(\"colab-toolbar-button#connect\").click()\n", "}\n", "setInterval(ClickConnect,60000)\n", "'''\n", "display(IPython.display.Javascript(js_code))" ], "metadata": { "cellView": "form", "id": "qyxSMuzjfQrz" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "#@markdown ## **Check GPU type.** 👁️\n", "#@markdown ---\n", "#@markdown #### A more capable GPU can lead to faster training speeds. 
By default, you will have a **Tesla T4**.\n", "!nvidia-smi" ], "metadata": { "cellView": "form", "id": "ygxzp-xHTC7T" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "id": "sUNjId07JfAK" }, "outputs": [], "source": [ "#@markdown # **Mount Google Drive.** 📂\n", "from google.colab import drive\n", "drive.mount('/content/drive', force_remount=True)" ] }, { "cell_type": "code", "source": [ "#@markdown # **Install software.** 📦\n", "#@markdown #### This cell installs the synthesizer and the dependencies needed for training (this may take a while).\n", "\n", "#@markdown **Note: Please restart the runtime once this cell has finished executing. Then you can continue with the training section.**\n", "\n", "# Clone the repository and install its dependencies:\n", "!git clone https://github.com/rmcpantoja/piper\n", "%cd piper/src/python\n", "!pip install --upgrade pip\n", "!pip install --upgrade wheel setuptools\n", "!pip install -r requirements.txt\n", "!pip install torchtext==0.12.0\n", "!pip install torchvision==0.12.0\n", "!bash build_monotonic_align.sh\n", "!apt-get install -y espeak-ng\n", "%cd /content" ], "metadata": { "cellView": "form", "id": "_XwmTVlcUgCh" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "# 🤖 ***Training.*** 🤖" ], "metadata": { "id": "A3bMzEE0V5Ma" } }, { "cell_type": "code", "source": [ "#@markdown # **1. Extract dataset.** 📥\n", "#@markdown #### Important: the audio files must be in **WAV format (16000 or 22050 Hz, 16-bit, mono) and, for convenience, numbered. 
Example:**\n", "\n", "#@markdown * **1.wav**\n", "#@markdown * **2.wav**\n", "#@markdown * **3.wav**\n", "#@markdown * **.....**\n", "\n", "#@markdown ---\n", "\n", "%cd /content\n", "!mkdir -p /content/dataset\n", "%cd /content/dataset\n", "!mkdir -p /content/dataset/wavs\n", "#@markdown ### Audio dataset path to unzip\n", "zip_path = \"/content/drive/MyDrive/wavs.zip\" #@param {type:\"string\"}\n", "!unzip \"{zip_path}\" -d /content/dataset/wavs\n", "#@markdown ---" ], "metadata": { "cellView": "form", "id": "SvEGjf0aV8eg" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "#@markdown # **2. Upload the transcript file.** 📝\n", "#@markdown ---\n", "#@markdown #### Important: the transcript contains the text that the character says in each audio file, and it must have the following structure:\n", "\n", "#@markdown * wavs/1.wav|This is what my character says in audio 1.\n", "#@markdown * wavs/2.wav|This is the text that the character says in audio 2.\n", "#@markdown * ...\n", "\n", "#@markdown And so on. In addition, the transcript must be a .csv file (UTF-8 without BOM).\n", "\n", "%cd /content/dataset\n", "from google.colab import files\n", "# Remove any previous transcript; -f avoids an error when none exists yet.\n", "!rm -f /content/dataset/metadata.csv\n", "listfn, length = files.upload().popitem()\n", "if listfn != \"metadata.csv\":\n", "    !mv \"$listfn\" metadata.csv\n", "%cd .." ], "metadata": { "cellView": "form", "id": "E0W0OCvXXvue" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "#@markdown # **3. 
Preprocess dataset.** 🔄\n", "\n", "import os\n", "#@markdown ### First of all, select the language of your dataset.\n", "language = \"English (U.S.)\" #@param [\"Català\", \"Dansk\", \"Deutsch\", \"Ελληνικά\", \"English (British)\", \"English (U.S.)\", \"Español\", \"Suomi\", \"Français\", \"ქართული\", \"Icelandic\", \"Italiano\", \"қазақша\", \"नेपाली\", \"Nederlands\", \"Norsk\", \"Polski\", \"Português (Brasil)\", \"Русский\", \"Svenska\", \"украї́нська\", \"Tiếng Việt\", \"简体中文\"]\n", "#@markdown ---\n", "# language definition:\n", "languages = {\n", " \"Català\": \"ca\",\n", " \"Dansk\": \"da\",\n", " \"Deutsch\": \"de\",\n", " \"Ελληνικά\": \"grc\",\n", " \"English (British)\": \"en\",\n", " \"English (U.S.)\": \"en-us\",\n", " \"Español\": \"es\",\n", " \"Suomi\": \"fi\",\n", " \"Français\": \"fr\",\n", " \"Icelandic\": \"is\",\n", " \"Italiano\": \"it\",\n", " \"ქართული\": \"ka\",\n", " \"қазақша\": \"kk\",\n", " \"नेपाली\": \"ne\",\n", " \"Nederlands\": \"nl\",\n", " \"Norsk\": \"nb\",\n", " \"Polski\": \"pl\",\n", " \"Português (Brasil)\": \"pt-br\",\n", " \"Русский\": \"ru\",\n", " \"Svenska\": \"sv\",\n", " \"украї́нська\": \"uk\",\n", " \"Tiếng Việt\": \"vi-vn-x-central\",\n", " \"简体中文\": \"yue\"\n", "}\n", "\n", "def _get_language(code):\n", " return languages[code]\n", "\n", "final_language = _get_language(language)\n", "#@markdown ### Choose a name for your model:\n", "model_name = \"Test\" #@param {type:\"string\"}\n", "#@markdown ---\n", "# output:\n", "#@markdown ### Choose the working folder: (recommended to save to Drive)\n", "\n", "#@markdown The working folder will be used in preprocessing, but also in training the model.\n", "output_path = \"/content/drive/MyDrive/colab/piper\" #@param {type:\"string\"}\n", "output_dir = output_path+\"/\"+model_name\n", "if not os.path.exists(output_dir):\n", " os.makedirs(output_dir)\n", "#@markdown ---\n", "#@markdown ### Choose dataset format:\n", "dataset_format = \"ljspeech\" #@param [\"ljspeech\", 
\"mycroft\"]\n", "#@markdown ---\n", "#@markdown ### Is this a single-speaker dataset? If not, uncheck:\n", "single_speaker = True #@param {type:\"boolean\"}\n", "if single_speaker:\n", "    force_sp = \" --single-speaker\"\n", "else:\n", "    force_sp = \"\"\n", "#@markdown ---\n", "#@markdown ### Select the sample rate of the dataset:\n", "sample_rate = \"16000\" #@param [\"16000\", \"22050\"]\n", "#@markdown ---\n", "%cd /content/piper/src/python\n", "!python -m piper_train.preprocess \\\n", "    --language {final_language} \\\n", "    --input-dir /content/dataset \\\n", "    --output-dir \"{output_dir}\" \\\n", "    --dataset-format {dataset_format} \\\n", "    --sample-rate {sample_rate} \\\n", "    {force_sp}" ], "metadata": { "id": "dOyx9Y6JYvRF" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "#@markdown # **4. Settings.** 🧰\n", "import json\n", "import ipywidgets as widgets\n", "from IPython.display import display\n", "\n", "#@markdown ### Fine-tune this dataset?\n", "finetune = True #@param {type:\"boolean\"}\n", "#@markdown ---\n", "if finetune:\n", "    ft_command = '--resume_from_checkpoint \"/content/pretrained.ckpt\" '\n", "    try:\n", "        with open('/content/piper/notebooks/pretrained_models.json') as f:\n", "            pretrained_models = json.load(f)\n", "        if final_language in pretrained_models:\n", "            models = pretrained_models[final_language]\n", "            model_options = [(model_name, model_name) for model_name, model_url in models.items()]\n", "            model_dropdown = widgets.Dropdown(description = \"Choose pretrained model\", options=model_options)\n", "            download_button = widgets.Button(description=\"Download\")\n", "            def download_model(btn):\n", "                model_name = model_dropdown.value\n", "                model_url = pretrained_models[final_language][model_name]\n", "                if model_url.startswith(\"1\"):\n", "                    !gdown \"{model_url}\" -O \"/content/pretrained.ckpt\"\n", "                elif model_url.startswith(\"https://drive.google.com/file/d/\"):\n", "                    !gdown \"{model_url}\" -O 
\"/content/pretrained.ckpt\" --fuzzy\n", "                else:\n", "                    !wget \"{model_url}\" -O \"/content/pretrained.ckpt\"\n", "            download_button.on_click(download_model)\n", "            display(model_dropdown, download_button)\n", "        else:\n", "            raise Exception(f\"There are no pretrained models available for the language {final_language}\")\n", "    except FileNotFoundError:\n", "        raise Exception(\"The pretrained_models.json file was not found.\")\n", "else:\n", "    ft_command = \"\"\n", "#@markdown ### Choose batch size based on this dataset:\n", "batch_size = 16 #@param {type:\"integer\"}\n", "#@markdown ---\n", "#@markdown ### Validation split:\n", "validation_split = 0.01 #@param {type:\"number\"}\n", "#@markdown ---\n", "#@markdown ### Choose the quality for this model:\n", "\n", "#@markdown * x-low - 16 kHz audio, 5-7M params\n", "#@markdown * low - 16 kHz audio, 15-20M params\n", "#@markdown * medium - 22.05 kHz audio, 15-20M params\n", "#@markdown * high - 22.05 kHz audio, 28-32M params\n", "quality = \"x-low\" #@param [\"high\", \"x-low\", \"medium\"]\n", "#@markdown ---\n", "#@markdown ### Checkpoint saving interval (in epochs):\n", "#@markdown The larger your dataset, the smaller this interval should be, since each epoch takes longer to complete.\n", "checkpoint_epochs = 25 #@param {type:\"integer\"}\n", "#@markdown ---\n", "#@markdown ### Step interval to generate model samples:\n", "log_every_n_steps = 250 #@param {type:\"integer\"}\n", "#@markdown ---\n", "#@markdown ### Training epochs:\n", "max_epochs = 10000 #@param {type:\"integer\"}\n", "#@markdown ---" ], "metadata": { "id": "ickQlOCRjkBL", "cellView": "form" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "#@markdown # **5. 
Run the TensorBoard extension.** 📈\n", "\n", "#@markdown TensorBoard is used to visualize the model's results while it is being trained.\n", "%load_ext tensorboard\n", "%tensorboard --logdir \"{output_dir}\"" ], "metadata": { "cellView": "form", "id": "MpKDfhAHjHJ3" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "#@markdown # **6. Train.** 🏋️‍♂️\n", "\n", "#@markdown **Note: Remember to empty your Drive trash from time to time so that saved models do not use up too much space.**\n", "!python -m piper_train \\\n", "    --dataset-dir \"{output_dir}\" \\\n", "    --accelerator 'gpu' \\\n", "    --devices 1 \\\n", "    --batch-size {batch_size} \\\n", "    --validation-split {validation_split} \\\n", "    --num-test-examples 2 \\\n", "    --quality {quality} \\\n", "    --checkpoint-epochs {checkpoint_epochs} \\\n", "    --check_val_every_n_epoch {checkpoint_epochs} \\\n", "    --log_every_n_steps {log_every_n_steps} \\\n", "    --max_epochs {max_epochs} \\\n", "    {ft_command}\\\n", "    --precision 32" ], "metadata": { "id": "X4zbSjXg2J3N", "cellView": "form" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "# Have you finished training and want to test the model?\n", "\n", "Export your model using the [model exporter notebook](https://colab.research.google.com/github/rmcpantoja/piper/blob/master/notebooks/piper_model_exporter.ipynb)!" ], "metadata": { "id": "6ISG085SYn85" } } ] }