Disco Narrator - Data Scraping
To train an AI Text-To-Speech (TTS) model, we’ll need to obtain a Labelled Dataset with two things:
- Clean audio files, containing only the voice we’re cloning
- The dialogue transcript (text) for each audio file
How I made a Text-to-Speech (TTS) model for (some of) a video game’s characters.
| Post | TL;DR |
|---|---|
| Data scraping | Simple retelling of the data collection process |
| Data formatting | Python scripting to LJSpeech, plus data cleaning methods |
| Simplistic Training | Various attempts to create my first model |
| Additional characters | Training extra models, and the limitations of small datasets |
With the raw data in hand, we can construct a proper TTS dataset using a few Python scripts.
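As a rough illustration of what "labelled" means here, the sketch below pairs each audio clip with its transcript and formats the result as LJSpeech-style `id|text|normalized text` rows. The function names, file names, and dialogue lines are hypothetical placeholders, not the scripts used in this series, and it assumes each clip is a `.wav` file whose name (minus the extension) is the key into a transcript dictionary.

```python
import tempfile
from pathlib import Path


def pair_clips_with_transcripts(audio_dir, transcripts):
    """Return (clip_id, text) pairs for every .wav file that has a transcript.

    `transcripts` maps a clip id (file name without extension) to its
    dialogue text. Clips with no transcript are skipped, since an
    unlabelled clip is of no use for training.
    """
    pairs = []
    for wav_path in sorted(Path(audio_dir).glob("*.wav")):
        text = transcripts.get(wav_path.stem, "").strip()
        if text:
            pairs.append((wav_path.stem, text))
    return pairs


def to_ljspeech_rows(pairs):
    """Format pairs as LJSpeech-style `id|text|normalized text` lines."""
    rows = []
    for clip_id, text in pairs:
        # "|" is the field separator, so strip it from the text and
        # collapse any whitespace runs that leaves behind.
        clean = " ".join(text.replace("|", " ").split())
        rows.append(f"{clip_id}|{clean}|{clean}")
    return rows


# Usage with throwaway empty files and made-up dialogue lines.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("clip_0001.wav", "clip_0002.wav", "clip_0003.wav"):
        (Path(tmp) / name).touch()
    transcripts = {
        "clip_0001": "First example line.",
        "clip_0002": "Second | example line.",
        # clip_0003 has no transcript, so it is dropped.
    }
    pairs = pair_clips_with_transcripts(tmp, transcripts)
    rows = to_ljspeech_rows(pairs)

for row in rows:
    print(row)
```

Skipping unlabelled clips rather than raising an error is a design choice for this sketch; a stricter script might log or fail on missing transcripts so gaps in the scrape are noticed early.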