Disco Narrator - Data Scraping
To train an AI Text-to-Speech (TTS) model, we’ll need a labelled dataset made up of two things:
- Clean audio files, containing only the voice we’re cloning
- The dialogue transcript (text) for each audio file
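The two requirements above boil down to pairing each clip with its line of dialogue, and discarding clips that have no text. Below is a minimal Python sketch of that pairing. It assumes, hypothetically, that each transcript is keyed by its audio file's name without the extension. `Sample` and `pair_samples` are illustrative names, not part of the original scripts.

```python
from dataclasses import dataclass
from pathlib import PurePath


@dataclass(frozen=True)
class Sample:
    """One labelled TTS example: a clip of the target voice plus its transcript."""
    audio_file: str
    text: str


def pair_samples(audio_files, transcripts):
    """Match each audio file to its transcript by file stem (name without extension).

    Assumption: `transcripts` maps stems like "clip_001" to dialogue text.
    Returns (samples, unlabelled): clips with a non-empty transcript, and clips
    with no usable text, which can't be used for training.
    """
    samples, unlabelled = [], []
    for audio_file in sorted(audio_files):
        stem = PurePath(audio_file).stem
        text = transcripts.get(stem, "").strip()
        if text:
            samples.append(Sample(audio_file, text))
        else:
            unlabelled.append(audio_file)
    return samples, unlabelled


if __name__ == "__main__":
    samples, unlabelled = pair_samples(
        ["clip_002.wav", "clip_001.wav"],
        {"clip_001": "  Hello there. "},
    )
    print(samples)     # [Sample(audio_file='clip_001.wav', text='Hello there.')]
    print(unlabelled)  # ['clip_002.wav']
```

Keeping the unlabelled clips in a separate list, rather than silently dropping them, makes it easy to see how much of the scraped audio still needs a transcript.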
How I made a Text-to-Speech (TTS) model for (some of) a video game’s characters.
Post | TL;DR |
---|---|
Data scraping | Simple retelling of the data collection process |
Data formatting | Python scripts to convert the data to LJSpeech format, plus data-cleaning methods |
Simplistic Training | Various attempts to create my first model |
Additional characters | Training extra models, and the limitations of small datasets |
With the raw data in hand, we can build a proper TTS dataset using a few Python scripts.
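The Data formatting post covers the real conversion. As a rough picture of the target, an LJSpeech-style dataset is a `wavs/` folder plus a pipe-delimited `metadata.csv` with one `id|transcription|normalized transcription` row per clip and no header. Below is a minimal sketch of writing that file, assuming the transcripts have already been paired with clip IDs. `ljspeech_line` and `write_metadata` are hypothetical helpers, not the author's scripts.

```python
from pathlib import Path


def ljspeech_line(clip_id: str, text: str) -> str:
    """Format one metadata.csv row: id|transcription|normalized transcription.

    Assumption: the normalized column simply repeats the cleaned text.
    """
    # '|' is the column delimiter and each row must fit on one line,
    # so replace stray pipes and collapse all whitespace (including newlines).
    cleaned = " ".join(text.replace("|", " ").split())
    return f"{clip_id}|{cleaned}|{cleaned}"


def write_metadata(rows, out_dir):
    """Write (clip_id, text) pairs to out_dir/metadata.csv, LJSpeech-style.

    The matching audio is expected separately at out_dir/wavs/<clip_id>.wav.
    Returns the path of the written file.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / "metadata.csv"
    # newline="\n" keeps Unix line endings on every platform.
    with path.open("w", encoding="utf-8", newline="\n") as f:
        for clip_id, text in rows:
            f.write(ljspeech_line(clip_id, text) + "\n")
    return path
```

Reusing the cleaned text for the third column is a shortcut. In the original LJSpeech release, that column spells out numbers, ordinals and similar tokens as words.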