Fast (5x) Inference with TorToiSe-TTS

(Ignore this if you've already seen the repo.)

I made a fork of TorToiSe with much faster inference speed. Here are the summarised results:

Example texts used

A (70 characters):

I’m looking for contributors who can do optimizations better than me.

B (188 characters):

Then took the other, as just as fair, And having perhaps the better claim, Because it was grassy and wanted wear; Though as for that the passing there Had worn them really about the same,

Original TorToiSe repo:

| speed (B) | speed (A) | preset     |
| --------- | --------- | ---------- |
| 112.81s   | 14.94s    | ultra_fast |

New repo, with `--preset ultra_fast`:

| speed (B) | speed (A) | GPT kv-cache | sampler | cond-free diffusion | autocast to fp16 |
| --------- | --------- | ------------ | ------- | ------------------- | ---------------- |
| 118.61s   | 11.20s    |              | DDIM    |                     |                  |
| 115.51s   | 10.67s    |              | DPM++2M |                     |                  |
| 114.58s   | 10.24s    |              | DPM++2M |                     |                  |
| 55.76s    | 7.25s     |              | DDIM    |                     |                  |
| 53.59s    | 6.77s     |              | DPM++2M |                     |                  |
| 51.98s    | 6.29s     |              | DPM++2M |                     |                  |
| 9.86s     | 4.24s     |              | DDIM    |                     |                  |
| 8.51s     | 3.77s     |              | DPM++2M |                     |                  |
| 8.12s     | 3.82s     |              | DPM++2M |                     |                  |
| 6.78s     | 3.35s     |              | DPM++2M |                     |                  |
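To sanity-check the headline number, the speedup over the original repo can be computed directly from the two tables, comparing the original `ultra_fast` run against the fastest configuration above:

```python
# Speedup of the fastest new configuration over the original repo's
# ultra_fast preset, taken straight from the benchmark tables above.
orig = {"A": 14.94, "B": 112.81}   # original repo, ultra_fast (seconds)
best = {"A": 3.35, "B": 6.78}      # fastest row in the new-repo table

for text in ("A", "B"):
    speedup = orig[text] / best[text]
    print(f"text {text}: {speedup:.1f}x faster")
# prints:
# text A: 4.5x faster
# text B: 16.6x faster
```

The ~5x in the title roughly matches the short text (A); the longer text benefits far more.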

All results listed were generated with a slightly undervolted RTX 3090 on Ubuntu 22.04, with the following base command:

```bash
python tortoise/do_tts.py --voice emma --seed 42 --text "$TEXT"
```
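The biggest single win in the table comes from the GPT kv-cache. As a toy illustration of the idea (this is not TorToiSe's actual code), caching keys/values turns the quadratic recomputation of naive autoregressive decoding into constant work per new token:

```python
# Toy sketch of the key/value-cache idea behind the "GPT kv-cache" column.
# Without a cache, each decoding step recomputes attention keys/values for
# the whole prefix; with a cache, only the newest token's k/v is computed.

def keys_values(token):
    # stand-in for the expensive per-token key/value projections
    return (token * 2, token * 3)

def decode_no_cache(tokens):
    tokens, work = list(tokens), 0
    for step in range(1, len(tokens) + 1):
        # recompute k/v for the entire prefix at every step
        kvs = [keys_values(t) for t in tokens[:step]]
        work += len(kvs)
    return work

def decode_with_cache(tokens):
    cache, work = [], 0
    for t in tokens:
        cache.append(keys_values(t))  # only the new token's k/v
        work += 1
    return work

print(decode_no_cache(range(10)), decode_with_cache(range(10)))  # prints: 55 10
```

For a 10-token sequence the cached decoder does 10 projection calls instead of 55, and the gap grows quadratically with sequence length, which is why the long text (B) sees the larger absolute improvement.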