Non-determinism in GPT-4 is caused by Sparse MoE

152334H · 1701 words · 8 minutes · published on August 5, 2023 · included in tech

It’s well-known at this point that GPT-4/GPT-3.5-turbo is non-deterministic, even at temperature=0.0. This is odd behavior if you’re used to dense decoder-only models, where temp=0 should imply greedy sampling, which should in turn imply full determinism, because the logits for the next token should be a pure function of the input sequence & the model weights.
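
The dense-model intuition above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not OpenAI's sampler: `toy_logits` is a hypothetical stand-in for a dense model's forward pass (a pure function of the token sequence), and temp=0 is treated as argmax, as is conventional. Under those assumptions, repeated decodes with different random seeds always produce the same tokens.

```python
import math
import random


def toy_logits(tokens, vocab_size=5):
    # Hypothetical stand-in for a dense model: the logits depend only on the
    # input sequence (the "weights" are baked into this fixed formula).
    s = sum((i + 1) * t for i, t in enumerate(tokens))
    return [math.sin(s + v) for v in range(vocab_size)]


def sample_next_token(logits, temperature, rng):
    if temperature == 0.0:
        # temp=0 is treated as greedy decoding: pick the argmax, no randomness.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Otherwise sample from softmax(logits / temperature).
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    weights = [math.exp(x - m) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def decode(prompt, steps, temperature=0.0, seed=None):
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(steps):
        tokens.append(sample_next_token(toy_logits(tokens), temperature, rng))
    return tokens


# Greedy decoding ignores the RNG entirely, so every seed gives the same output.
outputs = {tuple(decode([1, 2, 3], 10, temperature=0.0, seed=s)) for s in range(20)}
print(len(outputs))
```

Because the greedy branch never touches `rng`, any variation between runs would have to come from the logits themselves changing for the same input. That is exactly what the pure-function assumption rules out for a dense model.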

Dumped Blog Ideas

152334H · 1086 words · 6 minutes · published on July 2, 2023 · included in meta

Despite the dead appearance of this blog, I actually think about it surprisingly often! Over the years, I’ve written a number of draft blogs or summaries that I simply ended up dumping at varying stages of completion.

Why can TorToiSe be fine-tuned?

152334H · 487 words · 3 minutes · published on February 16, 2023 · included in tech

Five days ago, I published a blog post describing why TorToiSe could not be fine-tuned.

Today, I have released a fork of DL-Art-School with TorToiSe fine-tuning code. How did that happen?

Why can't TorToiSe be fine-tuned?

152334H · 1379 words · 7 minutes · published on February 11, 2023 · included in tech

TorToiSe 🐢 is an open-source Text-To-Speech (TTS) neural network that creates fairly authentic & realistic voices. Checkpoints for local inference have been available since April last year, but its users are seemingly unable to fine-tune the model with additional voice data.

Why is this the case, and how could it be fixed?

The bleak future of Artificial Intelligence in Singapore

152334H · 272 words · 2 minutes · published on February 10, 2023 · included in opinion

On 9th Feb, the National University of Singapore implemented broad restrictions against the use of AI tools to generate work: