Rough thoughts on Mixtral vs Open Source
Here’s a thesis (call it a hypothesis) to chew on:
The mixture-of-experts paradigm is fundamentally a hindrance to open-source development, and Mixtral-8x7B will be summarily supplanted by a dense model (llama3 / mistral-70b / yi / qwen / …) in the near future.