A small experiment to see whether we are there yet with highly virtualized CPU compute and Small Language Models (SLMs). The answer is a resounding maybe, but most likely not. Huge thanks to Justine for her work on llamafile, supported by Mozilla. I hope folks find this R&D useful.
Does it produce bad results? Is it slow to respond? Slow to load?
I've been wanting to play around with llamafile-based edge functions, but storing even small models in GitHub (for automated deploys) is a painful and often outright impossible experience.