A small experiment to see whether we are there yet with highly virtualized CPU compute and Small Language Models (SLMs). The answer is a resounding maybe, but most likely not. Huge thanks to Justine for her work on llamafile, supported by Mozilla. I hope folks find this R&D useful.
Does it produce bad results? Is it slow to respond? Slow to load?
I've been wanting to play around with llamafile-based edge functions, but storing even small models in GitHub (for automated deploys) is a painful and often outright impossible experience.