Confirmed that gpt-4o-mini uses ~30x more tokens than base gpt-4o for the same image and same prompt: { completionTokens: 46, promptTokens: 14207, totalTokens: 14253 } vs. { completionTokens: 82, promptTokens: 465, totalTokens: 547 }.
Page 7 of their technical report [0] has a better apples-to-apples comparison. Why they chose to show apples-to-oranges numbers on their landing page is odd to me.
This isn't apples to apples: they're taking the optimal prompting technique for their own model, then using that technique for both models. They should be comparing against the optimal prompting technique for GPT-4.
We've been using Braintrust for evals at Zapier and it's been really great -- pumped to try out this proxy (which should be able to replace some custom code we've written internally for the same purpose!).
This was a ton of fun to build! We'll also be releasing an NLA-enabled version of our Chrome extension [0] within the next couple of days, which will be similar to (but way more convenient than) the demo on the landing page above.
We're super bullish on LLMs for pulling "no-code" forward, helping more knowledge workers build automations. Already, folks are using our OpenAI [1] + ChatGPT [2] integrations to build very cool automations with summaries, categorization, copywriting, and more. We think there is a ton more to do here.
If anyone is interested in this problem space, shoot me an email bryan@zapier.com!
The code there implies cl100k_base has a vocab size of ~100k (I guess it's in the name lol), which makes it more comprehensive than GPT-2's ~50k vocabulary, so fewer tokens are needed for the same text.
This isn’t exactly true. The Django ORM can be used with care in the async views found in FastAPI and Django (see the sync_to_async and run_in_threadpool helpers).
Plans exist to make the Django QuerySet async, so it’ll be exciting when that day comes!
So far they are only going down to the QuerySet level; it then uses a thread pool for the DB connector. The next job would be to support async DB connections.
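The thread-pool pattern described above can be sketched with the standard library alone (a toy stand-in, not Django's actual implementation): the blocking call is pushed to a worker thread so the event loop stays free, which is the same idea behind asgiref's sync_to_async and Starlette's run_in_threadpool.

```python
import asyncio
import time

def blocking_query():
    # Stand-in for a synchronous ORM call, e.g. list(Book.objects.all());
    # real Django code would wrap this with asgiref's sync_to_async.
    time.sleep(0.01)
    return ["book1", "book2"]

async def async_view():
    # Offload the blocking call to a worker thread so the event loop
    # is not blocked while the "database" work runs.
    rows = await asyncio.to_thread(blocking_query)
    return rows

result = asyncio.run(async_view())
print(result)
```

This keeps the view awaitable, but each query still occupies a thread; truly async DB connections would remove that cost.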