100%. we've found that llama-3.3-70b-versatile and qwen-qwq-32b perform exceptionally well with reliable function calling. we recognized the need for this early on, and our engineers partnered with glaive ai to create fine-tunes of llama 3 specifically for better function calling performance, until the llama 3.3 models came along and performed even better.
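for anyone who wants to try it, here's a minimal sketch of a tool call against our openai-compatible endpoint, assuming the standard openai-style tools format; the get_weather tool, its schema, and the prompt are made up purely for illustration:

```python
# minimal sketch of function calling via the groq api; get_weather is a
# hypothetical tool for illustration, not a built-in
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["GROQ_API_KEY"],
    base_url="https://api.groq.com/openai/v1",
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "What's the weather in Toronto?"}],
    tools=tools,
    tool_choice="auto",
)

# if the model decided to call the tool, the call shows up here
print(resp.choices[0].message.tool_calls)
```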
i'd actually love to hear your experience with llama scout and maverick for function calling. i'm going to dig into it with our resident function calling expert rick lamers this week.
do you happen to be trying this out on the free tier right now? our free-tier rate limit for this model is 6k tokens per minute, which might be what you're running into.
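if you are hitting that limit, a simple backoff loop usually smooths it out; a minimal sketch assuming the openai python sdk pointed at groq's endpoint, with an arbitrary backoff schedule (not an official recommendation):

```python
# retry on 429 rate-limit responses with simple exponential backoff;
# the backoff schedule here is arbitrary, not a groq recommendation
import os
import time
from openai import OpenAI, RateLimitError

client = OpenAI(
    api_key=os.environ["GROQ_API_KEY"],
    base_url="https://api.groq.com/openai/v1",
)

def chat_with_retry(messages, retries=5):
    for attempt in range(retries):
        try:
            return client.chat.completions.create(
                model="llama-3.3-70b-versatile",
                messages=messages,
            )
        except RateLimitError:
            # free tier allows ~6k tokens/minute, so back off and try again
            time.sleep(2 ** attempt)
    raise RuntimeError("still rate limited after retries")
```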
When I tried Llama 4 Scout and set the max output tokens above 8192, it told me the max was 8192. Once I set it at or below that, it worked. This was in the console.
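A minimal sketch of clamping max_tokens client-side, assuming the cap the console reports also applies over the API; the model id is illustrative, not confirmed:

```python
# clamp the requested output length to the cap the console reports (8192);
# assumes the same cap applies over the api. model id is illustrative.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["GROQ_API_KEY"],
    base_url="https://api.groq.com/openai/v1",
)

MAX_OUTPUT_TOKENS = 8192   # cap reported by the console for Scout
requested = 16000          # whatever the caller asked for

resp = client.chat.completions.create(
    model="llama-4-scout",  # illustrative id; check the console for the real one
    messages=[{"role": "user", "content": "Summarize the plot of Hamlet."}],
    max_tokens=min(requested, MAX_OUTPUT_TOKENS),
)
print(resp.choices[0].message.content)
```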
amazing! and yes, we'll have maverick available today. the reason we limit the context window is that demand > capacity. we're pretty busy building out more capacity so we can get to a state where we give everyone access to larger context windows without melting our currently available lpus, haha.
cool. I would so happily pay you guys for a long-context API that aider could point at -- the speed is just game-changing. I know your architecture is different, so I understand it's an engineering lift. But I bet you'd find some Pareto-optimal point on the curve where you could charge a lot more for the speed you can deliver, if the context is long enough for coding.
hi! i work @ groq and just made an account here to answer any questions for anyone who might be confused. groq has been around since 2016 and although we do offer hardware for enterprises in the form of dedicated instances, our goal is to make the models that we host easily accessible via groqcloud and groq api (openai compatible) so you can instantly get access to fast inference. :)
we have a pretty generous free tier and a dev tier you can upgrade to for higher rate limits. also, we deeply value privacy and don't retain your data. you can read more about that here: https://groq.com/privacy-policy/
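to make "openai compatible" concrete, here's a minimal sketch of pointing the openai python sdk at the groq api; only the base url and api key change from a stock openai setup:

```python
# minimal quick-start: the openai python sdk talking to the groq api
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["GROQ_API_KEY"],
    base_url="https://api.groq.com/openai/v1",
)

resp = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)
```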