I’ve been playing around with the OpenAI APIs for an AI project. My GPT-4 limit is 10k tokens per minute, which seems to be the current default for new accounts. I’m running into it constantly just in development, and that’s for a pretty low-key use case.
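To at least keep development moving under the per-minute cap, I’ve been wrapping calls in a simple retry with exponential backoff on rate-limit errors. This is just a minimal sketch assuming the v1+ Python openai client; the model name and retry parameters are placeholders for whatever you’re actually using:

```python
import time
from openai import OpenAI, RateLimitError

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def chat_with_backoff(messages, model="gpt-4", max_retries=5, base_delay=2.0):
    """Retry a chat completion with exponential backoff when rate limited."""
    delay = base_delay
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except RateLimitError:
            # Out of retries: surface the error instead of looping forever.
            if attempt == max_retries - 1:
                raise
            time.sleep(delay)
            delay *= 2

resp = chat_with_backoff([{"role": "user", "content": "Hello"}])
print(resp.choices[0].message.content)
```

It doesn’t raise the ceiling, obviously, it just smooths out bursts so a dev loop doesn’t die the moment it hits the limit.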
It seems like using it in production for even a modestly successful product would require something like 1000x higher limits.
According to OpenAI’s docs, they are not considering rate limit increases for GPT-4 at all. Azure OpenAI has 2x higher starting limits, but that’s still nowhere near enough, and they aren’t considering rate limit increases at the moment either.
I’m sure this will change with time, and I understand OpenAI’s reasons for rolling out access gradually. Still, I see services out there that appear to lean heavily on GPT-4, or at least find ways of getting comparable quality, that seem like they shouldn’t be possible without far higher limits. I’m curious whether anyone can speak in general terms about how this is being accomplished. Do they have higher limits from earlier stages of the rollout? Special deals with OpenAI/MSFT? Are they using fine-tuning and other strategies to get 3.5 (or other models) close to GPT-4-level quality? Or TOS-violating hacks like rotating requests across many API keys?
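On the fine-tuning angle, the mechanics at least are straightforward if fine-tuning is available for the model you want; whether it actually closes the quality gap is the open question. A minimal sketch with the v1+ Python client, where the training file name is just a placeholder:

```python
from openai import OpenAI

client = OpenAI()

# Upload a JSONL file of chat-formatted training examples
# (each line: {"messages": [{"role": ..., "content": ...}, ...]}).
training_file = client.files.create(
    file=open("train_examples.jsonl", "rb"),
    purpose="fine-tune",
)

# Kick off a fine-tuning job against 3.5 Turbo.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)
print(job.id, job.status)
```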
Building in a way that lets users make any required GPT-4 calls locally with their own API key seems like a possibility as well, depending on the app, but that obviously limits the audience and isn’t great for UX or onboarding.
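For what it’s worth, the bring-your-own-key pattern is about as simple as it gets with the Python client; a sketch (the function name and plumbing are just illustrative):

```python
from openai import OpenAI

def make_user_client(user_api_key: str) -> OpenAI:
    # Requests through this client are billed to, and rate-limited
    # against, the user's own account rather than yours.
    return OpenAI(api_key=user_api_key)

client = make_user_client("sk-...")  # key supplied by the user at runtime
resp = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
)
```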
For my use case, GPT-4 is just barely reaching the point of viability: not perfect, but good enough to provide significant value, whereas 3.5 Turbo is woefully inadequate. It doesn’t necessarily seem like a bad idea to build within the limits for now and be well positioned when they finally get increased, but I’m mainly wondering whether everyone’s in the same boat on this or whether people are finding legitimate workarounds that don’t require some insider connection.