Ask HN: How are people getting around GPT-4 rate limits?
13 points by aithrowaway 9 months ago | 13 comments
I’ve been playing around with the OpenAI APIs for an AI project. My GPT-4 limit is 10k tokens per minute, which seems to be the current default for new accounts. I’m running into it constantly just in development, and that’s for a pretty low-key use case.

It seems like using it in production for even a modestly successful product would require something like 1000x higher limits.

According to OpenAI’s docs, they are not considering rate limit increases at all for GPT-4. Azure OpenAI has 2x higher starting limits, but that’s still not close to enough, and they’re not considering rate limit increases currently either.

I’m sure this will change with time, and I understand OpenAI’s reasons for rolling out access gradually. Still, I see some services out there that appear to be leaning heavily on GPT-4, or are at least finding ways of getting comparable quality, that seem like they shouldn’t be possible without far higher limits. I’m curious if anyone can speak in general terms about how this is being accomplished. Do they have higher limits from earlier stages of the rollout? Or special deals with OpenAI/MSFT? Or are they using fine-tuning and other strategies to make 3.5 (or other models) reach near 4-level quality? Or using TOS-violating hacks like rotating requests across many API keys?

Building in a way that allows users to make any required GPT-4 calls locally with their own API key seems like a possibility as well, depending on the app, but that obviously limits the audience and isn’t great for UX or onboarding.
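For what it's worth, the bring-your-own-key pattern mostly comes down to attaching each user's key to their own requests rather than yours. A minimal sketch (function name and defaults are hypothetical) that builds a chat completions request authenticated with a per-user bearer token:

```python
import json
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"

def build_request(user_api_key: str, prompt: str) -> urllib.request.Request:
    """Build a chat completion request signed with the user's own API key."""
    body = json.dumps({
        "model": "gpt-4",
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            # The user's key, not the app's -- so the rate limit is theirs too.
            "Authorization": f"Bearer {user_api_key}",
            "Content-Type": "application/json",
        },
    )
```

Each user then burns their own per-account rate limit instead of sharing one app-wide quota, which is the whole appeal despite the onboarding cost.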

For my use case, GPT4 is just barely reaching the point of viability—not perfect but good enough to provide significant value, whereas 3.5 turbo is woefully inadequate. While it doesn’t seem like a bad idea necessarily to build it out for now within the limits and then be well-positioned when they finally get increased, I’m mainly just wondering whether everyone’s in the same boat on this or if people are finding legitimate workarounds that don’t require some insider connection.




That is a very high volume, especially for being in development. Here are a few things we have done...

  * Use local machine learning models wherever possible.
  * Summarize and consolidate calls whenever possible (e.g. reduce token counts using language analytics).
  * Log all calls/responses so it is possible to reuse them and/or to train your ML models. This can cut down on duplicate calls.
  * Monitor your API call logs to make sure the system isn't making calls it shouldn't.
  * Throttle your calls by introducing delays/bottlenecks in the user interface (by far my least favorite).
  * Charge more for your services to decrease demand.
  * Contact your account rep and see what options they have to offer with a higher price tier.
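Two of those ideas (reusing logged responses, and throttling to stay under the limit) can be combined in a thin wrapper around whatever function actually calls the API. A rough sketch, with hypothetical names and a naive fixed-window throttle:

```python
import hashlib
import time

class BudgetedClient:
    """Cache identical prompts and throttle to a tokens-per-minute budget."""

    def __init__(self, call_fn, tpm_limit=10_000):
        self.call_fn = call_fn      # e.g. a wrapper around the OpenAI API
        self.tpm_limit = tpm_limit  # tokens-per-minute budget
        self.cache = {}             # prompt hash -> logged response
        self.window_start = time.monotonic()
        self.tokens_used = 0

    def complete(self, prompt, est_tokens):
        key = hashlib.sha256(prompt.encode()).hexdigest()
        if key in self.cache:       # duplicate call: serve from the log
            return self.cache[key]
        now = time.monotonic()
        if now - self.window_start >= 60:       # new minute, reset budget
            self.window_start, self.tokens_used = now, 0
        if self.tokens_used + est_tokens > self.tpm_limit:
            time.sleep(60 - (now - self.window_start))  # wait out the window
            self.window_start, self.tokens_used = time.monotonic(), 0
        self.tokens_used += est_tokens
        response = self.call_fn(prompt)
        self.cache[key] = response
        return response
```

In development especially, the cache alone can eliminate a lot of traffic, since the same prompts tend to get replayed over and over while iterating.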


Speak to your Azure account manager. They will happily increase your limits based on your wallet size.


Interesting. According to https://learn.microsoft.com/en-us/azure/ai-services/openai/q...:

"Quota increase requests can be submitted from the Quotas page of Azure OpenAI Studio. Please note that due to overwhelming demand, we are not currently approving new quota increase requests. Your request will be queued until it can be filled at a later time."

But I guess you're saying that exceptions can be made, which isn't too surprising I suppose. Frustrating though if true since this kind of preferential access would seem to run counter to OpenAI's stated mission and mean they (or MSFT at least) are picking winners rather than allowing the best product to win in the market.


What he’s saying is that if you pay them enough money, they will.


Yeah, I understood. I hope it's not true. Selfishly in part, since I don't have that kind of cash available, but also because one of OpenAI's founding principles is to "avoid undue concentration of power". Giving the latest models (sans very restrictive limits) to large or well-funded companies first would seem to violate this principle.

Being a small startup already puts you at a disadvantage. If established/funded companies are always getting the best model months ahead of you, it's really hard to compete even if you can make a better product.


Why wouldn't that be the case? Running these models costs them money. Somebody is paying for it. Those limits are to keep the free tier from driving Azure out of business. I also want a unicorn, but I can't have one.

Regardless, OpenAI turned its back on its founding principles a long, long time ago.


I'm not talking about free tier limits. Paid users are being limited.


Whatever their docs say, they increased our limits within hours every time we talked with a rep. You're building a product on top of theirs; it's time you got in contact with them. Everything else is just negligence, and it's not only about the API limits... you want to know the company your project depends on.


From https://platform.openai.com/docs/guides/rate-limits/overview (emphasis mine):

GPT-4 rate limits

During the rollout of GPT-4, the model will have more aggressive rate limits to keep up with demand. You can view your current rate limits in the rate limits section of the account page. We are unable to accommodate requests for rate limit increases due to capacity constraints. We are prioritizing general access to GPT-4 first and will subsequently raise rate limits automatically as capacity allows.


As I said, talk with them.


Ok, I'll try that. Thanks.


Just curious: even if the token limits are increased, how are you planning to deal with the cost?


If and when the rate limits aren't an issue, I'll probably go the typical route of charging some amount per month (probably in the $20-50 range), which includes some base number of tokens, then charge overage based on OpenAI costs plus a small markup (maybe 5-10%).
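Sketching that billing model with illustrative numbers (all of the defaults below are placeholders, not real pricing):

```python
def monthly_bill(tokens_used, base_fee=30.0, included_tokens=500_000,
                 cost_per_1k=0.06, markup=0.08):
    """Base subscription plus marked-up overage on tokens beyond the allowance.

    Defaults are hypothetical: a base fee in the $20-50 range, a 5-10%
    markup, and a per-1k-token cost standing in for the OpenAI rate.
    """
    overage_tokens = max(0, tokens_used - included_tokens)
    overage_cost = (overage_tokens / 1000) * cost_per_1k * (1 + markup)
    return round(base_fee + overage_cost, 2)
```

So a user within the allowance pays just the base fee, while a user 100k tokens over pays the base fee plus roughly the underlying API cost for those tokens with the markup applied.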

I think the tool could be useful enough that many users won't be too cost conscious, but I'll only find out for sure by launching it.



