- RPM: Requests per minute
- RPD: Requests per day
- A request is a single call to our API
- Both limits apply at the same time: whichever one (RPM or RPD) you reach first is the one you hit
- Every API response reports the current status of your rate limits (see rate limit response headers for more information)
- If you hit a rate limit, the response contains an error message (see API error codes)
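One common way to handle the rate limit error described above is to retry with exponential backoff. The sketch below is illustrative only. The status code 429 is the conventional HTTP code for rate limiting and is an assumption here (the API error codes page is authoritative). `send`, `call_with_backoff`, and the `(status, headers, body)` tuple shape are hypothetical names, not part of the API.

```python
import time
from typing import Callable, Mapping, Tuple

# Hypothetical response shape: (HTTP status, headers, body).
Response = Tuple[int, Mapping[str, str], str]

# Assumption: rate limit errors use the conventional HTTP 429 status.
RATE_LIMIT_STATUS = 429


def call_with_backoff(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() until it is not rate limited or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        response = send()
        if response[0] != RATE_LIMIT_STATUS:
            return response
        # Wait base_delay, 2 * base_delay, 4 * base_delay, ... between tries,
        # but do not sleep after the final attempt.
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    # Still rate limited after the last attempt: hand the error back.
    return response
```

Passing `sleep` as a parameter keeps the function easy to test without real delays. In real use, `send` would wrap whatever HTTP client call you already make.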
Infercom Inference Service rate limit tiers
There are a few different rate limit tier offerings we provide:
- Free Tier: Applied when there is no payment method linked with your account
- Developer Tier: Applied when a payment method is linked with your account
- Enterprise Tier: Please contact our sales team for our enterprise tier rate limit plans
Please see the Billing page to link a payment method to your account.
Model rate limits
| Developer | Model ID | Requests per minute (RPM) | Requests per day (RPD) |
|---|---|---|---|
| DeepSeek | DeepSeek-V3.1 | 40 | 15000 |
| Meta | Meta-Llama-3.3-70B-Instruct | 120 | 30000 |
| OpenAI | gpt-oss-120b | 150 | 50000 |
Need higher limits? Enterprise tier plans with increased RPM and RPD limits are available. Contact us at info@infercom.ai to discuss your requirements.
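To see how the two limits in the table interact, the following sketch works through the arithmetic. It is a minimal illustration, not a recommended client configuration: `LIMITS`, `min_interval_seconds`, and `minutes_until_daily_cap` are hypothetical names, and the even-spacing approach assumes nothing about how the service measures its per-minute window.

```python
# (RPM, RPD) pairs copied from the table above, keyed by model ID.
LIMITS = {
    "DeepSeek-V3.1": (40, 15000),
    "Meta-Llama-3.3-70B-Instruct": (120, 30000),
    "gpt-oss-120b": (150, 50000),
}


def min_interval_seconds(model_id: str) -> float:
    """Seconds between requests when spacing them evenly at the RPM limit."""
    rpm, _ = LIMITS[model_id]
    return 60.0 / rpm


def minutes_until_daily_cap(model_id: str) -> float:
    """Minutes of sustained full-RPM traffic before the RPD limit is reached."""
    rpm, rpd = LIMITS[model_id]
    return rpd / rpm


for model_id in LIMITS:
    print(model_id, min_interval_seconds(model_id), minutes_until_daily_cap(model_id))
```

For DeepSeek-V3.1, this gives 1.5 seconds between requests, and sustained traffic at the full 40 RPM would use up the 15000 RPD quota after 375 minutes (6.25 hours). A client that runs all day therefore hits the daily limit well before the per-minute limit stops mattering.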
Rate limit response headers
These headers are included in every response and report the current status of your rate limit usage.
RPM (Requests per minute):
- x-ratelimit-limit-requests: The maximum number of requests allowed per minute.
- x-ratelimit-remaining-requests: The number of requests remaining in the current minute before hitting the rate limit.
- x-ratelimit-reset-requests: Time in epoch time until the per-minute request quota resets.
RPD (Requests per day):
- x-ratelimit-limit-requests-day: The maximum number of requests allowed per day.
- x-ratelimit-remaining-requests-day: The number of requests remaining in the current day before hitting the rate limit.
- x-ratelimit-reset-requests-day: Time in epoch time until the per-day request quota resets.
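The headers listed above can be collected into a small structure for logging or throttling decisions. This is a minimal sketch that assumes the header values arrive as numeric strings. `QuotaStatus` and `parse_rate_limit_headers` are hypothetical names, and the reset value is kept as the raw number without further interpretation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Mapping, Optional


@dataclass
class QuotaStatus:
    limit: Optional[int]      # maximum requests in the window
    remaining: Optional[int]  # requests left in the current window
    reset: Optional[float]    # raw reset value from the header, uninterpreted


def _get_number(headers: Mapping[str, str], name: str, cast: Callable):
    """Return the header converted with cast, or None if missing or invalid."""
    raw = headers.get(name)
    if raw is None:
        return None
    try:
        return cast(raw)
    except ValueError:
        return None


def parse_rate_limit_headers(headers: Mapping[str, str]) -> Dict[str, QuotaStatus]:
    """Extract per-minute and per-day quota status from response headers."""
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {key.lower(): value for key, value in headers.items()}
    result = {}
    for window, suffix in (("minute", ""), ("day", "-day")):
        result[window] = QuotaStatus(
            limit=_get_number(lowered, "x-ratelimit-limit-requests" + suffix, int),
            remaining=_get_number(lowered, "x-ratelimit-remaining-requests" + suffix, int),
            reset=_get_number(lowered, "x-ratelimit-reset-requests" + suffix, float),
        )
    return result
```

Missing or malformed headers come back as `None` rather than raising, so a caller can check `status["minute"].remaining` before sending the next request without extra error handling.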