- RPM: Requests per minute
- RPD: Requests per day
- A request is a single call to our API
- Both limits apply at the same time, so you hit whichever one (RPM or RPD) you reach first
- Every response reports the current status of your rate limits (see rate limit response headers for more information)
- If you hit a rate limit, the response contains an error message (see API error codes)
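A client usually reacts to a rate limit error by waiting and retrying. The following is a minimal sketch of exponential backoff, not an official client. It assumes that rate limit errors arrive with HTTP status 429, which this page does not state (check the API error codes). `call_with_backoff` and the `send` callable are hypothetical names introduced for illustration; `send` stands in for whatever function performs your API call and returns a `(status, body)` pair.

```python
import time
from typing import Callable, Tuple

# Assumption: rate limit errors use HTTP 429. Confirm against the API error codes.
RATE_LIMIT_STATUS = 429


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it is not rate limited or max_attempts calls were made."""
    status, body = send()
    for attempt in range(1, max_attempts):
        if status != RATE_LIMIT_STATUS:
            break
        # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * 2 ** (attempt - 1))
        status, body = send()
    return status, body
```

Backoff helps with the per-minute limit, which resets quickly. Once the per-day limit is exhausted, short retries are unlikely to succeed, so a real client would probably stop retrying and surface the error instead.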
Infercom Inference Service rate limit tiers
We provide a few different rate limit tiers:
- Free Tier: Applied when there is no payment method linked with your account
- Developer Tier: Applied when a payment method is linked with your account
- Enterprise Tier: Please contact our sales team for our enterprise tier rate limit plans
Please see the Billing page to link a payment method to your account.
Model rate limits
- Developer Tier
- Free Tier
EU-hosted models (sovereign)

| Developer | Model ID | Region | Requests per minute (RPM) | Requests per day (RPD) |
|---|---|---|---|---|
| DeepSeek | DeepSeek-V3.1 | EU | 30 | 15,000 |
| Meta | Meta-Llama-3.3-70B-Instruct | EU | 120 | 30,000 |
| OpenAI | gpt-oss-120b | EU | 150 | 50,000 |

Global models (non-sovereign)

| Developer | Model ID | Region | Requests per minute (RPM) | Requests per day (RPD) |
|---|---|---|---|---|
| Alibaba | Qwen3-32B | Global | 30 | 6,000 |
| Alibaba | Qwen3-235B | Global | 30 | 6,000 |
| DeepSeek | DeepSeek-R1-0528 | Global | 60 | 12,000 |
| DeepSeek | DeepSeek-R1-Distill-Llama-70B | Global | 240 | 48,000 |
| DeepSeek | DeepSeek-V3-0324 | Global | 60 | 12,000 |
| DeepSeek | DeepSeek-V3.1-Terminus | Global | 60 | 12,000 |
| DeepSeek | DeepSeek-V3.2 | Global | 60 | 12,000 |
| Meta | Llama-4-Maverick-17B-128E-Instruct | Global | 60 | 12,000 |
| Meta | Meta-Llama-3.1-8B-Instruct | Global | 1,440 | 288,000 |
Need higher limits? Enterprise tier plans with increased RPM and RPD limits are available. Contact us at info@infercom.ai to discuss your requirements.
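Because both limits apply together, the per-day limit often caps sustained traffic well below the per-minute limit. The sketch below works through that arithmetic with values from the tables above. It assumes the daily quota covers a 24-hour window, which this page does not specify; the function names are hypothetical.

```python
MINUTES_PER_DAY = 24 * 60  # assumption: the daily quota spans 24 hours


def sustained_rpm(rpm: int, rpd: int) -> float:
    """Highest steady per-minute rate that stays within both limits all day."""
    return min(rpm, rpd / MINUTES_PER_DAY)


def minutes_until_daily_limit(rate_per_min: float, rpd: int) -> float:
    """Minutes of steady traffic at rate_per_min before the daily quota runs out."""
    return rpd / rate_per_min


# DeepSeek-V3.1 (EU): 30 RPM, 15,000 RPD
print(sustained_rpm(30, 15_000))              # about 10.4 requests per minute
print(minutes_until_daily_limit(30, 15_000))  # 500.0 minutes at the full 30 RPM
```

Under these assumptions, sending at the full 30 RPM uses up the DeepSeek-V3.1 daily quota after 500 minutes (8 hours 20 minutes), while a steady rate of roughly 10 requests per minute lasts the whole day.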
Rate limit response headers
These headers are found in every request response and give information about the current status of rate limit usage.
RPM (Requests per minute):
- x-ratelimit-limit-requests: The maximum number of requests allowed per minute.
- x-ratelimit-remaining-requests: The number of requests remaining in the current minute before hitting the rate limit.
- x-ratelimit-reset-requests: Time in epoch time until the per-minute request quota resets.
RPD (Requests per day):
- x-ratelimit-limit-requests-day: The maximum number of requests allowed per day.
- x-ratelimit-remaining-requests-day: The number of requests remaining in the current day before hitting the rate limit.
- x-ratelimit-reset-requests-day: Time in epoch time until the per-day request quota resets.
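The headers listed above can be read into a small structure so that a client can decide whether to pause before its next request. This is a sketch under stated assumptions: it treats the reset values as absolute epoch timestamps in seconds, which is one reading of "epoch time" and should be checked against real responses. `RateLimitStatus`, `parse_rate_limit_headers`, and `seconds_to_wait` are hypothetical names, and the sample values are made up.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitStatus:
    limit_minute: int
    remaining_minute: int
    reset_minute: float
    limit_day: int
    remaining_day: int
    reset_day: float


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitStatus:
    # HTTP header names are case-insensitive, so normalize before lookup.
    h = {name.lower(): value for name, value in headers.items()}
    return RateLimitStatus(
        limit_minute=int(h["x-ratelimit-limit-requests"]),
        remaining_minute=int(h["x-ratelimit-remaining-requests"]),
        reset_minute=float(h["x-ratelimit-reset-requests"]),
        limit_day=int(h["x-ratelimit-limit-requests-day"]),
        remaining_day=int(h["x-ratelimit-remaining-requests-day"]),
        reset_day=float(h["x-ratelimit-reset-requests-day"]),
    )


def seconds_to_wait(status: RateLimitStatus, now: Optional[float] = None) -> float:
    """Assumption: reset values are absolute epoch timestamps in seconds."""
    now = time.time() if now is None else now
    if status.remaining_day <= 0:
        return max(0.0, status.reset_day - now)
    if status.remaining_minute <= 0:
        return max(0.0, status.reset_minute - now)
    return 0.0


# Made-up header values for illustration only.
sample = {
    "x-ratelimit-limit-requests": "30",
    "x-ratelimit-remaining-requests": "0",
    "x-ratelimit-reset-requests": "1700000060",
    "x-ratelimit-limit-requests-day": "15000",
    "x-ratelimit-remaining-requests-day": "14000",
    "x-ratelimit-reset-requests-day": "1700086400",
}
status = parse_rate_limit_headers(sample)
print(seconds_to_wait(status, now=1700000000.0))
```

The daily quota is checked first because it has the longer wait: if it is exhausted, waiting for the per-minute window to reset would not help.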