Rate Limits Policy - Infercom Documentation

Rate limits help manage Infercom API usage so that the service stays stable and reliable. They cap how many times each user can call the Infercom API within a given interval. Rate limits are measured in:

RPM: Requests per minute
RPD: Requests per day

Basics

A request is any call to our API.
You can hit either limit type (RPM or RPD), whichever you reach first.
Every response reports the current status of your rate limits (see rate limit response headers for more information).
If you hit a rate limit, the response contains an error message (see API error codes).
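Because either limit can be reached first, a client can keep its own count of both. The following is a minimal sketch of that idea, not an Infercom SDK feature: `RequestBudget` and its methods are hypothetical names, and it assumes sliding 60-second and 24-hour windows, which may not match how the server counts requests.

```python
import time
from collections import deque


class RequestBudget:
    """Client-side tracker for a per-minute and a per-day request limit.

    Hypothetical helper, not part of any Infercom SDK. It uses sliding
    windows, which may differ from how the server counts requests.
    """

    def __init__(self, rpm, rpd, clock=time.monotonic):
        self.rpm = rpm
        self.rpd = rpd
        self.clock = clock   # injectable so the class can be tested
        self.sent = deque()  # timestamps of requests in the last 24 hours

    def _prune(self, now):
        # Forget requests that are at least one day old.
        while self.sent and now - self.sent[0] >= 86_400:
            self.sent.popleft()

    def blocking_limit(self):
        """Return "RPD", "RPM", or None if a request may be sent now."""
        now = self.clock()
        self._prune(now)
        if len(self.sent) >= self.rpd:
            return "RPD"
        in_last_minute = sum(1 for t in self.sent if now - t < 60)
        if in_last_minute >= self.rpm:
            return "RPM"
        return None

    def record(self):
        """Call once for every request actually sent."""
        self.sent.append(self.clock())
```

A caller would check `blocking_limit()` before each request and call `record()` after sending one. The server remains the source of truth, so a local tracker like this only reduces the chance of receiving a rate limit error.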

Infercom Inference Service rate limit tiers

We offer three rate limit tiers:

Free Tier: Applied when there is no payment method linked with your account
Developer Tier: Applied when a payment method is linked with your account
Enterprise Tier: Please contact our sales team for our enterprise tier rate limit plans

Please see the Billing page to link a payment method to your account.

Below are our Developer Tier and Free Tier rate limits.

Model rate limits

Developer Tier

EU-hosted models (sovereign)

Developer	Model ID	Region	Requests per minute (RPM)	Requests per day (RPD)
DeepSeek	`DeepSeek-V3.1`	EU	30	15,000
Meta	`Meta-Llama-3.3-70B-Instruct`	EU	120	30,000
OpenAI	`gpt-oss-120b`	EU	150	50,000

Global models (non-sovereign)

Developer	Model ID	Region	Requests per minute (RPM)	Requests per day (RPD)
Alibaba	`Qwen3-32B`	Global	30	6,000
Alibaba	`Qwen3-235B`	Global	30	6,000
DeepSeek	`DeepSeek-R1-0528`	Global	60	12,000
DeepSeek	`DeepSeek-R1-Distill-Llama-70B`	Global	240	48,000
DeepSeek	`DeepSeek-V3-0324`	Global	60	12,000
DeepSeek	`DeepSeek-V3.1-Terminus`	Global	60	12,000
DeepSeek	`DeepSeek-V3.2`	Global	60	12,000
Meta	`Llama-4-Maverick-17B-128E-Instruct`	Global	60	12,000
Meta	`Meta-Llama-3.1-8B-Instruct`	Global	1,440	288,000

Free Tier

EU-hosted models (sovereign)

Developer	Model ID	Region	Requests per minute (RPM)	Requests per day (RPD)
DeepSeek	`DeepSeek-V3.1`	EU	10	10
Meta	`Meta-Llama-3.3-70B-Instruct`	EU	20	20
OpenAI	`gpt-oss-120b`	EU	20	20

Global models (non-sovereign)

Developer	Model ID	Region	Requests per minute (RPM)	Requests per day (RPD)
Alibaba	`Qwen3-32B`	Global	20	20
Alibaba	`Qwen3-235B`	Global	20	20
DeepSeek	`DeepSeek-R1-0528`	Global	20	20
DeepSeek	`DeepSeek-R1-Distill-Llama-70B`	Global	20	20
DeepSeek	`DeepSeek-V3-0324`	Global	20	20
DeepSeek	`DeepSeek-V3.1-Terminus`	Global	20	20
DeepSeek	`DeepSeek-V3.2`	Global	20	20
Meta	`Llama-4-Maverick-17B-128E-Instruct`	Global	20	20
Meta	`Meta-Llama-3.1-8B-Instruct`	Global	20	20
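To see how the two columns interact, it can help to work out which limit binds first for a given row. The sketch below is plain arithmetic on the table values; the function names are illustrative, and it assumes traffic is spread evenly across the day.

```python
def minutes_until_daily_limit(rpm, rpd):
    """Minutes of sustained traffic at the full RPM limit before RPD is used up."""
    return rpd / rpm


def max_sustained_rpm(rpm, rpd):
    """Highest steady per-minute rate that fits both limits over 24 hours."""
    return min(rpm, rpd / 1440)  # 1440 minutes per day


# The 30 RPM / 15,000 RPD row for DeepSeek-V3.1:
print(minutes_until_daily_limit(30, 15_000))      # 500.0
# The 1,440 RPM / 288,000 RPD row for Meta-Llama-3.1-8B-Instruct:
print(minutes_until_daily_limit(1_440, 288_000))  # 200.0
print(max_sustained_rpm(1_440, 288_000))          # 200.0
```

In both rows, running at the full per-minute rate exhausts the daily quota well before the day ends, so for round-the-clock workloads RPD is the limit to plan around.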

Need higher limits? Enterprise tier plans with increased RPM and RPD limits are available. Contact us at info@infercom.ai to discuss your requirements.

Rate limit response headers

These headers are included in every response and report the current status of your rate limit usage.

RPM (Requests per minute):

x-ratelimit-limit-requests
- The maximum number of requests allowed per minute.
x-ratelimit-remaining-requests
- The number of requests remaining in the current minute before hitting the rate limit.
x-ratelimit-reset-requests
- The time, in epoch time, at which the per-minute request quota resets.

RPD (Requests per day):

x-ratelimit-limit-requests-day
- The maximum number of requests allowed per day.
x-ratelimit-remaining-requests-day
- The number of requests remaining in the current day before hitting the rate limit.
x-ratelimit-reset-requests-day
- The time, in epoch time, at which the per-day request quota resets.
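The headers above can be read into a small structure for logging or throttling. This is a minimal sketch under stated assumptions: `RateLimitStatus` and `parse_rate_limit_headers` are hypothetical names, `headers` is assumed to be a mapping keyed by lowercase header names, and all values are assumed to be plain numbers. The sample values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class RateLimitStatus:
    limit: int
    remaining: int
    reset: float  # kept as reported by the API; not interpreted here


def parse_rate_limit_headers(headers, suffix=""):
    """Read one set of rate limit headers; use suffix="-day" for the RPD set.

    Assumes lowercase keys and numeric values, which the documentation
    does not confirm.
    """
    return RateLimitStatus(
        limit=int(headers[f"x-ratelimit-limit-requests{suffix}"]),
        remaining=int(headers[f"x-ratelimit-remaining-requests{suffix}"]),
        reset=float(headers[f"x-ratelimit-reset-requests{suffix}"]),
    )


# Illustrative values only, not real API output.
example = {
    "x-ratelimit-limit-requests": "30",
    "x-ratelimit-remaining-requests": "29",
    "x-ratelimit-reset-requests": "1700000060",
    "x-ratelimit-limit-requests-day": "15000",
    "x-ratelimit-remaining-requests-day": "14999",
    "x-ratelimit-reset-requests-day": "1700086400",
}
per_minute = parse_rate_limit_headers(example)
per_day = parse_rate_limit_headers(example, suffix="-day")
```

HTTP header names are case-insensitive, so with a client that preserves the original casing the keys may need to be lowercased before calling this function.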
