Rate limits help manage Infercom API usage so that the service stays stable and reliable. They cap how many times each user can call the Infercom API within a given interval. Rate limits are measured in:
  • RPM: Requests per minute
  • RPD: Requests per day
Basics

Infercom Inference Service rate limit tiers

We offer three rate limit tiers:
  • Free Tier: Applied when there is no payment method linked with your account
  • Developer Tier: Applied when a payment method is linked with your account
  • Enterprise Tier: Please contact our sales team for our enterprise tier rate limit plans
Please see the Billing page to link a payment method to your account.
Below are our Developer Tier and Free Tier rate limits.

Model rate limits

EU-hosted models (sovereign)
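Developer | Model ID     | Region | Requests per minute (RPM) | Requests per day (RPD)
----------|--------------|--------|---------------------------|-----------------------
MiniMax   | MiniMax-M2.5 | EU     | 80                        | 20,000
OpenAI    | gpt-oss-120b | EU     | 150                       | 50,000
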
Global Model Catalog (non-sovereign)
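Developer | Model ID                    | Region | Requests per minute (RPM) | Requests per day (RPD)
----------|-----------------------------|--------|---------------------------|-----------------------
DeepSeek  | DeepSeek-V3.1               | US     | 30                        | 15,000
DeepSeek  | DeepSeek-V3.2               | US     | 60                        | 12,000
Google    | gemma-3-12b-it              | JP     | 80                        | 20,000
Meta      | Meta-Llama-3.3-70B-Instruct | US     | 120                       | 30,000
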
Need higher limits? Enterprise tier plans with increased RPM and RPD limits are available. Contact us at info@infercom.ai to discuss your requirements.
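
To stay under a model's RPM limit, a client can pace its own requests rather than wait to be rejected. The following is a minimal sketch, not an official Infercom client: the class `SlidingWindowLimiter` and the helper `minutes_to_exhaust_daily_quota` are hypothetical names, and the sketch assumes a rolling 60-second window, which may differ from how the service actually counts requests.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side pacing: allow at most `max_requests` per `window_seconds`.

    ASSUMPTION: a rolling window. The service may count requests differently.
    """

    def __init__(self, max_requests, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self._sent = deque()  # timestamps of requests still inside the window

    def wait_time(self):
        """Seconds to wait before the next request may be sent (0.0 if none)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        # The window is full: wait until the oldest request ages out.
        return self.window_seconds - (now - self._sent[0])

    def record(self):
        """Call right after sending a request."""
        self._sent.append(self.clock())


def minutes_to_exhaust_daily_quota(rpm, rpd):
    """Minutes of sustained traffic at the full RPM before the RPD is used up."""
    return rpd / rpm


if __name__ == "__main__":
    # Limits for MiniMax-M2.5 from the table above: 80 RPM, 20,000 RPD.
    limiter = SlidingWindowLimiter(max_requests=80)
    delay = limiter.wait_time()
    if delay > 0:
        time.sleep(delay)
    # ... send the API request here ...
    limiter.record()
    print(minutes_to_exhaust_daily_quota(80, 20_000))
```

Note that the two limits interact: at a sustained 80 RPM, a 20,000 RPD quota lasts 20,000 / 80 = 250 minutes, so long-running jobs may need to pace themselves well below the per-minute limit.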

Rate limit response headers

These headers are included in every API response and report the current rate limit usage.
RPM (Requests per minute):
  • x-ratelimit-limit-requests
    • The maximum number of requests allowed per minute.
  • x-ratelimit-remaining-requests
    • The number of requests remaining in the current minute before hitting the rate limit.
  • x-ratelimit-reset-requests
    • The time, expressed in epoch time, at which the per-minute request quota resets.
RPD (Requests per day):
  • x-ratelimit-limit-requests-day
    • The maximum number of requests allowed per day.
  • x-ratelimit-remaining-requests-day
    • The number of requests remaining in the current day before hitting the rate limit.
  • x-ratelimit-reset-requests-day
    • The time, expressed in epoch time, at which the per-day request quota resets.
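
The headers above can be read from any response to decide whether to pause before the next call. The following is a minimal sketch under stated assumptions: `parse_rate_limit_headers` and `seconds_until_allowed` are hypothetical helper names, the header values are assumed to be numeric strings, and the reset values are assumed to be Unix epoch timestamps in seconds. Check real responses before relying on either assumption.

```python
import time


def parse_rate_limit_headers(headers):
    """Extract the six rate limit headers into a nested dict of floats.

    Header names are matched case-insensitively. Missing or non-numeric
    values are returned as None.
    """
    lowered = {str(k).lower(): v for k, v in headers.items()}

    def number(name):
        try:
            return float(lowered.get(name))
        except (TypeError, ValueError):
            return None

    return {
        "minute": {
            "limit": number("x-ratelimit-limit-requests"),
            "remaining": number("x-ratelimit-remaining-requests"),
            "reset": number("x-ratelimit-reset-requests"),
        },
        "day": {
            "limit": number("x-ratelimit-limit-requests-day"),
            "remaining": number("x-ratelimit-remaining-requests-day"),
            "reset": number("x-ratelimit-reset-requests-day"),
        },
    }


def seconds_until_allowed(status, now=None):
    """Seconds to wait before sending again; 0.0 if quota remains.

    ASSUMPTION: reset values are Unix epoch timestamps in seconds.
    """
    now = time.time() if now is None else now
    wait = 0.0
    for window in status.values():
        remaining, reset = window["remaining"], window["reset"]
        # Only wait on a window whose quota is exhausted and whose reset is known.
        if remaining is not None and remaining <= 0 and reset is not None:
            wait = max(wait, reset - now)
    return max(wait, 0.0)


if __name__ == "__main__":
    # Illustrative values only, not captured from a real response.
    example = {
        "x-ratelimit-limit-requests": "80",
        "x-ratelimit-remaining-requests": "0",
        "x-ratelimit-reset-requests": "1700000030",
    }
    status = parse_rate_limit_headers(example)
    print(seconds_until_allowed(status, now=1700000000))
```

Taking the larger of the per-minute and per-day waits means an exhausted daily quota is not masked by a per-minute window that has already reset.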