Rate limits help manage Infercom API usage so that the service stays stable and reliable. They cap how many times each user can call the Infercom API within a given interval. Rate limits are measured in:
  • RPM: Requests per minute
  • RPD: Requests per day
Basics
  • A request is a single call to our API
  • Both limits apply at the same time: you are rate limited as soon as you reach either one (RPM or RPD), whichever comes first
  • Every response reports the current status of your rate limits (see rate limit response headers for more information)
  • If you hit a rate limit, the response will contain an error message (see API error codes)
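The last point can be handled on the client side by retrying after a pause. The following is a minimal sketch, not an official client: it assumes a rate-limited request is signalled with HTTP status 429 (the conventional code, but check the API error codes page), and `call_with_backoff`, `send`, and the `(status, headers, body)` tuple shape are hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Mapping, Tuple

# Assumption: rate-limited requests return HTTP 429. The page above only says
# an error message is returned; confirm the real code under API error codes.
RATE_LIMIT_STATUS = 429

# Hypothetical response shape used by this sketch: (status, headers, body).
Response = Tuple[int, Mapping[str, str], str]


def call_with_backoff(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send`, retrying with exponential backoff while it is rate limited."""
    response = send()
    for attempt in range(1, max_attempts):
        if response[0] != RATE_LIMIT_STATUS:
            break  # success or a non-rate-limit error: stop retrying
        sleep(base_delay * 2 ** (attempt - 1))  # waits 1s, 2s, 4s, ...
        response = send()
    return response
```

`send` can wrap whatever HTTP client you use. Passing a custom `sleep` makes the function easy to test without real delays.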

Infercom Inference Service rate limit tiers

We offer three rate limit tiers:
  • Free Tier: Applied when there is no payment method linked with your account
  • Developer Tier: Applied when a payment method is linked with your account
  • Enterprise Tier: Please contact our sales team for our enterprise tier rate limit plans
Please see the Billing page to link a payment method to your account.
Below are our Developer Tier and Free Tier rate limits.

Model rate limits

Developer | Model ID                    | Requests per minute (RPM) | Requests per day (RPD)
DeepSeek  | DeepSeek-V3.1               | 40                        | 15000
Meta      | Meta-Llama-3.3-70B-Instruct | 120                       | 30000
OpenAI    | gpt-oss-120b                | 150                       | 50000
Need higher limits? Enterprise tier plans with increased RPM and RPD limits are available. Contact us at info@infercom.ai to discuss your requirements.
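To see how the two limits interact, it can help to work through the arithmetic. The sketch below reads the DeepSeek-V3.1 row as 40 RPM and 15000 RPD. The helper names are hypothetical, and the calculation assumes traffic at a constant rate.

```python
def minutes_until_daily_limit(rpm: int, rpd: int) -> float:
    """Minutes of sustained traffic at the full RPM limit before RPD is used up."""
    return rpd / rpm


def max_sustained_rpm(rpd: int, hours: float = 24.0) -> float:
    """Average request rate that spreads the daily quota evenly over `hours`."""
    return rpd / (hours * 60)


# Reading the DeepSeek-V3.1 row as 40 RPM and 15000 RPD:
print(minutes_until_daily_limit(40, 15000))  # 375.0 minutes, i.e. 6.25 hours
print(round(max_sustained_rpm(15000), 2))    # 10.42 requests per minute
```

Under these assumptions, a client running flat out at the per-minute limit would exhaust the daily quota in a little over six hours. A client that must run all day would need to average roughly 10 requests per minute.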

Rate limit response headers

These headers are found in every request response and give information about the current status of rate limit usage.
RPM (Requests per minute):
  • x-ratelimit-limit-requests
    • The maximum number of requests allowed per minute.
  • x-ratelimit-remaining-requests
    • The number of requests remaining in the current minute before hitting the rate limit.
  • x-ratelimit-reset-requests
    • The time at which the per-minute request quota resets, given as epoch time.
RPD (Requests per day):
  • x-ratelimit-limit-requests-day
    • The maximum number of requests allowed per day.
  • x-ratelimit-remaining-requests-day
    • The number of requests remaining in the current day before hitting the rate limit.
  • x-ratelimit-reset-requests-day
    • The time at which the per-day request quota resets, given as epoch time.
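A client can read these headers to decide whether to pause before its next request. The following is a rough sketch under several assumptions: the header values are plain numbers, the reset headers hold an absolute epoch timestamp in seconds, and the headers arrive in a mapping keyed by the lowercase names listed above. `RateLimitStatus`, `parse_rate_limit`, and `seconds_to_wait` are hypothetical names, not part of any Infercom SDK.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitStatus:
    limit: int          # maximum requests in the window
    remaining: int      # requests left in the current window
    reset_epoch: float  # assumed: epoch seconds at which the window resets


def parse_rate_limit(headers: Mapping[str, str], suffix: str = "") -> RateLimitStatus:
    """Parse one header group: suffix "" for per-minute, "-day" for per-day."""
    return RateLimitStatus(
        limit=int(headers[f"x-ratelimit-limit-requests{suffix}"]),
        remaining=int(headers[f"x-ratelimit-remaining-requests{suffix}"]),
        reset_epoch=float(headers[f"x-ratelimit-reset-requests{suffix}"]),
    )


def seconds_to_wait(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Seconds to pause before the next request; 0.0 if quota remains in both windows."""
    now = time.time() if now is None else now
    waits = [0.0]
    for suffix in ("", "-day"):
        status = parse_rate_limit(headers, suffix)
        if status.remaining <= 0:
            # This window is exhausted: wait until it resets.
            waits.append(status.reset_epoch - now)
    return max(waits)
```

Taking the longer of the two waits reflects that both limits apply at once: a request succeeds only when neither window is exhausted.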