

The Responses API (POST /v1/responses) is designed for agentic workflows and tool-capable integrations. It structures model output as typed items—messages, function calls, and reasoning—rather than a single text field, enabling sophisticated multi-step agent interactions.
The Responses API complements the Chat Completions API and does not replace it. Use Responses API for agentic workflows and tool calling; use Chat Completions for simpler conversational needs.

Supported models

| Model | Reasoning | Function calling | Notes |
| --- | --- | --- | --- |
| MiniMax-M2.5 | Yes | Yes | Recommended for agentic coding |
| gpt-oss-120b | Yes | Yes | Set reasoning.effort: "high" for best tool calling |
Not all models support the Responses API. Models like DeepSeek-V3.1 and Meta-Llama-3.3-70B-Instruct are only available via Chat Completions.

Key characteristics

  • Structured output items: Responses contain typed items (message, function_call, reasoning) rather than a single text field
  • Stateless: Infercom does not store conversation state—supply full history via input[] on each request
  • Client-executed tools: When a tool is needed, the model returns a function_call item; your application executes the function and returns the result
  • Streaming: Server-Sent Events with typed event hierarchy for real-time output

Simple generation

The simplest usage passes a string input and receives a structured response.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.infercom.ai/v1",
    api_key="your-infercom-api-key"
)

response = client.responses.create(
    model="MiniMax-M2.5",
    input="Explain the difference between supervised and unsupervised learning."
)

# Access the text output
print(response.output_text)

Response structure

The response contains an output array with typed items:
{
  "id": "resp_abc123",
  "object": "response",
  "status": "completed",
  "model": "MiniMax-M2.5",
  "output": [
    {
      "type": "reasoning",
      "id": "rs_xyz",
      "status": "completed",
      "content": [
        {
          "type": "reasoning_text",
          "text": "The user is asking about ML concepts..."
        }
      ]
    },
    {
      "type": "message",
      "id": "msg_xyz",
      "role": "assistant",
      "status": "completed",
      "content": [
        {
          "type": "output_text",
          "text": "Supervised learning uses labeled data..."
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 45,
    "output_tokens": 120,
    "total_tokens": 165,
    "output_tokens_details": {
      "reasoning_tokens": 35
    }
  }
}
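These typed items can be flattened to plain text with a short walk over the output array. A minimal sketch using the example payload above (collect_texts is an illustrative helper, not part of any SDK):

```python
# Flatten a Responses API payload into (reasoning, answer) strings.
# collect_texts is an illustrative helper, not part of any SDK.
def collect_texts(response: dict) -> tuple[str, str]:
    reasoning_parts, answer_parts = [], []
    for item in response["output"]:
        for part in item.get("content", []):
            if part["type"] == "reasoning_text":
                reasoning_parts.append(part["text"])
            elif part["type"] == "output_text":
                answer_parts.append(part["text"])
    return "\n".join(reasoning_parts), "\n".join(answer_parts)

# The example payload from above, trimmed to the fields the helper reads:
example = {
    "output": [
        {"type": "reasoning", "content": [
            {"type": "reasoning_text", "text": "The user is asking about ML concepts..."}]},
        {"type": "message", "role": "assistant", "content": [
            {"type": "output_text", "text": "Supervised learning uses labeled data..."}]},
    ]
}

reasoning, answer = collect_texts(example)
print(answer)  # Supervised learning uses labeled data...
```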

System instructions

Use the instructions parameter to provide system-level guidance:
response = client.responses.create(
    model="MiniMax-M2.5",
    instructions="You are a helpful assistant that speaks like a pirate.",
    input="How are you today?"
)

Multi-turn conversations

Since the API is stateless, include the full conversation history in the input array:
# Turn 1
response_1 = client.responses.create(
    model="MiniMax-M2.5",
    input=[{"role": "user", "content": "My name is Thomas."}]
)

# Turn 2 - include prior messages
response_2 = client.responses.create(
    model="MiniMax-M2.5",
    input=[
        {"role": "user", "content": "My name is Thomas."},
        *response_1.output,  # Include all assistant output items (reasoning + message)
        {"role": "user", "content": "What is my name?"}
    ]
)

print(response_2.output_text)  # "Your name is Thomas..."
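This bookkeeping generalizes to a small helper that folds each response's output items back into the history. A sketch in which the Conversation class and fake_send stub are ours, not part of the SDK; in real use, send would wrap client.responses.create:

```python
from types import SimpleNamespace

class Conversation:
    """Accumulates history for the stateless Responses API.

    send stands in for a call like client.responses.create(model=..., input=history);
    the class itself is illustrative, not part of any SDK.
    """
    def __init__(self, send):
        self.send = send
        self.history = []

    def ask(self, text):
        self.history.append({"role": "user", "content": text})
        response = self.send(self.history)
        # Fold all output items (reasoning + message) back into the history
        self.history.extend(response.output)
        return response

# Stubbed transport so the pattern runs without network access:
def fake_send(history):
    return SimpleNamespace(output=[{"role": "assistant", "content": "ok"}])

conv = Conversation(fake_send)
conv.ask("My name is Thomas.")
conv.ask("What is my name?")
print(len(conv.history))  # 4: two user turns + two assistant items
```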

Function calling

The Responses API supports function tools for agentic workflows. Only type: "function" tools are supported.

Step 1: Define tools and make initial request

import json

tools = [{
    "type": "function",
    "name": "get_weather",
    "description": "Get current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"}
        },
        "required": ["city"]
    }
}]

response = client.responses.create(
    model="MiniMax-M2.5",
    input=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=tools
)

# Check if model wants to call a function
for item in response.output:
    if item.type == "function_call":
        print(f"Function: {item.name}")
        print(f"Arguments: {item.arguments}")

Step 2: Execute function and return result

# Execute the function locally
def get_weather(city: str) -> dict:
    # Your actual weather API call here
    return {"city": city, "temperature": "18°C", "condition": "Cloudy"}

# Find the function call in the response
tool_call = next(item for item in response.output if item.type == "function_call")
args = json.loads(tool_call.arguments)
result = get_weather(args["city"])

# Send result back to the model
follow_up = client.responses.create(
    model="MiniMax-M2.5",
    input=[
        {"role": "user", "content": "What's the weather in Berlin?"},
        tool_call,  # Include the function call
        {
            "type": "function_call_output",
            "call_id": tool_call.call_id,
            "output": json.dumps(result)
        }
    ],
    tools=tools
)

print(follow_up.output_text)  # "The weather in Berlin is 18°C and cloudy."
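Steps 1 and 2 generalize to a loop that keeps executing tools until the model returns a plain message. A sketch under stated assumptions: run_tool_loop and the stubbed client below are illustrative, and a real run would pass the OpenAI client configured earlier instead of the fake:

```python
import json
from types import SimpleNamespace as NS

def run_tool_loop(client, model, messages, tools, handlers, max_turns=5):
    """Call the model, execute any requested functions, and repeat
    until a plain message comes back. handlers maps tool name -> callable."""
    history = list(messages)
    for _ in range(max_turns):
        response = client.responses.create(model=model, input=history, tools=tools)
        calls = [item for item in response.output if item.type == "function_call"]
        if not calls:
            return response
        for call in calls:
            history.append(call)
            result = handlers[call.name](**json.loads(call.arguments))
            history.append({
                "type": "function_call_output",
                "call_id": call.call_id,
                "output": json.dumps(result),
            })
    raise RuntimeError("tool loop did not converge")

# Fake client so the loop can be exercised without network access:
class FakeResponses:
    turn = 0
    def create(self, model, input, tools):
        FakeResponses.turn += 1
        if FakeResponses.turn == 1:
            return NS(output=[NS(type="function_call", name="get_weather",
                                 arguments='{"city": "Berlin"}', call_id="call_1")])
        return NS(output=[NS(type="message")],
                  output_text="The weather in Berlin is 18°C and cloudy.")

fake_client = NS(responses=FakeResponses())
final = run_tool_loop(
    fake_client, "MiniMax-M2.5",
    [{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=[], handlers={"get_weather": lambda city: {"city": city, "temperature": "18°C"}},
)
print(final.output_text)
```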

Tool choice

Control when the model uses tools with tool_choice:
| Value | Behavior |
| --- | --- |
| "auto" | Model decides whether to call a function (default) |
| "none" | Model will not call any functions |
| "required" | Model must call at least one function |
| {"type": "function", "name": "..."} | Force a specific function |

Structured output (JSON mode)

Request structured JSON output using the text.format parameter.

JSON object mode

response = client.responses.create(
    model="MiniMax-M2.5",
    input="List 3 European capitals",
    text={"format": {"type": "json_object"}}
)

import json
data = json.loads(response.output_text)

JSON schema mode

For guaranteed structure, provide a JSON schema:
response = client.responses.create(
    model="MiniMax-M2.5",
    input="Extract event details: SambaNova launch May 1, 2026 at 10am in San Francisco.",
    text={
        "format": {
            "type": "json_schema",
            "name": "event_extraction",
            "schema": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "date": {"type": "string"},
                    "time": {"type": "string"},
                    "location": {"type": "string"}
                },
                "required": ["title", "date", "time", "location"]
            }
        }
    }
)

import json
event = json.loads(response.output_text)
print(event)
# {"title": "SambaNova launch", "date": "May 1, 2026", "time": "10am", "location": "San Francisco"}
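Even with a schema, it is prudent to validate the parsed result before use. A minimal check of the required keys, using the example output above as a hard-coded payload (no third-party validator assumed):

```python
import json

# Required keys from the event_extraction schema above.
required = ["title", "date", "time", "location"]

# Hard-coded stand-in for response.output_text from the example above.
payload = '{"title": "SambaNova launch", "date": "May 1, 2026", "time": "10am", "location": "San Francisco"}'
event = json.loads(payload)

missing = [key for key in required if key not in event]
assert not missing, f"schema violation: missing {missing}"
print(event["title"])  # SambaNova launch
```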

Reasoning

Reasoning-capable models expose their thinking process via reasoning output items. Control reasoning depth with reasoning.effort:
| Effort | Behavior |
| --- | --- |
| "low" | Faster, less depth |
| "medium" | Balanced (default) |
| "high" | Deeper reasoning, higher token cost |
response = client.responses.create(
    model="MiniMax-M2.5",
    input="What is 15 * 23?",
    reasoning={"effort": "high"}
)

# Access reasoning separately from the answer
for item in response.output:
    if item.type == "reasoning":
        print("Reasoning:", item.content[0].text)
    elif item.type == "message":
        print("Answer:", item.content[0].text)
When using gpt-oss-120b for function calling, set reasoning.effort to "high" for best results.

Streaming

Enable streaming for real-time output by setting stream=True. The API emits Server-Sent Events:
stream = client.responses.create(
    model="MiniMax-M2.5",
    input="Write a short poem about speed.",
    stream=True
)

for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)

Streaming event types

| Event | Description |
| --- | --- |
| response.created | Response initialized |
| response.in_progress | Generation started |
| response.output_item.added | New output item created |
| response.content_part.added | New content part added |
| response.reasoning_text.delta | Incremental reasoning chunk |
| response.reasoning_text.done | Reasoning complete |
| response.output_text.delta | Incremental output text |
| response.output_text.done | Output text complete |
| response.function_call_arguments.delta | Incremental function arguments |
| response.function_call_arguments.done | Function arguments complete |
| response.content_part.done | Content part finished |
| response.output_item.done | Output item completed |
| response.completed | Final event with usage stats |
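A typical consumer dispatches on event.type to keep reasoning and answer text separate. A sketch exercised with stubbed events rather than a live stream (the Event namespace stands in for the SDK's event objects):

```python
from types import SimpleNamespace as Event

def consume(stream):
    """Accumulate reasoning and answer text from a Responses event stream."""
    reasoning, answer = [], []
    for event in stream:
        if event.type == "response.reasoning_text.delta":
            reasoning.append(event.delta)
        elif event.type == "response.output_text.delta":
            answer.append(event.delta)
        elif event.type == "response.completed":
            break
    return "".join(reasoning), "".join(answer)

# Stubbed events standing in for the SSE stream:
fake_stream = [
    Event(type="response.reasoning_text.delta", delta="thinking..."),
    Event(type="response.output_text.delta", delta="Speed "),
    Event(type="response.output_text.delta", delta="thrills."),
    Event(type="response.completed"),
]
print(consume(fake_stream)[1])  # Speed thrills.
```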

Request parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Model ID (MiniMax-M2.5, gpt-oss-120b) |
| input | string or array | Yes | Text input or conversation array |
| instructions | string | No | System message prepended to input |
| stream | boolean | No | Enable SSE streaming (default: false) |
| max_output_tokens | integer | No | Maximum tokens to generate |
| temperature | number | No | Randomness 0-2 (default: 0.7) |
| top_p | number | No | Nucleus sampling 0-1 (default: 1) |
| top_k | integer | No | Top-K sampling 1-100 |
| tools | array | No | Function tool definitions (max 128) |
| tool_choice | string or object | No | Tool invocation control |
| parallel_tool_calls | boolean | No | Allow parallel tool calls (default: true) |
| text.format | object | No | Output format: text, json_object, json_schema |
| reasoning.effort | string | No | Reasoning depth: low, medium, high |

Response fields

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique response identifier |
| object | string | Always "response" |
| status | string | completed, failed, in_progress, incomplete |
| model | string | Model ID used |
| output | array | Output items (messages, reasoning, function calls) |
| usage | object | Token usage statistics |
| error | object | Error details when status is "failed" |

Usage statistics

The usage object includes performance metrics:
{
  "input_tokens": 45,
  "output_tokens": 120,
  "total_tokens": 165,
  "input_tokens_details": {"cached_tokens": 0},
  "output_tokens_details": {"reasoning_tokens": 35},
  "time_to_first_token": 0.084,
  "total_latency": 0.459,
  "output_tokens_per_sec": 261.4
}
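In the example above, output_tokens_per_sec is consistent with output_tokens / total_latency (120 / 0.459 ≈ 261.4). This is an observation from these numbers, not a documented guarantee:

```python
# Usage metrics from the example payload above.
usage = {
    "output_tokens": 120,
    "time_to_first_token": 0.084,
    "total_latency": 0.459,
    "output_tokens_per_sec": 261.4,
}

# Throughput derived from the other two fields matches the reported value.
derived = usage["output_tokens"] / usage["total_latency"]
print(round(derived, 1))  # 261.4
```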

Responses API vs Chat Completions

| Feature | Responses API | Chat Completions |
| --- | --- | --- |
| Output structure | Typed items (message, reasoning, function_call) | Single message with content |
| Reasoning visibility | Separate reasoning items | Inline in content |
| Tool results | Structured function_call_output | tool role messages |
| Best for | Agentic workflows, coding agents | Conversational apps |

Limitations

  • Stateless: previous_response_id is not supported—supply full conversation history in input[]
  • Function tools only: Built-in tools (web_search, code_interpreter) are not supported
  • Not implemented: frequency_penalty, presence_penalty, max_tool_calls, strict mode

Agentic coding integrations

The Responses API powers agentic coding tools. See integration guides:
  • OpenCode - Terminal-based coding assistant
  • Cline - VS Code extension
  • Aider - Terminal pair programming

Next steps