

The Responses API (POST /v1/responses) is designed for agentic workflows and tool-capable integrations. It structures model output as typed items—messages, function calls, and reasoning—rather than a single text field, enabling sophisticated multi-step agent interactions.
The Responses API complements the Chat Completions API and does not replace it. Use Responses API for agentic workflows and tool calling; use Chat Completions for simpler conversational needs.

Supported models

| Model | Reasoning | Function calling | Notes |
| --- | --- | --- | --- |
| MiniMax-M2.5 | Yes | Yes | Recommended for agentic coding |
| gpt-oss-120b | Yes | Yes | Set reasoning.effort: "high" for best tool calling |
Not all models support the Responses API. Models like DeepSeek-V3.1 and Meta-Llama-3.3-70B-Instruct are only available via Chat Completions.

Key characteristics

  • Structured output items: Responses contain typed items (message, function_call, reasoning) rather than a single text field
  • Stateless: Infercom does not store conversation state—supply full history via input[] on each request
  • Client-executed tools: When a tool is needed, the model returns a function_call item; your application executes the function and returns the result
  • Streaming: Server-Sent Events with typed event hierarchy for real-time output

Simple generation

The simplest usage passes a string input and receives a structured response.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.infercom.ai/v1",
    api_key="your-infercom-api-key"
)

response = client.responses.create(
    model="MiniMax-M2.5",
    input="Explain the difference between supervised and unsupervised learning."
)

# Access the text output
print(response.output_text)

Response structure

The response contains an output array with typed items:
{
  "id": "resp_abc123",
  "object": "response",
  "status": "completed",
  "model": "MiniMax-M2.5",
  "output": [
    {
      "type": "reasoning",
      "id": "rs_xyz",
      "status": "completed",
      "content": [
        {
          "type": "reasoning_text",
          "text": "The user is asking about ML concepts..."
        }
      ]
    },
    {
      "type": "message",
      "id": "msg_xyz",
      "role": "assistant",
      "status": "completed",
      "content": [
        {
          "type": "output_text",
          "text": "Supervised learning uses labeled data..."
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 45,
    "output_tokens": 120,
    "total_tokens": 165,
    "output_tokens_details": {
      "reasoning_tokens": 35
    }
  }
}
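These typed items can be flattened to plain text with a short walk over the output array. A minimal sketch using the example payload above (collect_texts is an illustrative helper, not part of any SDK):

```python
# Flatten a Responses API payload into (reasoning, answer) strings.
# collect_texts is an illustrative helper, not part of any SDK.
def collect_texts(response: dict) -> tuple[str, str]:
    reasoning_parts, answer_parts = [], []
    for item in response["output"]:
        for part in item.get("content", []):
            if part["type"] == "reasoning_text":
                reasoning_parts.append(part["text"])
            elif part["type"] == "output_text":
                answer_parts.append(part["text"])
    return "\n".join(reasoning_parts), "\n".join(answer_parts)

# The example payload from above, trimmed to the fields the helper reads:
example = {
    "output": [
        {"type": "reasoning", "content": [
            {"type": "reasoning_text", "text": "The user is asking about ML concepts..."}]},
        {"type": "message", "role": "assistant", "content": [
            {"type": "output_text", "text": "Supervised learning uses labeled data..."}]},
    ]
}

reasoning, answer = collect_texts(example)
print(answer)  # Supervised learning uses labeled data...
```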

System instructions

Use the instructions parameter to provide system-level guidance:
response = client.responses.create(
    model="MiniMax-M2.5",
    instructions="You are a helpful assistant that speaks like a pirate.",
    input="How are you today?"
)

Multi-turn conversations

Since the API is stateless, include the full conversation history in the input array:
# Turn 1
response_1 = client.responses.create(
    model="MiniMax-M2.5",
    input=[{"role": "user", "content": "My name is Thomas."}]
)

# Turn 2 - include prior messages
response_2 = client.responses.create(
    model="MiniMax-M2.5",
    input=[
        {"role": "user", "content": "My name is Thomas."},
        *response_1.output,  # Include all assistant output items (reasoning + message)
        {"role": "user", "content": "What is my name?"}
    ]
)

print(response_2.output_text)  # "Your name is Thomas..."
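This bookkeeping generalizes to a small helper that folds each response's output items back into the history. A sketch in which the Conversation class and fake_send stub are ours, not part of the SDK; in real use, send would wrap client.responses.create:

```python
from types import SimpleNamespace

class Conversation:
    """Accumulates history for the stateless Responses API.

    send stands in for a call like client.responses.create(model=..., input=history);
    the class itself is illustrative, not part of any SDK.
    """
    def __init__(self, send):
        self.send = send
        self.history = []

    def ask(self, text):
        self.history.append({"role": "user", "content": text})
        response = self.send(self.history)
        # Fold all output items (reasoning + message) back into the history
        self.history.extend(response.output)
        return response

# Stubbed transport so the pattern runs without network access:
def fake_send(history):
    return SimpleNamespace(output=[{"role": "assistant", "content": "ok"}])

conv = Conversation(fake_send)
conv.ask("My name is Thomas.")
conv.ask("What is my name?")
print(len(conv.history))  # 4: two user turns + two assistant items
```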

Function calling

The Responses API supports function tools for agentic workflows. Only type: "function" tools are supported.

Step 1: Define tools and make initial request

import json

tools = [{
    "type": "function",
    "name": "get_weather",
    "description": "Get current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"}
        },
        "required": ["city"]
    }
}]

response = client.responses.create(
    model="MiniMax-M2.5",
    input=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=tools
)

# Check if model wants to call a function
for item in response.output:
    if item.type == "function_call":
        print(f"Function: {item.name}")
        print(f"Arguments: {item.arguments}")

Step 2: Execute function and return result

# Execute the function locally
def get_weather(city: str) -> dict:
    # Your actual weather API call here
    return {"city": city, "temperature": "18°C", "condition": "Cloudy"}

# Find the function call in the response
tool_call = next(item for item in response.output if item.type == "function_call")
args = json.loads(tool_call.arguments)
result = get_weather(args["city"])

# Send result back to the model
follow_up = client.responses.create(
    model="MiniMax-M2.5",
    input=[
        {"role": "user", "content": "What's the weather in Berlin?"},
        tool_call,  # Include the function call
        {
            "type": "function_call_output",
            "call_id": tool_call.call_id,
            "output": json.dumps(result)
        }
    ],
    tools=tools
)

print(follow_up.output_text)  # "The weather in Berlin is 18°C and cloudy."
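Steps 1 and 2 generalize to a loop that keeps executing tools until the model returns a plain message. A sketch under stated assumptions: run_tool_loop and the stubbed client below are illustrative, and a real run would pass the OpenAI client configured earlier instead of the fake:

```python
import json
from types import SimpleNamespace as NS

def run_tool_loop(client, model, messages, tools, handlers, max_turns=5):
    """Call the model, execute any requested functions, and repeat
    until a plain message comes back. handlers maps tool name -> callable."""
    history = list(messages)
    for _ in range(max_turns):
        response = client.responses.create(model=model, input=history, tools=tools)
        calls = [item for item in response.output if item.type == "function_call"]
        if not calls:
            return response
        for call in calls:
            history.append(call)
            result = handlers[call.name](**json.loads(call.arguments))
            history.append({
                "type": "function_call_output",
                "call_id": call.call_id,
                "output": json.dumps(result),
            })
    raise RuntimeError("tool loop did not converge")

# Fake client so the loop can be exercised without network access:
class FakeResponses:
    turn = 0
    def create(self, model, input, tools):
        FakeResponses.turn += 1
        if FakeResponses.turn == 1:
            return NS(output=[NS(type="function_call", name="get_weather",
                                 arguments='{"city": "Berlin"}', call_id="call_1")])
        return NS(output=[NS(type="message")],
                  output_text="The weather in Berlin is 18°C and cloudy.")

fake_client = NS(responses=FakeResponses())
final = run_tool_loop(
    fake_client, "MiniMax-M2.5",
    [{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=[], handlers={"get_weather": lambda city: {"city": city, "temperature": "18°C"}},
)
print(final.output_text)
```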

Tool choice

Control when the model uses tools with tool_choice:
| Value | Behavior |
| --- | --- |
| "auto" | Model decides whether to call a function (default) |
| "none" | Model will not call any functions |
| "required" | Model must call at least one function |
| {"type": "function", "name": "..."} | Force a specific function |

Structured output (JSON mode)

Request structured JSON output using the text.format parameter.

JSON object mode

response = client.responses.create(
    model="MiniMax-M2.5",
    input="List 3 European capitals",
    text={"format": {"type": "json_object"}}
)

import json
data = json.loads(response.output_text)

JSON schema mode

For guaranteed structure, provide a JSON schema:
response = client.responses.create(
    model="MiniMax-M2.5",
    input="Extract event details: SambaNova launch May 1, 2026 at 10am in San Francisco.",
    text={
        "format": {
            "type": "json_schema",
            "name": "event_extraction",
            "schema": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "date": {"type": "string"},
                    "time": {"type": "string"},
                    "location": {"type": "string"}
                },
                "required": ["title", "date", "time", "location"]
            }
        }
    }
)

import json
event = json.loads(response.output_text)
print(event)
# {"title": "SambaNova launch", "date": "May 1, 2026", "time": "10am", "location": "San Francisco"}
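Even with a schema, it is prudent to validate the parsed result before use. A minimal check of the required keys, using the example output above as a hard-coded payload (no third-party validator assumed):

```python
import json

# Required keys from the event_extraction schema above.
required = ["title", "date", "time", "location"]

# Hard-coded stand-in for response.output_text from the example above.
payload = '{"title": "SambaNova launch", "date": "May 1, 2026", "time": "10am", "location": "San Francisco"}'
event = json.loads(payload)

missing = [key for key in required if key not in event]
assert not missing, f"schema violation: missing {missing}"
print(event["title"])  # SambaNova launch
```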

Reasoning

Reasoning-capable models expose their thinking process via reasoning output items. Control reasoning depth with reasoning.effort:
| Effort | Behavior |
| --- | --- |
| "low" | Faster, less depth |
| "medium" | Balanced (default) |
| "high" | Deeper reasoning, higher token cost |
response = client.responses.create(
    model="MiniMax-M2.5",
    input="What is 15 * 23?",
    reasoning={"effort": "high"}
)

# Access reasoning separately from the answer
for item in response.output:
    if item.type == "reasoning":
        print("Reasoning:", item.content[0].text)
    elif item.type == "message":
        print("Answer:", item.content[0].text)
When using gpt-oss-120b for function calling, set reasoning.effort to "high" for best results.

Streaming

Enable streaming for real-time output by setting stream=True. The API emits Server-Sent Events:
stream = client.responses.create(
    model="MiniMax-M2.5",
    input="Write a short poem about speed.",
    stream=True
)

for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)

Streaming event types

| Event | Description |
| --- | --- |
| response.created | Response initialized |
| response.in_progress | Generation started |
| response.output_item.added | New output item created |
| response.content_part.added | New content part added |
| response.reasoning_text.delta | Incremental reasoning chunk |
| response.reasoning_text.done | Reasoning complete |
| response.output_text.delta | Incremental output text |
| response.output_text.done | Output text complete |
| response.function_call_arguments.delta | Incremental function arguments |
| response.function_call_arguments.done | Function arguments complete |
| response.content_part.done | Content part finished |
| response.output_item.done | Output item completed |
| response.completed | Final event with usage stats |
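A typical consumer dispatches on event.type to keep reasoning and answer text separate. A sketch exercised with stubbed events rather than a live stream (the Event namespace stands in for the SDK's event objects):

```python
from types import SimpleNamespace as Event

def consume(stream):
    """Accumulate reasoning and answer text from a Responses event stream."""
    reasoning, answer = [], []
    for event in stream:
        if event.type == "response.reasoning_text.delta":
            reasoning.append(event.delta)
        elif event.type == "response.output_text.delta":
            answer.append(event.delta)
        elif event.type == "response.completed":
            break
    return "".join(reasoning), "".join(answer)

# Stubbed events standing in for the SSE stream:
fake_stream = [
    Event(type="response.reasoning_text.delta", delta="thinking..."),
    Event(type="response.output_text.delta", delta="Speed "),
    Event(type="response.output_text.delta", delta="thrills."),
    Event(type="response.completed"),
]
print(consume(fake_stream)[1])  # Speed thrills.
```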

Request parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Model ID (MiniMax-M2.5, gpt-oss-120b) |
| input | string or array | Yes | Text input or conversation array |
| instructions | string | No | System message prepended to input |
| stream | boolean | No | Enable SSE streaming (default: false) |
| max_output_tokens | integer | No | Maximum tokens to generate |
| temperature | number | No | Randomness 0-2 (default: 0.7) |
| top_p | number | No | Nucleus sampling 0-1 (default: 1) |
| top_k | integer | No | Top-K sampling 1-100 |
| tools | array | No | Function tool definitions (max 128) |
| tool_choice | string or object | No | Tool invocation control |
| parallel_tool_calls | boolean | No | Allow parallel tool calls (default: true) |
| text.format | object | No | Output format: text, json_object, json_schema |
| reasoning.effort | string | No | Reasoning depth: low, medium, high |

Response fields

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique response identifier |
| object | string | Always "response" |
| status | string | completed, failed, in_progress, incomplete |
| model | string | Model ID used |
| output | array | Output items (messages, reasoning, function calls) |
| usage | object | Token usage statistics |
| error | object | Error details when status is "failed" |

Usage statistics

The usage object includes performance metrics:
{
  "input_tokens": 45,
  "output_tokens": 120,
  "total_tokens": 165,
  "input_tokens_details": {"cached_tokens": 0},
  "output_tokens_details": {"reasoning_tokens": 35},
  "time_to_first_token": 0.084,
  "total_latency": 0.459,
  "output_tokens_per_sec": 261.4
}
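In the example above, output_tokens_per_sec is consistent with output_tokens / total_latency (120 / 0.459 ≈ 261.4). This is an observation from these numbers, not a documented guarantee:

```python
# Usage metrics from the example payload above.
usage = {
    "output_tokens": 120,
    "time_to_first_token": 0.084,
    "total_latency": 0.459,
    "output_tokens_per_sec": 261.4,
}

# Throughput derived from the other two fields matches the reported value.
derived = usage["output_tokens"] / usage["total_latency"]
print(round(derived, 1))  # 261.4
```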

Responses API vs Chat Completions

| Feature | Responses API | Chat Completions |
| --- | --- | --- |
| Output structure | Typed items (message, reasoning, function_call) | Single message with content |
| Reasoning visibility | Separate reasoning items | Inline in content |
| Tool results | Structured function_call_output | tool role messages |
| Best for | Agentic workflows, coding agents | Conversational apps |

Limitations

  • Stateless: previous_response_id is not supported—supply full conversation history in input[]
  • Function tools only: Built-in tools (web_search, code_interpreter) are not supported
  • Not implemented: frequency_penalty, presence_penalty, max_tool_calls, strict mode

Agentic coding integrations

The Responses API powers agentic coding tools. See integration guides:
  • OpenCode - Terminal-based coding assistant
  • Cline - VS Code extension
  • Aider - Terminal pair programming

Next steps