# Responses API - Build Agentic Workflows

> Build agentic AI applications with the Infercom Responses API. Structured outputs, function calling, reasoning, and streaming for coding agents and tool-capable integrations.

The Responses API (`POST /v1/responses`) is designed for agentic workflows and tool-capable integrations. Instead of returning a single text field, it structures model output as typed items (messages, function calls, and reasoning), so a multi-step agent loop can handle each item according to its type.

<Info>
  The Responses API complements the [Chat Completions API](/en/features/text-generation) and does not replace it. Use the Responses API for agentic workflows and tool calling; use Chat Completions for simpler conversational needs.
</Info>

## Supported models

| Model          | Reasoning | Function calling | Notes                                                |
| -------------- | --------- | ---------------- | ---------------------------------------------------- |
| `MiniMax-M2.7` | Yes       | Yes              | Recommended for agentic coding (192k context)        |
| `MiniMax-M2.5` | Yes       | Yes              | 160k context                                         |
| `gpt-oss-120b` | Yes       | Yes              | Set `reasoning.effort: "high"` for best tool calling |

<Note>
  Not all models support the Responses API. Models like `DeepSeek-V3.1` and `Meta-Llama-3.3-70B-Instruct` are only available via Chat Completions.
</Note>

## Key characteristics

* **Structured output items**: Responses contain typed items (`message`, `function_call`, `reasoning`) rather than a single text field
* **Stateless**: Infercom does not store conversation state, so you must supply the full history via `input[]` on each request
* **Client-executed tools**: When a tool is needed, the model returns a `function_call` item; your application executes the function and returns the result
* **Streaming**: Server-Sent Events with typed event hierarchy for real-time output

## Simple generation

The simplest usage passes a string input and receives a structured response.

<CodeGroup>
  ```python Python (OpenAI SDK) theme={null}
  from openai import OpenAI

  client = OpenAI(
      base_url="https://api.infercom.ai/v1",
      api_key="your-infercom-api-key"
  )

  response = client.responses.create(
      model="MiniMax-M2.7",
      input="Explain the difference between supervised and unsupervised learning."
  )

  # Access the text output
  print(response.output_text)
  ```

  ```python Python (SambaNova SDK) theme={null}
  from sambanova import SambaNova

  client = SambaNova(
      base_url="https://api.infercom.ai/v1",
      api_key="your-infercom-api-key"
  )

  response = client.responses.create(
      model="MiniMax-M2.7",
      input="Explain the difference between supervised and unsupervised learning."
  )

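  # Access the text from the message item. Reasoning models may emit a
  # reasoning item first, so select by type rather than by index.
  message = next(item for item in response.output if item.type == "message")
  print(message.content[0].text)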
  ```

  ```bash cURL theme={null}
  curl -X POST https://api.infercom.ai/v1/responses \
    -H "Authorization: Bearer $INFERCOM_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "MiniMax-M2.7",
      "input": "Explain the difference between supervised and unsupervised learning."
    }'
  ```
</CodeGroup>

### Response structure

The response contains an `output` array with typed items:

```json theme={null}
{
  "id": "resp_abc123",
  "object": "response",
  "status": "completed",
  "model": "MiniMax-M2.7",
  "output": [
    {
      "type": "reasoning",
      "id": "rs_xyz",
      "status": "completed",
      "content": [
        {
          "type": "reasoning_text",
          "text": "The user is asking about ML concepts..."
        }
      ]
    },
    {
      "type": "message",
      "id": "msg_xyz",
      "role": "assistant",
      "status": "completed",
      "content": [
        {
          "type": "output_text",
          "text": "Supervised learning uses labeled data..."
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 45,
    "output_tokens": 120,
    "total_tokens": 165,
    "output_tokens_details": {
      "reasoning_tokens": 35
    }
  }
}
```
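
Because a reasoning item can precede the message item, reading `output[0]` is not a reliable way to get the answer. The sketch below separates the two by item type, working on the plain JSON body shown above. `split_output` is a hypothetical helper name, not part of any SDK.

```python
from typing import Any


def split_output(response: dict[str, Any]) -> dict[str, str]:
    """Collect reasoning text and answer text from a response body."""
    reasoning_parts: list[str] = []
    answer_parts: list[str] = []
    for item in response.get("output", []):
        # function_call items carry no "content" list, so default to empty.
        for part in item.get("content", []):
            if item["type"] == "reasoning" and part["type"] == "reasoning_text":
                reasoning_parts.append(part["text"])
            elif item["type"] == "message" and part["type"] == "output_text":
                answer_parts.append(part["text"])
    return {"reasoning": "".join(reasoning_parts), "answer": "".join(answer_parts)}


# Trimmed copy of the sample response above.
sample = {
    "id": "resp_abc123",
    "status": "completed",
    "output": [
        {
            "type": "reasoning",
            "content": [
                {"type": "reasoning_text", "text": "The user is asking about ML concepts..."}
            ],
        },
        {
            "type": "message",
            "role": "assistant",
            "content": [
                {"type": "output_text", "text": "Supervised learning uses labeled data..."}
            ],
        },
    ],
}

parts = split_output(sample)
print(parts["answer"])
```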

## System instructions

Use the `instructions` parameter to provide system-level guidance:

<CodeGroup>
  ```python Python theme={null}
  response = client.responses.create(
      model="MiniMax-M2.7",
      instructions="You are a helpful assistant that speaks like a pirate.",
      input="How are you today?"
  )
  ```

  ```bash cURL theme={null}
  curl -X POST https://api.infercom.ai/v1/responses \
    -H "Authorization: Bearer $INFERCOM_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "MiniMax-M2.7",
      "instructions": "You are a helpful assistant that speaks like a pirate.",
      "input": "How are you today?"
    }'
  ```
</CodeGroup>

## Multi-turn conversations

Since the API is stateless, include the full conversation history in the `input` array:

<CodeGroup>
  ```python Python theme={null}
  # Turn 1
  response_1 = client.responses.create(
      model="MiniMax-M2.7",
      input=[{"role": "user", "content": "My name is Thomas."}]
  )

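  # Turn 2 - include prior messages. Reasoning models may emit a reasoning
  # item first, so select the assistant message by type rather than by index.
  assistant_message = next(item for item in response_1.output if item.type == "message")

  response_2 = client.responses.create(
      model="MiniMax-M2.7",
      input=[
          {"role": "user", "content": "My name is Thomas."},
          assistant_message,  # Include assistant's response
          {"role": "user", "content": "What is my name?"}
      ]
  )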

  print(response_2.output_text)  # "Your name is Thomas..."
  ```

  ```bash cURL theme={null}
  curl -X POST https://api.infercom.ai/v1/responses \
    -H "Authorization: Bearer $INFERCOM_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "MiniMax-M2.7",
      "input": [
        {"role": "user", "content": "My name is Thomas."},
        {"role": "assistant", "content": "Hello Thomas!"},
        {"role": "user", "content": "What is my name?"}
      ]
    }'
  ```
</CodeGroup>
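
Since every request must carry the full history, it can help to keep that history in one place. The following is a minimal sketch of such a container using plain dictionaries in the same shape as the cURL example above. The `Conversation` class and its method names are hypothetical, and no network call is made.

```python
from typing import Any


class Conversation:
    """Accumulates the items to send as `input` on every request."""

    def __init__(self) -> None:
        self.items: list[dict[str, Any]] = []

    def add_user(self, text: str) -> None:
        self.items.append({"role": "user", "content": text})

    def add_assistant(self, text: str) -> None:
        self.items.append({"role": "assistant", "content": text})

    def as_input(self) -> list[dict[str, Any]]:
        # Return a copy so callers cannot mutate the stored history by accident.
        return list(self.items)


history = Conversation()
history.add_user("My name is Thomas.")
history.add_assistant("Hello Thomas!")  # in practice, the text of the model's reply
history.add_user("What is my name?")

request_input = history.as_input()
print(len(request_input))
```

The list returned by `as_input()` would be passed as `input` on the next call, and the reply appended with `add_assistant` before the following turn.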

## Function calling

The Responses API supports function tools for agentic workflows. Only `type: "function"` tools are supported.

### Step 1: Define tools and make initial request

<CodeGroup>
  ```python Python theme={null}
  import json

  tools = [{
      "type": "function",
      "name": "get_weather",
      "description": "Get current weather for a city.",
      "parameters": {
          "type": "object",
          "properties": {
              "city": {"type": "string", "description": "City name"}
          },
          "required": ["city"]
      }
  }]

  response = client.responses.create(
      model="MiniMax-M2.7",
      input=[{"role": "user", "content": "What's the weather in Berlin?"}],
      tools=tools
  )

  # Check if model wants to call a function
  for item in response.output:
      if item.type == "function_call":
          print(f"Function: {item.name}")
          print(f"Arguments: {item.arguments}")
  ```

  ```bash cURL theme={null}
  curl -X POST https://api.infercom.ai/v1/responses \
    -H "Authorization: Bearer $INFERCOM_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "MiniMax-M2.7",
      "input": [{"role": "user", "content": "What is the weather in Berlin?"}],
      "tools": [{
        "type": "function",
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {"type": "string", "description": "City name"}
          },
          "required": ["city"]
        }
      }]
    }'
  ```
</CodeGroup>

### Step 2: Execute function and return result

<CodeGroup>
  ```python Python theme={null}
  # Execute the function locally
  def get_weather(city: str) -> dict:
      # Your actual weather API call here
      return {"city": city, "temperature": "18°C", "condition": "Cloudy"}

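  # Find the function call in the response (None if the model answered directly)
  tool_call = next((item for item in response.output if item.type == "function_call"), None)
  if tool_call is None:
      raise RuntimeError("Model did not request a function call")
  args = json.loads(tool_call.arguments)
  result = get_weather(args["city"])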

  # Send result back to the model
  follow_up = client.responses.create(
      model="MiniMax-M2.7",
      input=[
          {"role": "user", "content": "What's the weather in Berlin?"},
          tool_call,  # Include the function call
          {
              "type": "function_call_output",
              "call_id": tool_call.call_id,
              "output": json.dumps(result)
          }
      ],
      tools=tools
  )

  print(follow_up.output_text)  # "The weather in Berlin is 18°C and cloudy."
  ```

  ```bash cURL theme={null}
  curl -X POST https://api.infercom.ai/v1/responses \
    -H "Authorization: Bearer $INFERCOM_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "MiniMax-M2.7",
      "input": [
        {"role": "user", "content": "What is the weather in Berlin?"},
        {
          "type": "function_call",
          "id": "fc_123",
          "call_id": "call_123",
          "name": "get_weather",
          "arguments": "{\"city\": \"Berlin\"}",
          "status": "completed"
        },
        {
          "type": "function_call_output",
          "call_id": "call_123",
          "output": "{\"temperature\": \"18°C\", \"condition\": \"Cloudy\"}"
        }
      ],
      "tools": [{
        "type": "function",
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }]
    }'
  ```
</CodeGroup>
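
When an application defines several tools, or the model returns more than one call, the execute-and-return step above generalizes to a small dispatcher. This is a sketch under assumptions: `TOOL_REGISTRY` and `run_tool_calls` are hypothetical names, items are plain dictionaries in the shape of the cURL example, and returning an error payload to the model (rather than raising) is one possible design choice.

```python
import json
from typing import Any, Callable


def get_weather(city: str) -> dict[str, str]:
    # Placeholder result, as in the example above.
    return {"city": city, "temperature": "18°C", "condition": "Cloudy"}


# Hypothetical registry mapping tool names to local Python callables.
TOOL_REGISTRY: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_calls(output_items: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Build one function_call_output item per function_call item."""
    results = []
    for item in output_items:
        if item.get("type") != "function_call":
            continue  # skip reasoning and message items
        handler = TOOL_REGISTRY.get(item["name"])
        if handler is None:
            payload: Any = {"error": f"unknown tool: {item['name']}"}
        else:
            try:
                payload = handler(**json.loads(item["arguments"]))
            except (TypeError, ValueError) as exc:
                # Invalid JSON, or arguments that do not match the signature.
                payload = {"error": str(exc)}
        results.append({
            "type": "function_call_output",
            "call_id": item["call_id"],
            "output": json.dumps(payload),
        })
    return results


calls = [{
    "type": "function_call",
    "id": "fc_123",
    "call_id": "call_123",
    "name": "get_weather",
    "arguments": '{"city": "Berlin"}',
    "status": "completed",
}]
outputs = run_tool_calls(calls)
print(outputs[0]["call_id"])
```

The follow-up request would then send the original input, the `function_call` items, and these `function_call_output` items together.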

### Tool choice

Control when the model uses tools with `tool_choice`:

| Value                                 | Behavior                                           |
| ------------------------------------- | -------------------------------------------------- |
| `"auto"`                              | Model decides whether to call a function (default) |
| `"none"`                              | Model will not call any functions                  |
| `"required"`                          | Model must call at least one function              |
| `{"type": "function", "name": "..."}` | Force a specific function                          |
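
Forcing a function that is not present in `tools` is an easy mistake to make. The sketch below builds the object form of `tool_choice` from the table above and checks the name locally first. `force_function` is a hypothetical helper, and the request is only assembled as a dictionary, not sent.

```python
from typing import Any


def force_function(name: str, tools: list[dict[str, Any]]) -> dict[str, str]:
    """Return a tool_choice value that forces `name`, if it is defined in `tools`."""
    defined = {tool["name"] for tool in tools if tool.get("type") == "function"}
    if name not in defined:
        raise ValueError(f"{name!r} is not among the defined tools: {sorted(defined)}")
    return {"type": "function", "name": name}


tools = [{
    "type": "function",
    "name": "get_weather",
    "description": "Get current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string", "description": "City name"}},
        "required": ["city"],
    },
}]

request = {
    "model": "MiniMax-M2.7",
    "input": "What's the weather in Berlin?",
    "tools": tools,
    "tool_choice": force_function("get_weather", tools),
}
print(request["tool_choice"])
```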

## Structured output (JSON mode)

Request structured JSON output using the `text.format` parameter.

### JSON object mode

<CodeGroup>
  ```python Python theme={null}
  import json

  response = client.responses.create(
      model="MiniMax-M2.7",
      input="List 3 European capitals as JSON",
      text={"format": {"type": "json_object"}}
  )

  data = json.loads(response.output_text)
  ```

  ```bash cURL theme={null}
  curl -X POST https://api.infercom.ai/v1/responses \
    -H "Authorization: Bearer $INFERCOM_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "MiniMax-M2.7",
      "input": "List 3 European capitals as JSON",
      "text": {"format": {"type": "json_object"}}
    }'
  ```
</CodeGroup>

### JSON schema mode

To request output that follows a specific structure, provide a JSON schema:

<CodeGroup>
  ```python Python theme={null}
  response = client.responses.create(
      model="MiniMax-M2.7",
      input="Extract event details: SambaNova launch May 1, 2026 at 10am in San Francisco.",
      text={
          "format": {
              "type": "json_schema",
              "name": "event_extraction",
              "schema": {
                  "type": "object",
                  "properties": {
                      "title": {"type": "string"},
                      "date": {"type": "string"},
                      "time": {"type": "string"},
                      "location": {"type": "string"}
                  },
                  "required": ["title", "date", "time", "location"]
              }
          }
      }
  )

  import json
  event = json.loads(response.output_text)
  print(event)
  # {"title": "SambaNova launch", "date": "May 1, 2026", "time": "10am", "location": "San Francisco"}
  ```

  ```bash cURL theme={null}
  curl -X POST https://api.infercom.ai/v1/responses \
    -H "Authorization: Bearer $INFERCOM_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "MiniMax-M2.7",
      "input": "List 3 colors",
      "text": {
        "format": {
          "type": "json_schema",
          "name": "colors",
          "schema": {
            "type": "object",
            "properties": {
              "colors": {"type": "array", "items": {"type": "string"}}
            },
            "required": ["colors"]
          }
        }
      }
    }'
  ```
</CodeGroup>
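
The Limitations section notes that `strict` mode is not implemented, so it may be prudent to check the parsed output before relying on it. The following is a shallow, standard-library-only sketch that verifies required keys and top-level property types. `check_against_schema` is a hypothetical helper; a full validator would also handle nested schemas, which this one does not.

```python
import json
from typing import Any

# Mapping from JSON Schema type names to Python types (shallow check only).
_JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "array": list,
    "object": dict,
}


def check_against_schema(raw: str, schema: dict[str, Any]) -> dict[str, Any]:
    """Parse raw JSON and check required keys and top-level property types."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in schema.get("required", []) if key not in data]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    for key, spec in schema.get("properties", {}).items():
        expected = _JSON_TYPES.get(spec.get("type"))
        if key in data and expected is not None and not isinstance(data[key], expected):
            raise ValueError(f"{key!r} should be of type {spec['type']}")
    return data


event_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "date": {"type": "string"},
        "time": {"type": "string"},
        "location": {"type": "string"},
    },
    "required": ["title", "date", "time", "location"],
}

# Stand-in for response.output_text from the example above.
raw_output = (
    '{"title": "SambaNova launch", "date": "May 1, 2026", '
    '"time": "10am", "location": "San Francisco"}'
)
event = check_against_schema(raw_output, event_schema)
print(event["location"])
```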

## Reasoning

Reasoning-capable models expose their thinking process via `reasoning` output items. Control reasoning depth with `reasoning.effort`:

| Effort     | Behavior                            |
| ---------- | ----------------------------------- |
| `"low"`    | Faster, less depth                  |
| `"medium"` | Balanced (default)                  |
| `"high"`   | Deeper reasoning, higher token cost |

<CodeGroup>
  ```python Python theme={null}
  response = client.responses.create(
      model="MiniMax-M2.7",
      input="What is 15 * 23?",
      reasoning={"effort": "high"}
  )

  # Access reasoning separately from the answer
  for item in response.output:
      if item.type == "reasoning":
          print("Reasoning:", item.content[0].text)
      elif item.type == "message":
          print("Answer:", item.content[0].text)
  ```

  ```bash cURL theme={null}
  curl -X POST https://api.infercom.ai/v1/responses \
    -H "Authorization: Bearer $INFERCOM_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "MiniMax-M2.7",
      "input": "What is 15 * 23?",
      "reasoning": {"effort": "high"}
    }'
  ```
</CodeGroup>

<Tip>
  When using `gpt-oss-120b` for function calling, set `reasoning.effort` to `"high"` for best results.
</Tip>

## Streaming

Enable streaming for real-time output with `stream: true`. The API emits Server-Sent Events:

<CodeGroup>
  ```python Python theme={null}
  stream = client.responses.create(
      model="MiniMax-M2.7",
      input="Write a short poem about speed.",
      stream=True
  )

  for event in stream:
      if event.type == "response.output_text.delta":
          print(event.delta, end="", flush=True)
  ```

  ```bash cURL theme={null}
  curl -X POST https://api.infercom.ai/v1/responses \
    -H "Authorization: Bearer $INFERCOM_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "MiniMax-M2.7",
      "input": "Count from 1 to 5",
      "stream": true
    }'
  ```
</CodeGroup>

### Streaming event types

| Event                                    | Description                    |
| ---------------------------------------- | ------------------------------ |
| `response.created`                       | Response initialized           |
| `response.in_progress`                   | Generation started             |
| `response.output_item.added`             | New output item created        |
| `response.content_part.added`            | New content part added         |
| `response.reasoning_text.delta`          | Incremental reasoning chunk    |
| `response.reasoning_text.done`           | Reasoning complete             |
| `response.output_text.delta`             | Incremental output text        |
| `response.output_text.done`              | Output text complete           |
| `response.function_call_arguments.delta` | Incremental function arguments |
| `response.function_call_arguments.done`  | Function arguments complete    |
| `response.content_part.done`             | Content part finished          |
| `response.output_item.done`              | Output item completed          |
| `response.completed`                     | Final event with usage stats   |

## Request parameters

| Parameter             | Type             | Required | Description                                               |
| --------------------- | ---------------- | -------- | --------------------------------------------------------- |
| `model`               | string           | Yes      | Model ID (`MiniMax-M2.7`, `MiniMax-M2.5`, `gpt-oss-120b`) |
| `input`               | string \| array  | Yes      | Text input or conversation array                          |
| `instructions`        | string           | No       | System message prepended to input                         |
| `stream`              | boolean          | No       | Enable SSE streaming (default: false)                     |
| `max_output_tokens`   | integer          | No       | Maximum tokens to generate                                |
| `temperature`         | number           | No       | Randomness 0-2 (default: 0.7)                             |
| `top_p`               | number           | No       | Nucleus sampling 0-1 (default: 1)                         |
| `top_k`               | integer          | No       | Top-K sampling 1-100                                      |
| `tools`               | array            | No       | Function tool definitions (max 128)                       |
| `tool_choice`         | string \| object | No       | Tool invocation control                                   |
| `parallel_tool_calls` | boolean          | No       | Allow parallel tool calls (default: true)                 |
| `text.format`         | object           | No       | Output format: `text`, `json_object`, `json_schema`       |
| `reasoning.effort`    | string           | No       | Reasoning depth: `low`, `medium`, `high`                  |

## Response fields

| Field    | Type   | Description                                        |
| -------- | ------ | -------------------------------------------------- |
| `id`     | string | Unique response identifier                         |
| `object` | string | Always `"response"`                                |
| `status` | string | `completed`, `failed`, `in_progress`, `incomplete` |
| `model`  | string | Model ID used                                      |
| `output` | array  | Output items (messages, reasoning, function calls) |
| `usage`  | object | Token usage statistics                             |
| `error`  | object | Error details when `status: "failed"`              |

### Usage statistics

The `usage` object reports token counts together with performance metrics (`time_to_first_token`, `total_latency`, `output_tokens_per_sec`):

```json theme={null}
{
  "input_tokens": 45,
  "output_tokens": 120,
  "total_tokens": 165,
  "input_tokens_details": {"cached_tokens": 0},
  "output_tokens_details": {"reasoning_tokens": 35},
  "time_to_first_token": 0.084,
  "total_latency": 0.459,
  "output_tokens_per_sec": 261.4
}
```
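
A few derived figures can be computed from this object. The sketch below assumes that `reasoning_tokens` are counted within `output_tokens` and that latency values are in seconds; neither is stated explicitly here. In the sample above, `output_tokens / total_latency` happens to agree with the reported `output_tokens_per_sec`, though whether that relationship holds in general is also an assumption. `summarize_usage` is a hypothetical helper.

```python
from typing import Any


def summarize_usage(usage: dict[str, Any]) -> dict[str, float]:
    """Derive visible-token count, reasoning share, and throughput from usage."""
    output_tokens = usage["output_tokens"]
    reasoning_tokens = usage.get("output_tokens_details", {}).get("reasoning_tokens", 0)
    return {
        # Assumes reasoning tokens are included in output_tokens.
        "visible_output_tokens": output_tokens - reasoning_tokens,
        "reasoning_share": reasoning_tokens / output_tokens if output_tokens else 0.0,
        "derived_tokens_per_sec": output_tokens / usage["total_latency"],
    }


# Copy of the sample usage object above.
usage = {
    "input_tokens": 45,
    "output_tokens": 120,
    "total_tokens": 165,
    "input_tokens_details": {"cached_tokens": 0},
    "output_tokens_details": {"reasoning_tokens": 35},
    "time_to_first_token": 0.084,
    "total_latency": 0.459,
    "output_tokens_per_sec": 261.4,
}

summary = summarize_usage(usage)
print(summary["visible_output_tokens"])
```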

## Responses API vs Chat Completions

| Feature              | Responses API                                    | Chat Completions            |
| -------------------- | ------------------------------------------------ | --------------------------- |
| Output structure     | Typed items (message, reasoning, function\_call) | Single message with content |
| Reasoning visibility | Separate `reasoning` items                       | Inline in content           |
| Tool results         | Structured `function_call_output`                | `tool` role messages        |
| Best for             | Agentic workflows, coding agents                 | Conversational apps         |

## Limitations

* **Stateless**: `previous_response_id` is not supported, so you must supply the full conversation history in `input[]`
* **Function tools only**: Built-in tools (web\_search, code\_interpreter) are not supported
* **Not implemented**: `frequency_penalty`, `presence_penalty`, `max_tool_calls`, `strict` mode

## Agentic coding integrations

The Responses API can serve as the backend for agentic coding tools. See the integration guides:

* [OpenCode](/en/agentic-coding/opencode) - Terminal-based coding assistant
* [Cline](/en/agentic-coding/cline) - VS Code extension
* [Aider](/en/agentic-coding/aider) - Terminal pair programming

## Next steps

* [Function Calling](/en/features/function-calling) - Detailed function calling guide for Chat Completions
* [Text Generation](/en/features/text-generation) - Chat Completions API guide
* [API Reference](/en/api-reference/overview) - Full endpoint documentation