Make a query with an image
On Infercom, vision model requests follow OpenAI’s multimodal input format, which accepts both text and image inputs in a structured payload. The call is similar to a Text Generation request, but it additionally includes an encoded image file, referenced via the image_path variable. A helper function converts the image into a base64 string so it can be passed alongside the text in the request.
Step 1
Make a new Python file and copy the code below.
This example uses the gemma-3-12b-it model, Google’s vision-capable Gemma 3 model available via the Global Model Catalog (Japan).
Step 2
Use your Infercom API key from the API keys and URLs page to replace the placeholder "your-infercom-api-key" in the construction of the client.