Make a query with an image
On Infercom, vision model requests follow OpenAI’s multimodal input format, which accepts both text and image inputs in a structured payload. The call is similar to a Text Generation request, but it additionally includes an encoded image file, referenced via the image_path variable. A helper function converts the image into a base64 string so it can be passed alongside the text in the request.
Step 1
Make a new Python file and copy the code below.
This example uses the gemma-3-12b-it model, Google’s vision-capable Gemma 3 model available via the Global Model Catalog (Japan).
Step 2
Use your Infercom API key from the API keys and URLs page to replace the placeholder "your-infercom-api-key" in the construction of the client.