Vision

OfoxAI supports visual input for multimodal models, enabling analysis of images, screenshots, documents, and video content.

Supported Models

| Model | Images | Video | Description |
| --- | --- | --- | --- |
| openai/gpt-4o | | | High-quality image analysis |
| openai/gpt-4o-mini | | | Fast image analysis |
| anthropic/claude-sonnet-4.5 | | | Strong document and code understanding |
| google/gemini-3-flash-preview | | | Multimodal all-rounder |
| google/gemini-3.1-pro-preview | | | Most capable multimodal reasoning |

Image Analysis

Sending Images via URL

Terminal
```bash
curl https://api.ofox.ai/v1/chat/completions \
  -H "Authorization: Bearer $OFOX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe what is in this image"},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
      ]
    }]
  }'
```
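The same request can be assembled in Python with only the standard library. The sketch below builds the request object without sending it. `build_vision_request` is a hypothetical helper name; the endpoint and the `OFOX_API_KEY` variable are taken from the curl example.

```python
import json
import os
import urllib.request

# Endpoint copied from the curl example
API_URL = "https://api.ofox.ai/v1/chat/completions"


def build_vision_request(prompt, image_url, model="openai/gpt-4o", api_key=None):
    """Build (but do not send) a chat completion request with one image URL.

    Hypothetical helper for illustration; not part of any SDK.
    """
    api_key = api_key or os.environ.get("OFOX_API_KEY", "")
    payload = {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_vision_request(
    "Describe what is in this image",
    "https://example.com/photo.jpg",
    api_key="test-key",
)
# Sending is left to the caller, for example urllib.request.urlopen(req)
```

Keeping construction separate from sending makes the payload easy to inspect or log before any network call is made.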

Sending Images via Base64

Use Base64 encoding when the image is a local file or a screenshot that has no public URL:

vision_base64.py
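```python
import base64
import os

from openai import OpenAI

# OpenAI-compatible client pointed at OfoxAI.
# The base URL is inferred from the curl endpoint; adjust it if your setup differs.
client = OpenAI(
    base_url="https://api.ofox.ai/v1",
    api_key=os.environ["OFOX_API_KEY"],
)

# Read the local image and Base64-encode it
with open("screenshot.png", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What does this screenshot show?"},
            {
                "type": "image_url",
                "image_url": {
                    # The MIME type in the data URL must match the file format
                    "url": f"data:image/png;base64,{image_data}"
                }
            }
        ]
    }]
)
```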
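The example hardcodes `image/png` in the data URL. A minimal sketch of deriving the MIME type from the file extension instead is shown below; `image_to_data_url` is a hypothetical helper, and it assumes the extension reflects the actual file format.

```python
import base64
import mimetypes


def image_to_data_url(path):
    """Encode a local image file as a data URL for the image_url field.

    Hypothetical helper; the MIME type is guessed from the file extension only.
    """
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {path!r}")
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string can be used directly as the `url` value of an `image_url` part, for example `{"type": "image_url", "image_url": {"url": image_to_data_url("screenshot.png")}}`.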

Image Detail Level

Set the detail field inside image_url to control analysis precision:

| Value | Description | Use Case |
| --- | --- | --- |
| auto | Automatic selection (default) | General scenarios |
| low | Lower precision, faster | Simple classification, tag identification |
| high | Higher precision, more detailed | Document OCR, detailed analysis |

```python
{
    "type": "image_url",
    "image_url": {
        "url": "https://example.com/document.jpg",
        "detail": "high"  # High precision mode
    }
}
```
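Because `detail` accepts only the three values listed in the table, it can be worth validating it before a request goes out. The following is a small sketch; `image_part` and `VALID_DETAIL` are hypothetical names introduced for illustration.

```python
# Allowed values, taken from the table above
VALID_DETAIL = {"auto", "low", "high"}


def image_part(url, detail="auto"):
    """Return an image_url content part, rejecting unknown detail values.

    Hypothetical helper; not part of any SDK.
    """
    if detail not in VALID_DETAIL:
        raise ValueError(
            f"detail must be one of {sorted(VALID_DETAIL)}, got {detail!r}"
        )
    return {"type": "image_url", "image_url": {"url": url, "detail": detail}}


ocr_part = image_part("https://example.com/document.jpg", detail="high")
```

Catching a misspelled value locally avoids spending a request on a call the API may reject or silently treat as the default.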

Multi-Image Comparison

You can send multiple images in a single request:

```python
# Assumes `client` is an OpenAI-compatible client configured for OfoxAI
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Compare the differences between these two images"},
            {"type": "image_url", "image_url": {"url": "https://example.com/before.jpg"}},
            {"type": "image_url", "image_url": {"url": "https://example.com/after.jpg"}}
        ]
    }]
)
```
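For more than a couple of images, building the `content` list by hand gets repetitive. The sketch below generates it from a list of URLs and optionally puts a short text label before each image so the prompt can refer to "Image 1" or "Image 2". `multi_image_content` is a hypothetical helper, and whether labels improve results depends on the model.

```python
def multi_image_content(prompt, image_urls, label=True):
    """Build a content list with one prompt followed by several images.

    Hypothetical helper; labels are plain text parts inserted before each image.
    """
    content = [{"type": "text", "text": prompt}]
    for index, url in enumerate(image_urls, start=1):
        if label:
            content.append({"type": "text", "text": f"Image {index}:"})
        content.append({"type": "image_url", "image_url": {"url": url}})
    return content


content = multi_image_content(
    "Compare the differences between these two images",
    ["https://example.com/before.jpg", "https://example.com/after.jpg"],
)
```

The result can be passed as the `content` value of the user message in the request above.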

Vision Input with Anthropic Protocol

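```python
import base64

import anthropic

client = anthropic.Anthropic(
    base_url="https://api.ofox.ai/anthropic",
    api_key="<your OFOXAI_API_KEY>"
)

# Read a local JPEG and Base64-encode it (the file name is illustrative)
with open("photo.jpg", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="anthropic/claude-sonnet-4.5",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/jpeg",  # Must match the file format
                    "data": image_data
                }
            },
            {"type": "text", "text": "Describe this image"}
        ]
    }]
)
```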
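The two protocols carry the same Base64 payload in different shapes: the OpenAI-style part embeds it in a data URL, while the Anthropic-style block splits it into `media_type` and `data`. A minimal sketch of converting one into the other follows; `data_url_to_anthropic_image` is a hypothetical helper, and the output shape mirrors the example above.

```python
def data_url_to_anthropic_image(data_url):
    """Convert 'data:<media type>;base64,<data>' into an Anthropic-style image block.

    Hypothetical helper; handles only Base64 data URLs.
    """
    prefix = "data:"
    marker = ";base64,"
    if not data_url.startswith(prefix) or marker not in data_url:
        raise ValueError("expected a data URL of the form data:<media type>;base64,<data>")
    media_type, data = data_url[len(prefix):].split(marker, 1)
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": data},
    }


block = data_url_to_anthropic_image("data:image/png;base64,QUJD")
```

This lets one encoding step feed both request styles when an application switches between protocols.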

Common Use Cases

  • Document OCR — Extract text and tables from images
  • Code screenshot analysis — Analyze code in screenshots and provide suggestions
  • UI review — Analyze interface design and layout
  • Chart interpretation — Analyze data charts and visualizations
  • Object recognition — Identify objects and scenes in images