# Vision

OfoxAI supports visual input for multimodal models, enabling analysis of images, screenshots, documents, and video content.
## Supported Models
| Model | Images | Video | Description |
|---|---|---|---|
| openai/gpt-4o | ✅ | — | High-quality image analysis |
| openai/gpt-4o-mini | ✅ | — | Fast image analysis |
| anthropic/claude-sonnet-4.5 | ✅ | — | Strong document and code understanding |
| google/gemini-3-flash-preview | ✅ | ✅ | Multimodal all-rounder |
| google/gemini-3.1-pro-preview | ✅ | ✅ | Most capable multimodal reasoning |
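If you route requests across several of these models, the capability matrix above can be mirrored client-side to avoid sending video to a model that only accepts images. A minimal sketch; `VISION_MODELS` and `supports_video` are local conveniences for illustration, not part of the OfoxAI API:

```python
# Capability matrix mirroring the table above.
VISION_MODELS = {
    "openai/gpt-4o": {"images": True, "video": False},
    "openai/gpt-4o-mini": {"images": True, "video": False},
    "anthropic/claude-sonnet-4.5": {"images": True, "video": False},
    "google/gemini-3-flash-preview": {"images": True, "video": True},
    "google/gemini-3.1-pro-preview": {"images": True, "video": True},
}

def supports_video(model: str) -> bool:
    """Return True if the model is listed as accepting video input."""
    return VISION_MODELS.get(model, {}).get("video", False)
```

Checking `supports_video(model)` before building a request lets you fall back to a Gemini model when video frames are present.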
## Image Analysis

### Sending Images via URL
```bash
curl https://api.ofox.ai/v1/chat/completions \
  -H "Authorization: Bearer $OFOX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe what is in this image"},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
      ]
    }]
  }'
```

### Sending Images via Base64
Base64 encoding is suitable for local files or screenshots:
vision_base64.py

```python
import base64

from openai import OpenAI

client = OpenAI(
    base_url="https://api.ofox.ai/v1",
    api_key="<your OFOX_API_KEY>",
)

# Read and encode the local image
with open("screenshot.png", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What does this screenshot show?"},
            {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/png;base64,{image_data}"
                }
            }
        ]
    }]
)
```

### Image Detail Level
Control analysis precision with the `detail` parameter:
| Value | Description | Use Case |
|---|---|---|
| `auto` | Automatic selection (default) | General scenarios |
| `low` | Lower precision, faster | Simple classification, tag identification |
| `high` | Higher precision, more detailed | Document OCR, detailed analysis |
```python
{
    "type": "image_url",
    "image_url": {
        "url": "https://example.com/document.jpg",
        "detail": "high"  # High precision mode
    }
}
```

### Multi-Image Comparison
You can send multiple images in a single request:
```python
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Compare the differences between these two images"},
            {"type": "image_url", "image_url": {"url": "https://example.com/before.jpg"}},
            {"type": "image_url", "image_url": {"url": "https://example.com/after.jpg"}}
        ]
    }]
)
```

## Vision Input with Anthropic Protocol
```python
import base64

import anthropic

client = anthropic.Anthropic(
    base_url="https://api.ofox.ai/anthropic",
    api_key="<your OFOXAI_API_KEY>"
)

# Read and encode the local image
with open("photo.jpg", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="anthropic/claude-sonnet-4.5",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/jpeg",
                    "data": image_data
                }
            },
            {"type": "text", "text": "Describe this image"}
        ]
    }]
)
```

## Common Use Cases
- Document OCR — Extract text and tables from images
- Code screenshot analysis — Analyze code in screenshots and provide suggestions
- UI review — Analyze interface design and layout
- Chart interpretation — Analyze data charts and visualizations
- Object recognition — Identify objects and scenes in images
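For document OCR in particular, the base64 encoding and `detail: "high"` options from the sections above combine naturally. A minimal sketch of a helper that builds the request message for a local file; `ocr_message` is a hypothetical convenience, not part of any SDK:

```python
import base64

def ocr_message(path: str, prompt: str = "Extract all text and tables from this document") -> dict:
    """Build a high-detail OCR chat message from a local PNG image."""
    with open(path, "rb") as f:
        data = base64.standard_b64encode(f.read()).decode("utf-8")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/png;base64,{data}",
                    "detail": "high",  # high precision for OCR
                },
            },
        ],
    }
```

The returned dict can be passed directly in the `messages` list of `client.chat.completions.create(...)` as shown earlier.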