Smart Model Routing
OfoxAI’s smart model routing automatically selects the best model for your request, optimizing across cost, speed, quality, and other dimensions.
Auto Mode
The simplest approach is to set model: "auto" and let OfoxAI choose the model:
# `client` is assumed to be an already-configured OpenAI-compatible SDK client
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)
# Check which model was actually used
print(response.model)  # e.g. "openai/gpt-4o"
Auto mode analyzes the complexity of your request and the current state of the available models, then selects the most suitable one.
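Because `response.model` reports which model served a request, you can tally it across calls to see how auto mode distributes your traffic. The following is a minimal sketch: `tally_models` is a hypothetical helper, and the stand-in objects only mimic the `.model` attribute of a real response.

```python
from collections import Counter
from types import SimpleNamespace


def tally_models(responses):
    """Count how often each model name appears in a list of responses.

    Each response only needs a `.model` attribute.
    """
    return Counter(r.model for r in responses)


# Stand-in responses for illustration only; real ones would come from
# client.chat.completions.create(model="auto", ...).
fake_responses = [
    SimpleNamespace(model="openai/gpt-4o"),
    SimpleNamespace(model="openai/gpt-4o"),
    SimpleNamespace(model="anthropic/claude-sonnet-4.5"),
]
counts = tally_models(fake_responses)
print(counts.most_common())
```

In practice you would collect the real response objects (or just their `model` strings) and pass them to the helper.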
Model Pool Configuration
You can specify a candidate model pool and routing preference:
model_routing.py
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Help me optimize this code"}],
    extra_body={
        "model_routing_config": {
            "models": [
                "openai/gpt-4o",
                "anthropic/claude-sonnet-4.5",
                "google/gemini-3-flash-preview"
            ],
            "preference": "quality"  # Quality-first
        }
    }
)
Routing Preferences
| Preference | Description |
|---|---|
| balanced | Balanced consideration of quality, speed, and cost (default) |
| quality | Quality-first, selects the most capable model |
| speed | Speed-first, selects the fastest responding model |
| cost | Cost-first, selects the cheapest model |
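The preference is a plain string, and this page does not say how the API treats an unknown value, so it can help to validate it locally before sending a request. The sketch below uses a hypothetical helper, `build_routing_body`, that checks the preference against the four values in the table and builds the `extra_body` payload.

```python
VALID_PREFERENCES = {"balanced", "quality", "speed", "cost"}


def build_routing_body(preference="balanced", models=None):
    """Build the extra_body dict for a routed request.

    Raises ValueError for a preference that is not one of the four
    documented values.
    """
    if preference not in VALID_PREFERENCES:
        raise ValueError(f"unknown routing preference: {preference!r}")
    config = {"preference": preference}
    if models:
        # Copy so later changes to the caller's list don't affect the payload.
        config["models"] = list(models)
    return {"model_routing_config": config}


body = build_routing_body("speed", ["openai/gpt-4o", "anthropic/claude-sonnet-4.5"])
print(body)
```

The result can be passed directly, for example `extra_body=build_routing_body("cost")`.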
Use Cases
Cost Optimization
Route simple conversations to cheaper models and reserve premium models for complex tasks:
# Simple scenario → might select gpt-4o-mini or gemini-3-flash-preview
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "What day is it today?"}],
    extra_body={"model_routing_config": {"preference": "cost"}}
)
High Availability
Specify multiple fallback models to ensure uninterrupted service:
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Analyze market trends"}],
    extra_body={
        "model_routing_config": {
            "models": [
                "openai/gpt-4o",
                "anthropic/claude-sonnet-4.5",
                "google/gemini-3.1-pro-preview"
            ],
            "preference": "balanced"
        }
    }
)
Smart routing monitors each model’s real-time status (latency, availability, load) and uses it to select a model from the candidate pool.
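To make the idea concrete, here is a toy sketch of status-based selection from a candidate pool. It is not OfoxAI's actual routing algorithm, which this page does not describe: the `ModelStatus` fields, the scoring formula, and the sample numbers are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class ModelStatus:
    name: str
    available: bool
    latency_ms: float  # hypothetical recent latency
    load: float        # hypothetical utilization, 0.0 to 1.0


def pick_model(pool):
    """Return the name of the available model with the lowest toy score."""
    candidates = [m for m in pool if m.available]
    if not candidates:
        raise RuntimeError("no model in the pool is available")
    # Lower is better: latency in seconds plus the load fraction.
    return min(candidates, key=lambda m: m.latency_ms / 1000 + m.load).name


# Invented status values; a real router would measure these continuously.
pool = [
    ModelStatus("openai/gpt-4o", available=False, latency_ms=300, load=0.2),
    ModelStatus("anthropic/claude-sonnet-4.5", available=True, latency_ms=800, load=0.5),
    ModelStatus("google/gemini-3.1-pro-preview", available=True, latency_ms=600, load=0.9),
]
chosen = pick_model(pool)
print(chosen)
```

The unavailable model is skipped even though it has the best raw numbers, which is the behavior a multi-model pool is meant to provide.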