
Smart Model Routing

OfoxAI’s smart model routing automatically selects the best model for your request, optimizing across cost, speed, quality, and other dimensions.

Auto Mode

The simplest approach is to set model="auto" and let OfoxAI choose the model:

response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)

# Check which model was actually used
print(response.model)  # e.g. "openai/gpt-4o"

Auto mode weighs the complexity of your request against the current state of the available models and selects the most suitable one.
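Because auto mode picks the model per request, it can help to keep a running count of which models actually served your traffic. The following is a minimal sketch, not an OfoxAI feature: `tally_model` is a hypothetical helper, and the stand-in responses (including the illustrative model names) only mimic the `model` attribute shown above so the example runs offline.

```python
from collections import Counter
from types import SimpleNamespace


def tally_model(usage: Counter, response) -> str:
    """Record which model served a response and return its name."""
    model_name = response.model  # set by the API, e.g. "openai/gpt-4o"
    usage[model_name] += 1
    return model_name


# Stand-in responses so the sketch runs without network access; real ones
# would come from client.chat.completions.create(model="auto", ...).
usage = Counter()
for name in ["openai/gpt-4o", "openai/gpt-4o-mini", "openai/gpt-4o"]:
    tally_model(usage, SimpleNamespace(model=name))

# Models ordered from most to least frequently selected
print(usage.most_common())
```

In an application, you would call `tally_model(usage, response)` after each request and inspect `usage` periodically to see how routing decisions are distributed.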

Model Pool Configuration

You can specify a candidate model pool and routing preference:

model_routing.py
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Help me optimize this code"}],
    extra_body={
        "model_routing_config": {
            "models": [
                "openai/gpt-4o",
                "anthropic/claude-sonnet-4.5",
                "google/gemini-3-flash-preview"
            ],
            "preference": "quality"  # Quality-first
        }
    }
)

Routing Preferences

| Preference | Description |
| --- | --- |
| balanced | Balanced consideration of quality, speed, and cost (default) |
| quality | Quality-first, selects the most capable model |
| speed | Speed-first, selects the fastest responding model |
| cost | Cost-first, selects the cheapest model |
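Since the preference is passed as a plain string, a typo would only surface at request time. One way to catch it earlier is a small client-side helper that checks the value against the four preferences listed above before building the request. This is a sketch under assumptions: `routing_kwargs` and `VALID_PREFERENCES` are hypothetical names, not part of OfoxAI's SDK, and the helper assumes that sending "balanced" explicitly behaves the same as omitting the preference.

```python
VALID_PREFERENCES = {"balanced", "quality", "speed", "cost"}


def routing_kwargs(prompt, models=None, preference="balanced"):
    """Build keyword arguments for client.chat.completions.create()."""
    if preference not in VALID_PREFERENCES:
        raise ValueError(f"unknown routing preference: {preference!r}")

    config = {"preference": preference}
    if models:
        # Only send a candidate pool when the caller supplied one
        config["models"] = list(models)

    return {
        "model": "auto",
        "messages": [{"role": "user", "content": prompt}],
        "extra_body": {"model_routing_config": config},
    }
```

The result can be unpacked into the call, for example `client.chat.completions.create(**routing_kwargs("Help me optimize this code", preference="quality"))`.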

Use Cases

Cost Optimization

Route simple conversations to cheaper models automatically and reserve premium models for complex tasks:

# Simple scenario → might select gpt-4o-mini or gemini-3-flash-preview
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "What day is it today?"}],
    extra_body={"model_routing_config": {"preference": "cost"}}
)
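If different parts of an application have different priorities, the preference can be chosen per request rather than fixed globally. The sketch below assumes a hypothetical set of task categories (`small_talk`, `autocomplete`, `code_review`) defined by your own application; they are not OfoxAI concepts. Only the `model_routing_config` payload shape and the preference values come from this guide.

```python
# Hypothetical task categories for an application; not part of OfoxAI's API.
PREFERENCE_BY_TASK = {
    "small_talk": "cost",       # cheap model is usually enough
    "autocomplete": "speed",    # latency matters most
    "code_review": "quality",   # favor the most capable model
}


def routing_body_for(task: str) -> dict:
    """Return the extra_body payload for a task, defaulting to "balanced"."""
    preference = PREFERENCE_BY_TASK.get(task, "balanced")
    return {"model_routing_config": {"preference": preference}}


print(routing_body_for("small_talk"))
print(routing_body_for("report"))  # unknown task falls back to "balanced"
```

The returned dictionary is meant to be passed as `extra_body=routing_body_for(task)` alongside `model="auto"`.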

High Availability

Specify multiple fallback models to ensure uninterrupted service:

response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Analyze market trends"}],
    extra_body={
        "model_routing_config": {
            "models": [
                "openai/gpt-4o",
                "anthropic/claude-sonnet-4.5",
                "google/gemini-3.1-pro-preview"
            ],
            "preference": "balanced"
        }
    }
)

Smart routing monitors each model’s real-time status (latency, availability, load) and selects the best available model from the candidate pool.
