# Provider Routing
OfoxAI supports a multi-provider architecture where the same model can be served through different provider nodes. Routing strategies let you control how requests are distributed.
## Routing Strategies
| Strategy | Description | Use Case |
|---|---|---|
| `priority` | Priority order (default) | Stability-first |
| `cost` | Lowest cost first | Batch processing, cost-sensitive |
| `latency` | Lowest latency first | Real-time chat, user interaction |
| `balanced` | Load balancing | High-concurrency scenarios |
## Usage
Configure the routing strategy via the `provider.routing` extension parameter:
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.ofox.ai/v1",
    api_key="<your OFOXAI_API_KEY>"
)

response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "provider": {
            "routing": "cost"  # Lowest cost first
        }
    }
)
```

```typescript
const response = await client.chat.completions.create({
  model: 'openai/gpt-4o',
  messages: [{ role: 'user', content: 'Hello' }],
  // @ts-ignore OfoxAI extension parameter
  provider: {
    routing: 'cost'
  }
})
```

## Strategy Details
### `priority` — Priority Routing (Default)

Routes requests according to OfoxAI's predefined provider priority order. Prefers the most stable nodes.

### `cost` — Cost-First

Automatically selects the lowest-cost provider node. Ideal for batch processing, data labeling, and other latency-insensitive scenarios.

### `latency` — Latency-First

Selects the provider node with the lowest response latency. Ideal for real-time chat scenarios requiring fast responses.

### `balanced` — Load Balancing

Distributes requests evenly across all available provider nodes. Ideal for high-concurrency scenarios to avoid single-point overload.
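Since a strategy is just the string sent in `provider.routing`, one way to keep the choice in one place is a small helper that maps workload types to strategies. This is an illustrative sketch mirroring the table above; the helper and its mapping are not part of the OfoxAI SDK:

```python
# Illustrative mapping from workload type to OfoxAI routing strategy,
# following the strategy table above (not part of any SDK).
WORKLOAD_ROUTING = {
    "chat": "latency",      # real-time chat: lowest latency first
    "batch": "cost",        # batch processing: lowest cost first
    "burst": "balanced",    # high concurrency: spread the load
}

def routing_extra_body(workload: str) -> dict:
    """Build the extra_body payload carrying the provider.routing extension."""
    strategy = WORKLOAD_ROUTING.get(workload, "priority")  # stability-first default
    return {"provider": {"routing": strategy}}
```

You would then pass `extra_body=routing_extra_body("chat")` to `client.chat.completions.create(...)`.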
## Best Practices
- **Real-time chat** — use `latency` for shorter user wait times
- **Batch tasks** — use `cost` to reduce overall costs
- **Production** — keep the default `priority` for stability
- **Combine with fallback** — routing strategies can be used together with the `fallback` parameter
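Combining routing with fallback might look like the payload below. Note this is a sketch: the exact shape and placement of the `fallback` value here are assumptions, and the fallback model name is hypothetical, so check the fallback documentation for the authoritative format:

```python
# Hedged sketch of a provider payload combining a routing strategy with a
# fallback list. The `fallback` key's shape and nesting are assumptions,
# and the fallback model name is purely illustrative.
extra_body = {
    "provider": {
        "routing": "cost",                           # lowest cost first
        "fallback": ["anthropic/claude-3-5-sonnet"]  # hypothetical fallback model
    }
}
```

As with the earlier examples, this dict would be passed via `extra_body=` in `client.chat.completions.create(...)`.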
You can also set a global default routing strategy in the OfoxAI Console without specifying it in each request.