Provider Routing

OfoxAI supports a multi-provider architecture where the same model can be served through different provider nodes. Routing strategies let you control how requests are distributed.

Routing Strategies

| Strategy | Description | Use Case |
| --- | --- | --- |
| priority | Priority order (default) | Stability-first |
| cost | Lowest cost first | Batch processing, cost-sensitive |
| latency | Lowest latency first | Real-time chat, user interaction |
| balanced | Load balancing | High-concurrency scenarios |

Usage

Configure routing strategy via the provider.routing extension parameter:

routing.py
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.ofox.ai/v1",
    api_key="<your OFOXAI_API_KEY>",
)

response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "provider": {
            "routing": "cost"  # Lowest cost first
        }
    },
)
```
routing.ts
```typescript
const response = await client.chat.completions.create({
  model: 'openai/gpt-4o',
  messages: [{ role: 'user', content: 'Hello' }],
  // @ts-ignore OfoxAI extension parameter
  provider: { routing: 'cost' },
})
```
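If you are not using an SDK, the same extension can be sent directly over HTTP. A minimal sketch of the request body, assuming the endpoint follows the OpenAI-compatible shape shown above and that `extra_body` merges into the top level of the JSON payload:

```python
import json

def build_request_body(routing):
    """Assemble the JSON body for an OpenAI-compatible chat completions
    request; the OfoxAI "provider" extension sits at the top level of
    the body (which is where the SDK's extra_body fields end up)."""
    payload = {
        "model": "openai/gpt-4o",
        "messages": [{"role": "user", "content": "Hello"}],
        "provider": {"routing": routing},  # OfoxAI extension parameter
    }
    return json.dumps(payload)
```

You would POST this body to `https://api.ofox.ai/v1/chat/completions` with your `Authorization: Bearer <OFOXAI_API_KEY>` header.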

Strategy Details

priority — Priority Routing (Default)

Routes requests according to OfoxAI’s predefined provider priority order. Prefers the most stable nodes.

cost — Cost-First

Automatically selects the lowest-cost provider node. Ideal for batch processing, data labeling, and other latency-insensitive scenarios.

latency — Latency-First

Selects the provider node with the lowest response latency. Ideal for real-time chat scenarios requiring fast responses.

balanced — Load Balancing

Distributes requests evenly across all available provider nodes. Ideal for high-concurrency scenarios to avoid single-point overload.

Best Practices

  1. Real-time chat — Use latency for shorter user wait times
  2. Batch tasks — Use cost to reduce overall costs
  3. Production — Default priority for stability
  4. Combine with fallback — Routing strategies can be used together with the fallback parameter
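The recommendations above can be codified in a small helper. This is an illustrative sketch, not part of the OfoxAI SDK; in particular, the exact shape of the `fallback` parameter is an assumption here (check the fallback documentation for the real format):

```python
# Recommended routing strategy per workload type, per the best practices above.
RECOMMENDED_ROUTING = {
    "realtime_chat": "latency",      # shorter user wait times
    "batch": "cost",                 # reduce overall costs
    "production": "priority",        # stability-first default
    "high_concurrency": "balanced",  # avoid single-point overload
}

def provider_extra_body(workload, fallback=None):
    """Build the extra_body dict for a chat.completions call.

    `fallback` is assumed to be a list of fallback targets; its exact
    shape is an assumption -- consult the fallback parameter docs.
    """
    provider = {"routing": RECOMMENDED_ROUTING.get(workload, "priority")}
    if fallback is not None:
        provider["fallback"] = fallback
    return {"provider": provider}
```

For example, `provider_extra_body("batch")` yields `{"provider": {"routing": "cost"}}`, ready to pass as `extra_body` in the Python SDK.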

You can also set a global default routing strategy in the OfoxAI Console without specifying it in each request.
