
Fallback

OfoxAI’s fallback mechanism automatically switches to alternative models when the primary model is unavailable, ensuring your service stays uninterrupted.

How It Works

  1. The request is sent to the primary model
  2. If the primary model returns an error (5xx, timeout, rate limit, etc.)
  3. Alternative models from the fallback list are tried in order
  4. The first successful response is returned
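The steps above can be sketched in Python. This is an illustrative model of the fallback loop, not OfoxAI's actual implementation; `complete_with_fallback`, `UpstreamError`, and `call_model` are hypothetical names:

```python
# Sketch of the fallback loop. `call_model` stands in for the upstream request;
# `UpstreamError.kind` stands in for the error classification described above.

RETRYABLE = {"5xx", "timeout", "rate_limit", "unavailable"}

class UpstreamError(Exception):
    """Hypothetical upstream failure carrying its kind ('5xx', 'timeout', ...)."""
    def __init__(self, kind):
        super().__init__(kind)
        self.kind = kind

class AllModelsFailedError(Exception):
    """Raised when the primary model and every fallback have failed."""

def complete_with_fallback(primary, fallbacks, call_model):
    """Try the primary model, then each fallback in order; first success wins."""
    errors = {}
    for model in [primary, *fallbacks]:
        try:
            return model, call_model(model)
        except UpstreamError as exc:
            if exc.kind not in RETRYABLE:
                raise  # client errors propagate unchanged, no fallback
            errors[model] = exc
    raise AllModelsFailedError(errors)
```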

Per-Request Fallback

Configure fallback for individual requests using the provider.fallback parameter:

fallback.py
from openai import OpenAI

client = OpenAI(
    base_url="https://api.ofox.ai/v1",
    api_key="<your OFOXAI_API_KEY>"
)

response = client.chat.completions.create(
    model="openai/gpt-4o",  # Primary model
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "provider": {
            "fallback": [
                "anthropic/claude-sonnet-4.5",   # First fallback
                "google/gemini-3-flash-preview"  # Second fallback
            ]
        }
    }
)

# Check which model was actually used
print(response.model)
fallback.ts
const response = await client.chat.completions.create({
  model: 'openai/gpt-4o',
  messages: [{ role: 'user', content: 'Hello' }],
  // @ts-ignore OfoxAI extension parameter
  provider: {
    fallback: [
      'anthropic/claude-sonnet-4.5',
      'google/gemini-3-flash-preview'
    ]
  }
})

Global Fallback Configuration

Configure a global fallback strategy in the OfoxAI Console without specifying it in each request:

  1. Log in to the OfoxAI Console 
  2. Go to Settings → Routing Policy
  3. Configure the default fallback model list

Per-request fallback parameters override the global configuration.
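The precedence rule can be stated as a one-liner. This is a sketch of the behavior only; `effective_fallbacks` is a hypothetical helper, not part of any SDK:

```python
def effective_fallbacks(global_list, per_request):
    """Per-request fallback list, when provided, replaces the global list
    entirely (even an empty per-request list disables the global one)."""
    return per_request if per_request is not None else global_list
```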

Fallback Triggers

The following conditions trigger a fallback:

| Condition          | Description                            |
| ------------------ | -------------------------------------- |
| HTTP 5xx           | Server error                           |
| Request timeout    | Model response timeout                 |
| 429 Rate limit     | Upstream model rate limit reached      |
| Model unavailable  | Provider maintenance or decommission   |

The following conditions do not trigger a fallback:

| Condition             | Description                              |
| --------------------- | ---------------------------------------- |
| HTTP 4xx (except 429) | Client errors require fixing the request |
| Content filtering     | Model refused to generate content        |
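The two tables above boil down to a simple predicate. A minimal sketch, assuming the only inputs are an HTTP status code and a timeout flag (`triggers_fallback` is a hypothetical helper for illustration):

```python
def triggers_fallback(status_code, timed_out=False):
    """True if the error condition should trigger a fallback:
    5xx, timeouts, and 429 do; other 4xx (including content-filter
    refusals surfaced as client errors) do not."""
    if timed_out:
        return True
    if status_code == 429:
        return True
    return 500 <= status_code <= 599
```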

Combining with Routing

Fallback can be combined with provider routing:

response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "provider": {
            "routing": "latency",  # Latency-first routing
            "fallback": [          # Fallback list
                "anthropic/claude-sonnet-4.5",
                "google/gemini-3-flash-preview"
            ]
        }
    }
)

Recommended Configurations

General Chat

"provider": { "fallback": ["anthropic/claude-sonnet-4.5", "google/gemini-3-flash-preview"] }

Code Generation

"provider": { "fallback": ["anthropic/claude-sonnet-4.5", "deepseek/deepseek-chat"] }

Cost-Effective

"provider": { "fallback": ["openai/gpt-4o-mini", "google/gemini-3-flash-preview", "deepseek/deepseek-chat"] }

Best Practices

  1. Choose fallback models with similar capabilities — Ensure consistent output quality after fallback
  2. Use cross-provider fallbacks — Avoid all models being down from the same provider
  3. Set 2-3 fallback options — Sufficient to handle most failure scenarios
  4. Monitor fallback frequency — Frequent fallbacks may indicate the need to switch your primary model
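Practice 4 can be monitored client-side by comparing `response.model` (the model that actually answered, as shown earlier) against the model you requested. A minimal sketch; `FallbackMonitor` is a hypothetical helper, not part of the SDK:

```python
from collections import Counter

class FallbackMonitor:
    """Track how often responses came from a fallback rather than the
    requested primary model."""

    def __init__(self):
        self.total = 0
        self.fallbacks = Counter()  # fallback model -> times it served

    def record(self, requested_model, served_model):
        self.total += 1
        if served_model != requested_model:
            self.fallbacks[served_model] += 1

    @property
    def fallback_rate(self):
        """Fraction of requests served by a fallback model."""
        return sum(self.fallbacks.values()) / self.total if self.total else 0.0
```

A persistently high `fallback_rate` is the signal, per the best practices above, that the primary model may need to change.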