Fallback
OfoxAI’s fallback mechanism automatically switches to alternative models when the primary model is unavailable, ensuring your service stays uninterrupted.
How It Works
- The request is sent to the primary model.
- If the primary model returns an error (HTTP 5xx, timeout, rate limit, etc.), the models in the fallback list are tried in order.
- The first successful response is returned.
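The steps above amount to a simple try-in-order loop. A minimal sketch of that logic (illustrative only, not OfoxAI's actual implementation; `send` is a placeholder for whatever issues the request):

```python
# Illustrative sketch of the fallback loop. `send` stands in for the call
# that actually issues the request and raises on 5xx, timeout, rate limit.
def complete_with_fallback(primary, fallbacks, send):
    """Try the primary model, then each fallback in order; first success wins."""
    errors = {}
    for model in [primary, *fallbacks]:
        try:
            return model, send(model)
        except Exception as exc:  # 5xx, timeout, rate limit, ...
            errors[model] = exc
    raise RuntimeError(f"all models failed: {errors}")
```

Returning the model name alongside the response mirrors how `response.model` lets you see which model actually answered.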
Per-Request Fallback
Configure fallback for individual requests using the `provider.fallback` parameter:
fallback.py

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.ofox.ai/v1",
    api_key="<your OFOXAI_API_KEY>"
)

response = client.chat.completions.create(
    model="openai/gpt-4o",  # Primary model
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "provider": {
            "fallback": [
                "anthropic/claude-sonnet-4.5",   # First fallback
                "google/gemini-3-flash-preview"  # Second fallback
            ]
        }
    }
)

# Check which model was actually used
print(response.model)
```

fallback.ts
```typescript
import OpenAI from 'openai'

const client = new OpenAI({
  baseURL: 'https://api.ofox.ai/v1',
  apiKey: process.env.OFOXAI_API_KEY
})

const response = await client.chat.completions.create({
  model: 'openai/gpt-4o',
  messages: [{ role: 'user', content: 'Hello' }],
  // @ts-ignore OfoxAI extension parameter
  provider: {
    fallback: [
      'anthropic/claude-sonnet-4.5',
      'google/gemini-3-flash-preview'
    ]
  }
})

// Check which model was actually used
console.log(response.model)
```

Global Fallback Configuration
Configure a global fallback strategy in the OfoxAI Console without specifying it in each request:
- Log in to the OfoxAI Console
- Go to Settings → Routing Policy
- Configure the default fallback model list
Per-request fallback parameters override the global configuration.
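The precedence rule can be sketched as follows (a hypothetical helper for illustration, not part of any OfoxAI SDK):

```python
# Hypothetical illustration of the precedence rule: a per-request fallback
# list, when supplied, completely replaces the global Console configuration.
def effective_fallback(global_fallback, request_fallback=None):
    """Return the fallback list that applies to a given request."""
    if request_fallback is not None:
        return request_fallback  # per-request config wins
    return global_fallback       # otherwise use the Console default
```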
Fallback Triggers
The following conditions trigger a fallback:
| Condition | Description |
|---|---|
| HTTP 5xx | Server error |
| Request timeout | Model response timeout |
| HTTP 429 (rate limit) | Upstream model rate limit reached |
| Model unavailable | Provider maintenance or decommission |
The following conditions do not trigger a fallback:
| Condition | Description |
|---|---|
| HTTP 4xx (except 429) | Client errors require fixing the request |
| Content filtering | Model refused to generate content |
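The two tables can be summarized in a single predicate. A sketch under the stated conditions (the exact checks OfoxAI performs may differ):

```python
# Sketch of the trigger rules from the tables above: 5xx, timeout, 429,
# and model unavailability trigger a fallback; other 4xx errors and
# content-filter refusals do not.
def triggers_fallback(status=None, timed_out=False,
                      model_unavailable=False, content_filtered=False):
    """Return True if the error condition should trigger a fallback."""
    if content_filtered:
        return False             # model refused; retrying elsewhere won't help
    if timed_out or model_unavailable:
        return True
    if status is None:
        return False
    if status == 429:            # upstream rate limit
        return True
    if 500 <= status <= 599:     # server errors
        return True
    return False                 # other 4xx: fix the request instead
```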
Combining with Routing
Fallback can be combined with provider routing:
```python
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "provider": {
            "routing": "latency",  # Latency-first routing
            "fallback": [          # Fallback list
                "anthropic/claude-sonnet-4.5",
                "google/gemini-3-flash-preview"
            ]
        }
    }
)
```

Recommended Fallback Configurations
General Chat
```json
"provider": {
  "fallback": ["anthropic/claude-sonnet-4.5", "google/gemini-3-flash-preview"]
}
```

Code Generation

```json
"provider": {
  "fallback": ["anthropic/claude-sonnet-4.5", "deepseek/deepseek-chat"]
}
```

Cost-Effective

```json
"provider": {
  "fallback": ["openai/gpt-4o-mini", "google/gemini-3-flash-preview", "deepseek/deepseek-chat"]
}
```

Best Practices
- Choose fallback models with similar capabilities — Ensure consistent output quality after fallback
- Use cross-provider fallbacks — Avoid all models being down from the same provider
- Set 2-3 fallback options — Sufficient to handle most failure scenarios
- Monitor fallback frequency — Frequent fallbacks may indicate the need to switch your primary model
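One lightweight way to monitor fallback frequency is to compare the model you requested with the `response.model` you received on each call. A sketch (the class and counter names are made up for illustration):

```python
from collections import Counter

# Sketch of a fallback-frequency monitor: count how often the model that
# answered differs from the model that was requested.
class FallbackMonitor:
    def __init__(self):
        self.counts = Counter()
        self.requests = 0

    def record(self, requested, served):
        """Call once per request with the requested and served model names."""
        self.requests += 1
        if served != requested:
            self.counts[(requested, served)] += 1

    def fallback_rate(self):
        """Fraction of requests that were served by a fallback model."""
        if self.requests == 0:
            return 0.0
        return sum(self.counts.values()) / self.requests
```

A persistently high rate for a given primary model is the signal, per the last practice above, that it may be time to promote a fallback to primary.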