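Which API protocols does OfoxAI support?

OfoxAI supports three native protocols: OpenAI-compatible (https://api.ofox.ai/v1), Anthropic native (https://api.ofox.ai/anthropic), and Gemini native (https://api.ofox.ai/gemini). To migrate, just replace the base URL; no other code changes are required.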
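
To illustrate what replacing the base URL means in practice, here is a minimal sketch that builds (without sending) a request against the OpenAI-compatible endpoint using only the Python standard library. The `/chat/completions` path and the Bearer authorization header are assumptions based on the usual OpenAI API convention, and `build_chat_request` is a hypothetical helper, not part of any OfoxAI SDK.

```python
import json
import urllib.request

# Base URLs quoted from the answer above.
BASE_URLS = {
    "openai": "https://api.ofox.ai/v1",
    "anthropic": "https://api.ofox.ai/anthropic",
    "gemini": "https://api.ofox.ai/gemini",
}


def build_chat_request(api_key, model, user_text, base_url=BASE_URLS["openai"]):
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        # Assumed path: the standard route for OpenAI-compatible APIs.
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. With an official SDK, the equivalent step is usually just setting the client's base URL option.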

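Which AI models does OfoxAI support?

OfoxAI supports 100+ models, including flagship and open-source models such as GPT-5.3 Codex, Claude Opus 4.6, Gemini 3.1 Pro, DeepSeek V3.2, Qwen3.5-Plus, Kimi-K2.5, Grok 4, and Llama 4, as well as AIGC models such as Sora, Kling, and Flux.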

How do I use OfoxAI in Claude Code?

Set two environment variables, export ANTHROPIC_BASE_URL=https://api.ofox.ai/anthropic and export ANTHROPIC_AUTH_TOKEN=<your OfoxAI key>, then restart Claude Code. For details, see https://docs.ofox.ai/develop/integrations/claude-code

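Can OfoxAI be used in China?

Yes. OfoxAI offers direct access from mainland China through fast Hong Kong nodes, with low latency and no VPN required. Top-ups via WeChat Pay and Alipay are supported.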

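Rate Limits

OfoxAI applies rate limits to keep the platform stable. This page explains the limits and how to optimize your calls to stay within them.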

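Default Limits

OfoxAI is pay-as-you-go, and all users share a single rate limit policy:

Limit	Quota
RPM (requests per minute)	200
TPM (tokens per minute)	Unlimited

If you need a higher RPM quota, contact OfoxAI support to request an adjustment.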
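
One way to stay under the 200 RPM quota is to throttle on the client side before requests are sent. The sketch below is a simple sliding-window limiter. `SlidingWindowLimiter` is a hypothetical name, the class is not thread-safe, and how the server actually counts its window is not documented here, so treat the numbers as a conservative approximation.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` calls in any `window`-second span."""

    def __init__(self, max_requests=200, window=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.window = window
        self._clock = clock    # injectable for testing
        self._sleep = sleep    # injectable for testing
        self._sent = deque()   # timestamps of recent requests, oldest first

    def wait_time(self):
        """Return seconds until the next request is allowed (0.0 if now)."""
        now = self._clock()
        # Drop timestamps that have left the window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        return self.window - (now - self._sent[0])

    def acquire(self):
        """Block until a slot is free, then record the request."""
        while (delay := self.wait_time()) > 0:
            self._sleep(delay)
        self._sent.append(self._clock())
```

Calling `limiter.acquire()` immediately before each API call keeps a single-threaded client within the quota. A multi-threaded client would need a lock around `acquire`.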

Rate Limit Header

Every API response includes rate limit information:


x-ratelimit-limit-requests: 200
x-ratelimit-remaining-requests: 195
x-ratelimit-reset-requests: 12s

Header	Description
`x-ratelimit-limit-requests`	RPM limit
`x-ratelimit-remaining-requests`	Number of requests remaining
`x-ratelimit-reset-requests`	Time until the request limit resets
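
These headers can be read programmatically to decide whether to slow down. The following sketch parses them from any mapping of response headers. `parse_duration` and `read_rate_limit` are hypothetical helpers, and the duration format is an assumption inferred from the `12s` sample above (compound values such as `1m30s` are accepted as well, in case the API emits them).

```python
import re

_DURATION_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")
_DURATION_FULL = re.compile(r"(?:\d+(?:\.\d+)?(?:ms|h|m|s))+")
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}


def parse_duration(text):
    """Convert a duration such as '12s' or '1m30s' to seconds."""
    text = text.strip()
    if not _DURATION_FULL.fullmatch(text):
        raise ValueError(f"unrecognized duration: {text!r}")
    return sum(float(value) * _UNIT_SECONDS[unit]
               for value, unit in _DURATION_PART.findall(text))


def read_rate_limit(headers):
    """Extract the three rate limit headers from a header mapping."""
    # Header names are case-insensitive, so normalize before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    return {
        "limit": int(lowered["x-ratelimit-limit-requests"]),
        "remaining": int(lowered["x-ratelimit-remaining-requests"]),
        "reset_seconds": parse_duration(lowered["x-ratelimit-reset-requests"]),
    }
```

A client could, for example, pause for `reset_seconds` once `remaining` reaches zero instead of waiting for a 429 response.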

Handling 429 Errors

When the rate limit is triggered, the API returns 429 Too Many Requests:


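import time

from openai import OpenAI, RateLimitError

# Assumes an OpenAI-compatible client pointed at OfoxAI.
client = OpenAI(base_url="https://api.ofox.ai/v1", api_key="<your OfoxAI key>")

try:
    response = client.chat.completions.create(...)
except RateLimitError as e:
    # Wait for the server-suggested delay (default: 1 second), then retry the request.
    retry_after = float(e.response.headers.get("retry-after", 1))
    print(f"Rate limited, waiting {retry_after}s...")
    time.sleep(retry_after)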
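
The snippet above waits once but does not resend the request. A fuller retry loop might look like the sketch below, which prefers the server's `retry-after` hint and otherwise falls back to exponential backoff with jitter. `call_with_retry` and its parameters are hypothetical names, and the defaults (5 attempts, 1s base delay, 30s cap) are illustrative rather than OfoxAI recommendations.

```python
import random
import time


def call_with_retry(request_fn, is_rate_limited, get_retry_after,
                    max_attempts=5, base_delay=1.0, max_delay=30.0,
                    sleep=time.sleep):
    """Call request_fn(), retrying while is_rate_limited(exc) is true."""
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except Exception as exc:
            # Re-raise unrelated errors, and give up after the last attempt.
            if not is_rate_limited(exc) or attempt == max_attempts - 1:
                raise
            hinted = get_retry_after(exc)
            if hinted is not None:
                delay = hinted  # the server told us how long to wait
            else:
                # Exponential backoff, capped, with jitter to avoid bursts.
                backoff = min(max_delay, base_delay * 2 ** attempt)
                delay = backoff * random.uniform(0.5, 1.0)
            sleep(delay)
```

With the OpenAI SDK, `is_rate_limited` could be `lambda e: isinstance(e, RateLimitError)`, and `get_retry_after` could read `e.response.headers.get("retry-after")` and convert it to a float when present.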

Optimization Strategies

1. Use Prompt Caching

For a repeated system prompt, enabling caching reduces token consumption:


response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[
        # A long system prompt is cached automatically
        {"role": "system", "content": "You are a professional... (long text omitted here)"},
        {"role": "user", "content": "User question"}
    ]
)

See Prompt Caching for details.

2. Batch Requests

Combine multiple short requests into a single request:


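# ❌ Not recommended: send a separate request for each question
for question in questions:
    client.chat.completions.create(
        model="openai/gpt-4o",  # model is required; any supported model ID works
        messages=[{"role": "user", "content": question}],
    )

# ✅ Recommended: combine them into a single request
combined = "\n".join(f"{i+1}. {q}" for i, q in enumerate(questions))
client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": f"Answer the following questions in order:\n{combined}"}],
)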
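
When the list of questions is long, merging everything into one prompt may become unwieldy, so a middle ground is to group questions into fixed-size batches. The sketch below assumes a hypothetical helper `build_batched_prompts` and an arbitrary default batch size of 5. Each returned string can be sent as one user message.

```python
def build_batched_prompts(questions, batch_size=5):
    """Group questions into numbered prompts of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    prompts = []
    for start in range(0, len(questions), batch_size):
        chunk = questions[start:start + batch_size]
        # Numbering restarts at 1 within each batch.
        numbered = "\n".join(f"{i + 1}. {q}" for i, q in enumerate(chunk))
        prompts.append(f"Answer the following questions in order:\n{numbered}")
    return prompts
```

With 200 questions and a batch size of 5, this turns 200 requests into 40, which fits comfortably under the default RPM quota.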

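3. Choose the Right Model

Scenario	Recommended model	Reason
Simple chat	`openai/gpt-4o-mini`	Fast, uses fewer tokens
Complex reasoning	`openai/gpt-4o`	High-quality output
Code generation	`anthropic/claude-sonnet-4.5`	Strong coding ability
Long-text processing	`google/gemini-3-flash-preview`	Large context window, cost-effective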

4. Control max_tokens

Set a reasonable max_tokens limit to avoid unnecessary token consumption:


response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Summarize in one sentence"}],
    max_tokens=100  # Cap the output length
)

5. Use Model Fallback

When the primary model hits its limit, requests switch automatically to a fallback model:


response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[...],
    extra_body={
        "provider": {
            "fallback": ["anthropic/claude-sonnet-4.5", "google/gemini-3-flash-preview"]
        }
    }
)

See Fallback for details.
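
The `provider.fallback` option above is handled by the gateway. For comparison, a similar effect can be approximated on the client side by trying models in order. This is a different technique from the documented server-side fallback, and `complete_with_fallback`, `send`, and `is_retryable` are hypothetical names used only for illustration.

```python
def complete_with_fallback(models, send, is_retryable=lambda exc: True):
    """Try each model in order; return (model, result) for the first success."""
    if not models:
        raise ValueError("models must not be empty")
    last_error = None
    for model in models:
        try:
            return model, send(model)
        except Exception as exc:
            if not is_retryable(exc):
                raise
            last_error = exc  # remember the failure and try the next model
    # Every model failed: surface the most recent error.
    raise last_error
```

Here `send` would wrap the actual API call, for example a function that takes a model ID and calls `client.chat.completions.create` with it, and `is_retryable` could restrict fallback to rate limit errors.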