Streaming Responses
Streaming lets you receive output incrementally while the model is still generating, improving user experience and perceived speed.
How It Works
OfoxAI implements streaming responses using the Server-Sent Events (SSE) protocol:
- The client sets stream: true in the request
- The server returns the generated content incrementally as chunks
- Each chunk is delivered over SSE with a data: prefix
- When generation finishes, the server sends data: [DONE]
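The wire format above can be sketched with a small parser. The helper below is a minimal illustration of how a client turns raw SSE lines into JSON payloads and detects the end of the stream; it is not part of any OfoxAI SDK:

```python
import json

DONE_SENTINEL = "[DONE]"

def parse_sse_line(line: str):
    """Parse one raw SSE line from a streaming response.

    Returns the decoded JSON payload (a dict), the string "[DONE]" when
    the stream has finished, or None for blank, comment, or keep-alive lines.
    """
    line = line.strip()
    if not line.startswith("data:"):
        return None  # not a data line: blank line, ": keep-alive" comment, etc.
    payload = line[len("data:"):].strip()
    if payload == DONE_SENTINEL:
        return DONE_SENTINEL
    return json.loads(payload)

# Feed it the kind of lines an SSE stream carries:
print(parse_sse_line('data: {"choices": [{"delta": {"content": "Hi"}}]}'))
print(parse_sse_line("data: [DONE]"))
```

Real SDK clients (as in the examples below) do this parsing for you; the sketch only shows what travels over the wire.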
OpenAI Protocol Streaming
cURL
Terminal
curl https://api.ofox.ai/v1/chat/completions \
  -H "Authorization: Bearer $OFOX_API_KEY" \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "Write a poem about programming"}],
    "stream": true
  }'

Anthropic Protocol Streaming
Python
stream_anthropic.py
import anthropic

client = anthropic.Anthropic(
    base_url="https://api.ofox.ai/anthropic",
    api_key="<your OFOXAI_API_KEY>"
)

with client.messages.stream(
    model="anthropic/claude-sonnet-4.5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a poem about programming"}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

Streaming + Function Calling
Streaming also works with function calling. The model first streams the tool call request; once you have handled it, you continue the conversation:
stream_with_tools.py
# `client` here is an OpenAI-protocol client (e.g. openai.OpenAI
# pointed at the OfoxAI base URL)
stream = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Beijing today?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the weather for a given city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"]
            }
        }
    }],
    stream=True
)

for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.tool_calls:
        # Handle the tool call. Note that tool call arguments arrive
        # fragmented across chunks and must be accumulated before parsing.
        print(f"Tool call: {delta.tool_calls[0].function}")
    elif delta.content:
        print(delta.content, end="", flush=True)

Error Handling and Reconnection
A streaming connection can be interrupted by network issues, so it is a good idea to implement retry logic.
stream_retry.py
import time

def stream_with_retry(client, max_retries=3, **kwargs):
    for attempt in range(max_retries):
        try:
            stream = client.chat.completions.create(stream=True, **kwargs)
            for chunk in stream:
                yield chunk
            return  # completed successfully
        except Exception:
            # Note: a retry restarts generation from scratch, so the
            # consumer may receive earlier content a second time.
            if attempt < max_retries - 1:
                wait = 2 ** attempt  # exponential backoff
                print(f"\nConnection dropped, retrying in {wait}s...")
                time.sleep(wait)
            else:
                raise

Best Practices
- Always set a timeout, to avoid waiting indefinitely
- Handle incomplete chunks: some chunks carry no content
- Implement a reconnection mechanism, using exponential backoff
- Use flush on the frontend, so content is displayed immediately
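The "handle incomplete chunks" advice above can be sketched as a small defensive helper. The function below is illustrative only (not part of any SDK); it returns an empty string for role-only chunks, tool-call chunks, or chunks with no choices:

```python
from types import SimpleNamespace

def extract_content(chunk) -> str:
    """Safely pull text out of a streaming chunk.

    A chunk may have no choices, a delta without content (e.g. a role-only
    or tool-call chunk), or content set to None; all of these yield "".
    """
    choices = getattr(chunk, "choices", None) or []
    if not choices:
        return ""
    delta = getattr(choices[0], "delta", None)
    content = getattr(delta, "content", None)
    return content or ""

# Stub objects stand in for real streaming chunks:
full = SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content="Hi"))])
empty = SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=None))])
print(extract_content(full))   # prints "Hi"
print(extract_content(empty))  # prints an empty string
```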