notify/retry.py
Needs attention
+72 −0
@@ -0,0 +1,24 @@
+import asyncio, random
+import httpx
+from .dead_letter import DeadLetterQueue
+
+MAX_ATTEMPTS = 6
+BASE_DELAY = 2.0
+MAX_DELAY = 300.0
+
+async def deliver_with_retry(client, event, dlq: DeadLetterQueue):
+    for attempt in range(MAX_ATTEMPTS):
+        try:
+            resp = await client.post(event.url, json=event.payload)
+            if resp.status_code < 400:
+                return resp
+            if 400 <= resp.status_code < 500:
+                await dlq.push(event, reason=f"http_{resp.status_code}")
+                return resp
+        except httpx.TransportError:
+            pass
+        delay = min(BASE_DELAY * (2 ** attempt), MAX_DELAY)
+        delay *= 1 + random.uniform(-0.2, 0.2)
+        await asyncio.sleep(delay)
+    await dlq.push(event, reason="max_attempts")
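Blocking
Catching only TransportError is not enough here. If httpx.TimeoutException is not covered by it in the httpx version we pin (current releases do define it as a TransportError subclass, so please verify), a timeout propagates straight out and takes the whole worker task down with it. Please catch (httpx.TransportError, httpx.TimeoutException) together and write the error to the structured log; otherwise 5xx retries look normal while timeouts silently drop events.

Minor suggestion
The POST passes no timeout, so it falls back to the client-level setting, but the retry layer is better off adding its own safeguard. With client.post(..., timeout=10.0), a slow external service is less likely to hold the call past the backoff window.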
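
A rough sketch of what the two comments above might add up to, offered as one possible shape rather than the required patch. To keep it runnable without httpx installed, the retryable exception types are passed in as `retry_on`; that parameter, the injectable `sleep`, the `backoff_delay` helper, the `REQUEST_TIMEOUT` constant, the logger name, and the log field names are all hypothetical and not part of the PR. The sleep after the final attempt is kept as in the diff.

```python
import asyncio
import logging
import random

logger = logging.getLogger("notify.retry")  # hypothetical logger name

MAX_ATTEMPTS = 6
BASE_DELAY = 2.0
MAX_DELAY = 300.0
REQUEST_TIMEOUT = 10.0  # per-request cap from the review; name is an assumption


def backoff_delay(attempt):
    """Exponential backoff capped at MAX_DELAY, then jittered by +/-20%."""
    delay = min(BASE_DELAY * (2 ** attempt), MAX_DELAY)
    return delay * (1 + random.uniform(-0.2, 0.2))


async def deliver_with_retry(client, event, dlq, retry_on, sleep=asyncio.sleep):
    # retry_on: tuple of exception types treated as transient. Assumption:
    # the real module would pass (httpx.TransportError, httpx.TimeoutException).
    for attempt in range(MAX_ATTEMPTS):
        try:
            resp = await client.post(
                event.url, json=event.payload, timeout=REQUEST_TIMEOUT
            )
            if resp.status_code < 400:
                return resp
            if resp.status_code < 500:
                # A 4xx will not succeed on retry, so dead-letter it right away.
                await dlq.push(event, reason=f"http_{resp.status_code}")
                return resp
            # 5xx: record it and fall through to the backoff below.
            logger.warning(
                "delivery_5xx",
                extra={"event_url": event.url, "attempt": attempt,
                       "status": resp.status_code},
            )
        except retry_on as exc:
            # Record the failure instead of swallowing it with a bare pass.
            logger.warning(
                "delivery_transport_error",
                extra={"event_url": event.url, "attempt": attempt,
                       "error": type(exc).__name__},
            )
        await sleep(backoff_delay(attempt))
    await dlq.push(event, reason="max_attempts")
    return None
```

In notify/retry.py the call would presumably look like `await deliver_with_retry(client, event, dlq, retry_on=(httpx.TransportError, httpx.TimeoutException))`. Listing both types is harmless even if one already subclasses the other.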