The webhook delivery worker retries failed deliveries automatically. When that’s not enough, you can manually replay any event for up to 30 days.
Automatic retry behaviour
If your endpoint:
- Returns `2xx` → delivery succeeds. No retries.
- Returns `3xx` → treated as misconfiguration (we don’t follow redirects on webhooks). Retried as a transient failure.
- Returns `4xx` (except 408/429) → treated as a terminal client error the customer must fix. Retries stop after one attempt; the event is marked terminally failed.
- Returns `408`, `429`, or any `5xx` → retried with exponential backoff.
- Times out (>10s) → retried.
- Network error (TCP reset, DNS failure, TLS handshake failure) → retried.
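The rules above can be read as a small decision function. The following is a minimal sketch, not the worker's actual code: the names `Outcome` and `classify` are hypothetical, and the handling of status codes the list does not mention (such as `1xx`) is an assumption.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    DELIVERED = "delivered"        # 2xx: done, no retries
    RETRY = "retry"                # transient: schedule the next attempt
    TERMINAL = "terminal_failure"  # client error the receiver must fix


def classify(status: Optional[int], timed_out: bool = False,
             network_error: bool = False) -> Outcome:
    """Map one delivery attempt to an outcome, following the list above."""
    if timed_out or network_error or status is None:
        return Outcome.RETRY
    if 200 <= status < 300:
        return Outcome.DELIVERED
    if status in (408, 429):
        return Outcome.RETRY
    if 400 <= status < 500:
        return Outcome.TERMINAL
    # 3xx (redirects are not followed) and 5xx are retried.
    # Assumption: any other code is also treated as transient.
    return Outcome.RETRY
```

For example, `classify(404)` yields `Outcome.TERMINAL`, while `classify(302)` and `classify(503)` both yield `Outcome.RETRY`.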
Retry schedule
| Attempt | Delay before | Approximate clock time after the originating event |
|---|---|---|
| 1 | (immediate) | T+0 |
| 2 | 30 sec | T+30s |
| 3 | 2 min | T+~2.5m |
| 4 | 8 min | T+~10m |
| 5 | 30 min | T+~40m |
| 6 | 2 hours | T+~2h40m |
| 7 | 6 hours | T+~8h40m |
| 8 | 12 hours | T+~21h |
| 9 | 18 hours | T+~39h |
| 10 | 24 hours (final) | T+~63h (~2.6 days) |
After the final attempt fails, the event is marked terminally failed (`webhook_events.terminal_failure_at` is set). Automatic delivery stops; the event remains in the database for 30 days for manual replay.
Each scheduled delay carries up to 20% jitter to avoid thundering-herd retry storms.
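To sanity-check the "approximate clock time" column, the delays can be summed directly. This is a sketch only: `DELAYS_S`, `nominal_offsets`, and `jittered_delay` are hypothetical names, and the jitter model (uniform, only lengthening the delay by 0 to 20%) is an assumption, since the text does not say how the jitter is distributed.

```python
import random

# Delays before attempts 2..10, in seconds, copied from the table above.
DELAYS_S = [30, 120, 480, 1800, 7200, 21600, 43200, 64800, 86400]


def nominal_offsets(delays=DELAYS_S):
    """Seconds after the event at which each attempt fires, without jitter."""
    offsets, total = [0], 0
    for delay in delays:
        total += delay
        offsets.append(total)
    return offsets


def jittered_delay(base_s, rng=random, max_jitter=0.20):
    """Assumption: jitter stretches the delay by a uniform 0-20%."""
    return base_s * (1.0 + rng.uniform(0.0, max_jitter))


offsets = nominal_offsets()
hours_to_final_attempt = offsets[-1] / 3600  # about 62.7 hours before jitter
```

Without jitter the final attempt lands 225,630 seconds after the event, which agrees with the table's T+~63h.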
Distributed circuit breaker
Per-endpoint, not per-customer. If your URL has a high recent failure rate (5 consecutive failures, or >50% failures in the last 60s with at least 5 attempts), the circuit opens and we pause delivery for ~30 seconds before half-opening. This prevents one slow customer endpoint from starving the worker, so other customers’ deliveries are unaffected. When the circuit is open, queued events for the same URL are not lost: they’re held in the SQS retry pipeline and resumed when the breaker closes.
Inspecting delivery state
Every webhook event is queryable via `GET /v1/webhooks/events`. Filter by status (`pending`, `delivered`, `failed`, `retrying`, `none`) and date range to narrow down. The response is cursor-paginated.
The webapp’s Settings → Webhook Inspector UI renders the same data with timeline visualisations.
Manual redelivery
After fixing your endpoint (or rotating its URL), replay individual events by calling `POST .../redeliver` on each one.
The replay carries the same `X-Kash-Event-Id` as the original, so a customer that already processed the first delivery treats the replay as a no-op (assuming proper dedupe). This is the whole point of the dedupe pattern: with it in place, a manual replay does not produce a duplicate side effect.
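On the receiving side, the dedupe pattern can be as small as the sketch below. It is illustrative only: `EventDeduper` and `process_once` are hypothetical names, the store is an in-memory set (a real handler would need something durable, such as a uniquely keyed database table), and `headers` is assumed to be a plain dict that uses the exact header casing shown.

```python
import threading


class EventDeduper:
    """Remembers processed event ids so a replay becomes a no-op."""

    def __init__(self):
        self._seen = set()
        self._lock = threading.Lock()

    def process_once(self, headers, payload, handler):
        """Run handler(payload) unless this event id was already processed.

        Returns True if the handler ran, False for a duplicate delivery.
        """
        event_id = headers["X-Kash-Event-Id"]
        with self._lock:
            if event_id in self._seen:
                return False
            self._seen.add(event_id)
        try:
            handler(payload)
        except Exception:
            # Forget the id so a later retry or replay can try again.
            with self._lock:
                self._seen.discard(event_id)
            raise
        return True
```

Claiming the id before running the handler keeps two concurrent deliveries of the same event from both executing; releasing it on failure means a failed attempt can still be retried or replayed.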
Replay amplification cap
Each event can be manually replayed at most 5 times in total. Beyond that, `POST .../redeliver` returns `429 WEBHOOK_REPLAY_LIMIT_REACHED`. This bounds the cost of a runaway replay loop.
If you genuinely need more, contact support — we can lift the cap on a per-event basis after auditing.
Replay scope
- Manual replay works on any event status, whether failed or already delivered. Replaying a successful event is useful for testing your handler with a real payload.
- The replay enqueues a new delivery attempt; it doesn’t reset the original delivery’s history.
- Each replay is itself subject to the auto-retry schedule above.
- Replays use the current webhook secret (not the one that signed the original delivery). If you’ve rotated, the new signature reflects the current secret.
Bulk operations
The customer-facing API doesn’t expose bulk redelivery. That’s deliberate (amplification risk). For incident recovery (e.g. “all events for the last hour failed because my endpoint was down”), use `kash-admin webhooks bulk-redeliver` (admin CLI), which has additional safety guards.
Tracing a specific event
The `kash webhooks trace --event-id <id>` CLI command pulls the full distributed trace for one event.
What survives 30 days
- Events that succeeded or had no failures: kept indefinitely for audit.
- Events that terminally failed: retained for 30 days, then archived to S3 (queryable via Athena, not via the API).
Common patterns
“I deployed a broken handler. How do I replay everything I missed?”
- Roll back or fix the handler.
- List failed events from the deploy window: `GET /v1/webhooks/events?status=failed&from=...&limit=100`.
- Iterate the cursor-paginated response, calling `POST .../redeliver` on each id.
- Watch deliveries succeed via `GET /v1/webhooks/events?status=delivered&from=<retry-time>`.
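The list-then-redeliver loop from these steps can be sketched without committing to an HTTP client. Everything here is an assumption for illustration: `replay_failed` is a hypothetical helper, `fetch_page` stands in for one call to the events listing and is assumed to return a list of ids plus the next cursor (or `None`), and `redeliver` stands in for `POST .../redeliver` and is assumed to return the HTTP status code.

```python
def replay_failed(fetch_page, redeliver):
    """Redeliver every listed event, following cursors until exhausted.

    fetch_page(cursor) -> (event_ids, next_cursor or None)  # assumed shape
    redeliver(event_id) -> HTTP status code                 # assumed shape
    """
    replayed, capped, errors = [], [], []
    cursor = None
    while True:
        event_ids, cursor = fetch_page(cursor)
        for event_id in event_ids:
            status = redeliver(event_id)
            if 200 <= status < 300:
                replayed.append(event_id)
            elif status == 429:
                # Taken here to mean WEBHOOK_REPLAY_LIMIT_REACHED:
                # the per-event cap of 5 manual replays was hit.
                capped.append(event_id)
            else:
                errors.append((event_id, status))
        if cursor is None:
            break
    return replayed, capped, errors
```

Separating capped events from other errors makes it easy to see which ids need a support request rather than another attempt.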
“I want to test my handler against a real production payload”
- Find a known-good event id from `GET /v1/webhooks/events?status=delivered`.
- Point your test endpoint at a tunnel (e.g. `ngrok`).
- Update the key’s `webhook_url` to the tunnel URL.
- `POST .../redeliver` on the chosen event id.
- Inspect what hits your handler.
- Restore the key’s `webhook_url` when you’re done.
Alternatively, use `kash webhooks simulate` (CLI) to generate a synthetic but properly-signed payload locally, with no production data round-trip needed.
Next
- Verifying Signatures: Always verify before processing.
- Secret Rotation: Rotate without dropping mid-rotation deliveries.