Design reliable webhook delivery with retries
Reported in Atos European engineering loops. An event-delivery design question focusing on retries and idempotency.
Interview scenario
Context for Atos candidates:
Third-party endpoints receive webhook events and may be slow or unreliable.
Model answer
How to frame this at Atos: Connect your answer to measurable impact, clarity of thought, and trade-offs the team cares about. Below is a strong baseline response you can adapt with your own project examples.
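Persist each webhook event durably before attempting delivery, then deliver asynchronously from worker queues so a slow endpoint never blocks the producer. Give every event a unique event ID and sign the payload, so consumers can verify authenticity and safely ignore redeliveries of an event they have already handled.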
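As a rough illustration of the event ID and signed payload, here is a minimal Python sketch. It assumes HMAC-SHA256 over a canonical JSON body and a shared secret per destination; the names `build_event`, `verify_signature`, and `Consumer` are hypothetical, and the in-memory `seen_ids` set stands in for whatever durable store a real consumer would use.

```python
import hashlib
import hmac
import json
import uuid


def build_event(payload: dict, secret: bytes) -> dict:
    """Wrap a payload with a unique event id and an HMAC-SHA256 signature."""
    event_id = str(uuid.uuid4())
    # Serialize deterministically so sender and receiver sign the same bytes.
    body = json.dumps({"id": event_id, "data": payload},
                      sort_keys=True, separators=(",", ":"))
    signature = hmac.new(secret, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"body": body, "signature": signature}


def verify_signature(body: str, signature: str, secret: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = hmac.new(secret, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)


class Consumer:
    """Hypothetical receiver that verifies, then deduplicates by event id."""

    def __init__(self, secret: bytes):
        self.secret = secret
        self.seen_ids = set()   # assumption: a durable store in production
        self.processed = []

    def handle(self, event: dict) -> str:
        if not verify_signature(event["body"], event["signature"], self.secret):
            return "rejected"
        parsed = json.loads(event["body"])
        if parsed["id"] in self.seen_ids:
            return "duplicate"  # redelivery: acknowledge without reprocessing
        self.seen_ids.add(parsed["id"])
        self.processed.append(parsed["data"])
        return "processed"
```

The event ID is read from the signed body rather than from a separate header, so a sender cannot change the ID without invalidating the signature.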
Retry failed deliveries with exponential backoff and jitter, bounded by a maximum retry window, and move events that still fail to a dead-letter queue. Provide replay tooling and delivery logs so customers can debug endpoint issues.
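The retry policy can be sketched as follows. This is one possible shape, not a definitive implementation: it uses "full jitter" (a uniform delay between zero and the capped exponential), and the function names, the default values, and the plain list used as a dead-letter queue are all assumptions. It accumulates simulated time instead of sleeping; a real worker would re-enqueue the event with the computed delay.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 300.0,
                  rng=random) -> float:
    """Full-jitter delay: uniform in [0, min(cap, base * 2**attempt)]."""
    # Clamp the exponent so very large attempt counts cannot overflow.
    ceiling = min(cap, base * (2 ** min(attempt, 30)))
    return rng.uniform(0.0, ceiling)


def deliver_with_retries(send, event, max_window: float = 3600.0,
                         base: float = 1.0, cap: float = 300.0,
                         rng=random, dead_letters=None):
    """Try `send(event)` until it succeeds or the retry window is exhausted.

    Returns (delivered, attempts, elapsed_seconds). Time is simulated here;
    no sleeping takes place.
    """
    elapsed = 0.0
    attempt = 0
    while True:
        if send(event):
            return True, attempt + 1, elapsed
        delay = backoff_delay(attempt, base, cap, rng)
        if elapsed + delay > max_window:
            # Out of budget: park the event for inspection and replay.
            if dead_letters is not None:
                dead_letters.append(event)
            return False, attempt + 1, elapsed
        elapsed += delay
        attempt += 1
```

Jitter matters because many events bound for the same recovering endpoint would otherwise retry in lockstep and knock it over again.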
Rate-limit per destination and circuit-break failing endpoints so that one slow or broken consumer cannot exhaust workers shared with everyone else. This balances delivery reliability against fair resource usage.
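A minimal sketch of the two protections, assuming a token bucket for rate limiting and a simple consecutive-failure circuit breaker. The class names, thresholds, and the injectable `clock` parameter are illustrative choices, not part of any particular library. In practice a delivery service would keep one of each per destination, for example in a dictionary keyed by endpoint URL.

```python
import time


class TokenBucket:
    """Allows up to `capacity` bursts, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to the time elapsed since the last call.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class CircuitBreaker:
    """Opens after consecutive failures; permits trial calls after a cooldown."""

    def __init__(self, failure_threshold: int = 5, cooldown: float = 30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        # Half-open: once the cooldown has passed, let a trial delivery through.
        return self.clock() - self.opened_at >= self.cooldown

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # (re)open and restart the cooldown
```

A worker would check both `allow()` methods before sending, and leave the event queued for a later attempt when either says no.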