Design a centralized metrics and logging pipeline
Reported in Booking.com European engineering loops. Observability architecture interview question for platform roles.
Interview scenario
Context for Booking.com candidates:
Collect logs and metrics from thousands of services with searchable dashboards and alerting.
Model answer
How to frame this at Booking.com: Connect your answer to measurable impact, clarity of thought, and trade-offs the team cares about. Below is a strong baseline response you can adapt with your own project examples.
Agents on each host collect telemetry and push to ingestion gateways with backpressure controls. Use separate streams for logs, metrics, and traces because storage and query patterns differ significantly.
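As a rough illustration of the agent side described above, the sketch below keeps one bounded buffer per stream and only releases batches when the gateway signals it can accept them. The class names, the drop-oldest policy, and the `gateway_ready` flag are assumptions made for this example, not details of any particular agent.

```python
from collections import deque
from typing import Deque, Dict, List


class BoundedBuffer:
    """Fixed-capacity buffer that sheds the oldest event when full (assumed policy)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.events: Deque[dict] = deque()
        self.dropped = 0  # counter the agent could export as its own metric

    def offer(self, event: dict) -> bool:
        """Store an event. Returns False if an older event had to be shed."""
        shed = False
        if len(self.events) >= self.capacity:
            self.events.popleft()
            self.dropped += 1
            shed = True
        self.events.append(event)
        return not shed

    def drain(self, max_batch: int) -> List[dict]:
        """Remove and return up to max_batch events in arrival order."""
        batch: List[dict] = []
        while self.events and len(batch) < max_batch:
            batch.append(self.events.popleft())
        return batch


class Agent:
    """Hypothetical host agent with one independent buffer per telemetry stream."""

    STREAMS = ("logs", "metrics", "traces")

    def __init__(self, capacity_per_stream: int = 1000) -> None:
        self.buffers: Dict[str, BoundedBuffer] = {
            name: BoundedBuffer(capacity_per_stream) for name in self.STREAMS
        }

    def emit(self, stream: str, event: dict) -> bool:
        return self.buffers[stream].offer(event)

    def flush(self, stream: str, gateway_ready: bool, max_batch: int = 100) -> List[dict]:
        """Backpressure: hold events locally while the gateway is not ready."""
        if not gateway_ready:
            return []
        return self.buffers[stream].drain(max_batch)
```

Keeping the buffers separate means a log flood fills only the log buffer, so metrics and traces keep flowing. In an interview it is worth saying which events you would shed first and why.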
Metrics flow into time-series storage with retention tiers; logs flow into indexed document storage or an object-storage-backed data lake with a hot-warm-cold tiering strategy. Add schema standards so every record carries the service name, environment, region, and correlation ID.
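The schema standard and the tiering idea above can be sketched in a few lines. The field keys, the 7-day and 30-day boundaries, and the function names are illustrative assumptions; real retention windows depend on cost and compliance requirements.

```python
from typing import List

# Assumed key names for the four standard fields named in the answer.
REQUIRED_FIELDS = ("service", "environment", "region", "correlation_id")

# Hypothetical tier boundaries, in days since the record was written.
HOT_DAYS = 7
WARM_DAYS = 30


def missing_fields(record: dict) -> List[str]:
    """Return the required fields that are absent or empty in a record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


def storage_tier(age_days: int) -> str:
    """Map a record's age to a hot, warm, or cold storage tier."""
    if age_days < HOT_DAYS:
        return "hot"
    if age_days < WARM_DAYS:
        return "warm"
    return "cold"
```

An ingestion gateway could call something like `missing_fields` to reject or quarantine records that would be unsearchable later, since dashboards and cross-service queries depend on those fields being present everywhere.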
Reliability requires buffering, retry queues, and sampling controls so the pipeline keeps up during incident storms, when telemetry volume spikes. Explain how SLO-driven alerts replace noisy static thresholds and improve on-call signal quality.
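One common way to make alerts SLO-driven is to page on error-budget burn rate over two windows rather than on a raw error threshold. The sketch below assumes that approach; the function names are hypothetical, and the default threshold of 14.4 is a commonly cited value for a one-hour window against a 30-day SLO, used here as an assumption rather than a recommendation.

```python
from typing import Tuple


def burn_rate(errors: int, total: int, slo_target: float) -> float:
    """How fast the error budget is being spent; 1.0 means exactly on budget."""
    if total == 0:
        return 0.0
    error_budget = 1.0 - slo_target  # e.g. 0.001 for a 99.9% SLO
    return (errors / total) / error_budget


def should_page(
    long_window: Tuple[int, int],
    short_window: Tuple[int, int],
    slo_target: float,
    threshold: float = 14.4,  # assumed default, tune per service
) -> bool:
    """Page only if both windows burn fast: (errors, total) per window.

    The long window shows the problem is significant; the short window
    shows it is still happening, so recovered incidents stop paging.
    """
    long_burn = burn_rate(long_window[0], long_window[1], slo_target)
    short_burn = burn_rate(short_window[0], short_window[1], slo_target)
    return long_burn >= threshold and short_burn >= threshold
```

For example, with a 99.9% target, 200 errors in 10,000 requests over the long window and 20 in 1,000 over the short window both burn at roughly 20 times budget, so the check pages. If the short window shows no errors, it stays quiet even though the long window is still elevated.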