Dear Hiring Manager,
I'm writing about the Senior Backend Engineer role at Stripe on the Payments Reliability team. For the last 18 months I've owned the billing API at a payments startup (Go 1.22, Postgres-Citus, gRPC). It serves 8k req/sec at a P99 of 85ms, and I shaved 340ms off tail latency by rearchitecting read-replica routing, which is the same class of problem your job post calls out.
Beyond the billing API, I rebuilt our authentication service as a gRPC-only module adopted by 28 downstream services (inter-service auth overhead dropped from 12ms to 3ms at P99), shipped an event-sourced audit log processing 220k events/sec on Kafka + ScyllaDB with at-least-once semantics, and carried on-call for 4 critical services through 2024 with MTTA under 4 minutes and MTTR under 30 minutes. My stack is Go, Postgres, Kafka, Redis, gRPC, and OpenTelemetry for observability; I've used Python for adjacent data-pipeline work, but I'm not claiming depth there.
Stripe's reliability bar is one of the reasons I use the product for my own side projects. I'm drawn to the Payments Reliability team specifically: the public incident reviews I've read reflect the kind of postmortem culture I want to contribute to, and the work you've published on idempotency keys influenced the retry layer I designed at my current company.
Could we schedule a 30-minute technical conversation? I'd be happy to walk through the billing-API latency work as a worked example of how I scope and sequence this class of project.
Best, Sam Rivera