Increase in webhooks latency

Incident Report for Onfido

Postmortem

Summary

Webhook delivery latency rose to as much as 49 minutes in the EU region between 12:39 and 13:39 UTC on October 9, 2025.

Although no webhooks were lost during the incident, some clients encountered rate-limit errors when calling the API while processing the surge of delayed webhooks. These webhooks were subsequently retried according to the logic outlined in our public documentation.
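
For clients that call the API from their webhook handlers, one way to absorb a surge like this is to retry rate-limited calls with exponential backoff and jitter. The following is a minimal sketch, not a description of our client libraries: `RateLimitError`, `call_with_backoff` and all parameter values are hypothetical names chosen for illustration, and the caller is assumed to raise `RateLimitError` when the API answers with a rate-limit response.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical error: the API call was rejected by a rate limit."""


def call_with_backoff(call, max_attempts=5, base_delay=1.0,
                      max_delay=60.0, sleep=time.sleep):
    """Run call(), retrying on RateLimitError with capped exponential backoff.

    The wait before each retry is a random value between 0 and
    min(max_delay, base_delay * 2**attempt), so that many handlers woken
    by the same burst of webhooks do not all retry at the same moment.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the caller.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

The `sleep` parameter exists only so the wait can be replaced in tests. If the retries are exhausted, the error propagates, and the handler can then report a failure so that the webhook is retried later.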

Root Causes

An infrastructure dependency that helps reduce duplicate delivery of webhooks experienced a hardware failure. As a result, the throughput of the webhook delivery service decreased significantly during the incident.
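
Because this dependency reduces duplicate deliveries rather than ruling them out, a receiver may still see the same webhook more than once, for example after a retry. A receiver can guard against this by making its handler idempotent. The sketch below assumes that each webhook carries some unique identifier. The names `WebhookDeduplicator`, `process_once` and `event_id` are hypothetical, and the in-memory set stands in for whatever durable store a real receiver would use.

```python
class WebhookDeduplicator:
    """Hypothetical receiver-side guard against duplicate webhooks."""

    def __init__(self):
        # Assumption: an in-memory set is enough for illustration. A real
        # receiver would need a durable, shared store with expiry.
        self._seen = set()

    def process_once(self, event_id, handler, payload):
        """Run handler(payload) unless event_id was already handled.

        Returns True if the handler ran, False if the event was a duplicate.
        The identifier is recorded only after the handler succeeds, so a
        webhook whose handler raised is processed again when redelivered.
        """
        if event_id in self._seen:
            return False
        handler(payload)
        self._seen.add(event_id)
        return True
```

Recording the identifier after the handler succeeds favours reprocessing over dropping: a crash between the two steps leads to a repeated call, not to a lost webhook.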

Timeline

12:39 UTC: Throughput of the service responsible for delivering webhooks decreased

12:45 UTC: Our on-call team was notified and started the investigation

13:16 UTC: The on-call team acknowledged the widespread impact of the incident and updated the status page

13:42 UTC: The on-call team identified the faulty infrastructure dependency

13:49 UTC: The service recovered and all pending webhooks were delivered

Remedies

We will improve the resilience of webhook delivery to failures of this infrastructure dependency. We will also update our runbooks with specific instructions for diagnosing this type of failure, which should shorten recovery time in similar incidents.

Posted Oct 13, 2025 - 14:06 UTC

Resolved

This incident has been resolved.
Posted Oct 09, 2025 - 13:55 UTC

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Oct 09, 2025 - 13:50 UTC

Investigating

We've identified an increase in webhooks latency affecting the EU region.
Posted Oct 09, 2025 - 13:17 UTC
This incident affected: Europe (onfido.com) (Webhooks).