Document report processing disrupted in EU

Incident Report for Onfido

Postmortem

Summary

In the EU region, one critical service struggled to reprocess the traffic affected by a previous faulty release, which led to higher Turnaround Time (TaT) for all Document reports created between 10:32 and 10:50 UTC.

All impacted Document reports were completed successfully with an average TaT of ~6 minutes.

Root Causes

The re-processing batch caused a spike in traffic, and auto-scaling did not work as expected for one critical service. While waiting for a downstream ML inference service to scale up, the service accepted an unbounded number of in-flight requests, which led to memory exhaustion and an unresponsive I/O event loop. The service entered a crash loop and had to be manually scaled up to recover.
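
As a rough illustration of this failure mode (not the actual service code, and assuming a Python asyncio-style service, which is an assumption on our part): with no cap on in-flight requests, each request body stays buffered in memory for the full duration of the slow downstream call, so memory grows with the number of concurrent requests.

```python
# Hypothetical sketch of the failure mode, NOT the real service code.
# Assumes an asyncio/aiohttp-style service; the real stack may differ.
import asyncio
from aiohttp import web

async def classify(document: bytes) -> str:
    # Stand-in for the downstream ML inference call. While that service
    # is scaling up, this call is slow, so requests pile up in memory.
    await asyncio.sleep(30)
    return "passport"

async def handle(request: web.Request) -> web.Response:
    # The whole document payload is buffered for the full wait. With no
    # limit on in-flight requests, memory use is roughly
    # (concurrent requests) x (payload size), which is what exhausted
    # memory during the traffic spike.
    document = await request.read()
    label = await classify(document)
    return web.json_response({"label": label})

app = web.Application()
app.add_routes([web.post("/documents", handle)])

if __name__ == "__main__":
    web.run_app(app, port=8080)
```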

Timeline

10:32 UTC: The critical service's error rate rose to 100%

10:33 UTC: Engineers who initiated the report backlog reprocessing became aware of the issue through our monitoring and started investigating

10:48 UTC: We manually scaled up the critical service

10:50 UTC: The service went back to normal and errors stopped

Remedies

  • Investigate how to reduce the memory footprint of this service, allowing bigger request queues while it waits for downstream ML model serving to scale up
  • Change auto-scaling parameters to be more aggressive (i.e. scale at lower CPU targets)
  • Add monitoring of concurrent requests
  • Reduce ML model serving image sizes for faster scaling of inference services
  • Improve back-pressure mechanisms so the service can sustain a minimum level of traffic independent of spikes while auto-scaling kicks in (see the sketch after this list)
  • Change our weekly load testing scripts to specifically test for accelerated traffic spikes
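
A minimal sketch of what the back-pressure and concurrent-requests-monitoring remedies could look like, again assuming a Python asyncio-style service (the limit value, route, and middleware shape are illustrative only, not the actual implementation): requests beyond a fixed in-flight cap are rejected immediately with 503, so a baseline of traffic keeps succeeding while auto-scaling adds capacity.

```python
# Hypothetical sketch of bounded in-flight requests with load shedding;
# not the real service code, and MAX_IN_FLIGHT is illustrative only.
import asyncio
from aiohttp import web

MAX_IN_FLIGHT = 64  # illustrative cap, tuned per service in practice
in_flight = 0       # exported as a gauge metric in a real setup

@web.middleware
async def backpressure(request: web.Request, handler) -> web.Response:
    global in_flight
    if in_flight >= MAX_IN_FLIGHT:
        # Shed excess load instead of buffering it: callers can retry,
        # and the service keeps serving MAX_IN_FLIGHT requests normally
        # while auto-scaling kicks in.
        return web.Response(status=503, text="overloaded, retry later")
    in_flight += 1
    try:
        return await handler(request)
    finally:
        in_flight -= 1

async def handle(request: web.Request) -> web.Response:
    await asyncio.sleep(1)  # placeholder for the real document processing
    return web.json_response({"status": "complete"})

app = web.Application(middlewares=[backpressure])
app.add_routes([web.post("/documents", handle)])

if __name__ == "__main__":
    web.run_app(app, port=8080)
```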
Posted Jan 30, 2026 - 11:38 UTC

Resolved

Between 10:30 UTC and 10:47 UTC there was disruption to document report processing in the EU region, as a side effect of re-processing reports delayed by the earlier incident. Further details will follow in a postmortem.
Posted Jan 21, 2026 - 10:30 UTC