Summary
On 26 January 2026, between 12:16 UTC and 12:35 UTC, document report processing was severely degraded, with customers experiencing extended processing times for about 85% of their traffic. Between 12:25 UTC and 12:35 UTC, the incident also caused extended processing times for 20% of Biometric Authentication and Biometric Verification traffic.
Root Causes
A core document fraud-prevention service came under elevated load, causing its Kubernetes pods to go down in quick succession. The retry policy of our upstream services was too aggressive to let the service recover on its own, so it had to be scaled up manually.
The retries also increased load on a shared database, which in turn affected the biometrics service.
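As a rough illustration of how retries amplify load, assume each caller retries a failed request up to N times without delay, and every attempt fails independently with probability p. The expected number of attempts per request is then 1 + p + p^2 + ... + p^N. The function name and the 3-retry policy below are hypothetical; this report does not state our actual retry settings.

```python
def expected_attempts(failure_rate: float, max_retries: int) -> float:
    """Expected attempts per request with up to `max_retries` immediate retries.

    Assumes every attempt fails independently with probability `failure_rate`,
    so the (k+1)-th attempt happens with probability failure_rate**k and the
    total is 1 + p + p**2 + ... + p**max_retries.
    """
    return sum(failure_rate ** k for k in range(max_retries + 1))


if __name__ == "__main__":
    # Hypothetical policy: 3 retries per request (not our actual setting).
    for p in (0.0, 0.5, 0.9, 1.0):
        print(f"failure rate {p:.0%}: {expected_attempts(p, 3):.2f}x load")
```

When the service is failing every request (p = 1), each request becomes N + 1 attempts, so a 3-retry policy quadruples the load exactly when the service has the least capacity to absorb it.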
Timeline
- 12:16 UTC – Our monitoring detected a sharp increase in errors when processing document reports.
- 12:19 UTC – The on‑call team was alerted and began investigating the affected document‑processing service.
- 12:25 UTC – We identified that the incident was also affecting biometric checks (20% of that traffic), causing some failures and short delays.
- 12:29 UTC – On-call engineers manually increased capacity for the impacted document fraud‑prevention service.
- 12:35 UTC – Error rates for both document reports and biometric checks returned to normal and new requests were being processed successfully.
- 12:45–13:06 UTC – We processed document reports that were impacted during the incident window.
- 13:15 UTC – We confirmed that all affected document and biometric requests had completed successfully and marked the incident as resolved.
Remedies
- We have continued to fine-tune the auto-scaling parameters of the affected service so that it scales up at lower CPU utilization targets.
- We modified the retry policy of the fraud-prevention services so that retries no longer overload an already struggling service, allowing it to recover automatically.
- We made the load tests for this service more representative of production traffic (similar image sizes, document type distribution, …) and added tests for rapidly ramping traffic spikes.
- We lowered the maximum number of shared database connections that document processing can hold, to prevent noisy-neighbour impact on biometrics processing.
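This report does not detail the new retry policy. As an illustration only (not our production code), one common way to keep retries from overloading a struggling dependency is to combine capped exponential backoff with full jitter and a token-bucket retry budget. The class and function names, the 0.25 budget ratio, and the delay parameters below are hypothetical.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryBudget:
    """Token bucket limiting retries to a fraction of first attempts (hypothetical values)."""

    def __init__(self, ratio: float = 0.25, max_tokens: float = 10.0) -> None:
        self.ratio = ratio            # tokens earned per first attempt
        self.max_tokens = max_tokens  # cap so idle periods cannot bank unlimited retries
        self.tokens = max_tokens

    def record_request(self) -> None:
        self.tokens = min(self.max_tokens, self.tokens + self.ratio)

    def try_spend_retry(self) -> bool:
        # A retry costs one whole token; an empty bucket means fail fast instead.
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def backoff_delay(attempt: int, base: float = 0.1, cap: float = 5.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Full jitter: a delay drawn uniformly from [0, min(cap, base * 2**attempt))."""
    return rng() * min(cap, base * 2 ** attempt)


def call_with_retries(fn: Callable[[], T], budget: RetryBudget, max_attempts: int = 3,
                      retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
                      sleep: Callable[[float], None] = time.sleep,
                      rng: Callable[[], float] = random.random) -> T:
    """Call `fn`, retrying transient errors only while the shared budget allows it."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    budget.record_request()
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            # Give up on the last attempt, or as soon as the retry budget is exhausted.
            if attempt == max_attempts - 1 or not budget.try_spend_retry():
                raise
            sleep(backoff_delay(attempt, rng=rng))
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

With a budget ratio of 0.25, sustained retries are limited to roughly one per four first attempts. Even when the dependency fails every request, offered load therefore stays near 1.25x (plus a small burst allowance from `max_tokens`) instead of multiplying by the maximum number of attempts.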