Increased document report TaT in the EU cluster

Incident Report for Onfido

Postmortem

Summary

A change to our document classification logic caused 30% of reports not to be classified automatically. Those reports were sent for manual review, which increased turnaround time (TaT). The change was reverted shortly afterwards and automated processing returned to normal. Impacted checks were re-run, leaving a residual backlog of reports requiring manual review that was cleared a few hours later.

Root Causes

The deployed change contained a minor error that was not apparent in testing. As a result, the model could not classify documents reliably, and a significant portion of reports was sent for manual review.

Timeline

13:45 UTC: We deploy a change to our document classification logic. A significant portion of traffic starts being sent for manual review.

14:30 UTC: We are notified about a drop in the automation rate and an increase in the manual review queues.

14:37 UTC: We identify the cause and revert the change. The automation rate returns to normal levels.

15:15 UTC: We continue to monitor. Impacted reports are re-run to help clear the manual review backlog.

17:29 UTC: All re-runs are complete, reducing the manual review queues.

Remedies

We are adding new monitoring of automation rates so that we are notified sooner and can respond faster.

We are reviewing our recovery process so that we can re-run impacted reports faster.

We are adding non-regression tests to the classification logic to prevent similar incidents.
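
As an illustration of the automation-rate monitoring described above, the following is a minimal sketch and not our production implementation. It assumes each report outcome can be reduced to "classified automatically" or "sent for manual review"; the class name, window size, and threshold are hypothetical values chosen for the example.

```python
from collections import deque


class AutomationRateMonitor:
    """Tracks the share of reports classified automatically over a sliding window.

    All names and default values here are hypothetical, for illustration only.
    """

    def __init__(self, window_size=200, min_rate=0.85):
        self.window_size = window_size  # number of most recent reports considered
        self.min_rate = min_rate        # alert when the rate falls below this value
        self._outcomes = deque(maxlen=window_size)

    def record(self, automated):
        """Record one report: True if classified automatically, False if sent to manual review."""
        self._outcomes.append(bool(automated))

    def rate(self):
        """Return the automation rate over the current window (1.0 if no data yet)."""
        if not self._outcomes:
            return 1.0
        return sum(self._outcomes) / len(self._outcomes)

    def should_alert(self):
        """Alert only once the window is full, to avoid noise on small samples."""
        return len(self._outcomes) == self.window_size and self.rate() < self.min_rate
```

With a check like this evaluated on every report, a sustained drop such as the one in this incident (30% of reports going to manual review, which is an automation rate of 0.70 in this simplified model) would fall below the example threshold as soon as the window fills with affected traffic.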

Posted Sep 27, 2025 - 07:32 UTC

Resolved

This issue is now resolved: Increased document report TaT in the EU cluster.

A small number of queued manual reports remain and are expected to be cleared soon.

We take great pride in running a robust and reliable service, and we're working hard to ensure this does not happen again. A detailed postmortem will follow once we've concluded our investigation.

Posted Sep 24, 2025 - 17:29 UTC

Update

The fix has proven to be effective, and live document reports that are processed fully automatically are now completing within the expected TaT. For document reports that require manual processing, we are clearing the backlog as quickly as possible and hope to return to normal TaT within the next 3 hours.

Please bear with us while we get back on our feet. We appreciate your patience during this incident.

Posted Sep 24, 2025 - 16:36 UTC

Monitoring

We have implemented a fix for this issue.

We are monitoring closely to ensure the issue has been resolved and everything is working as expected. Please bear with us while we get back on our feet. We appreciate your patience during this incident.

Posted Sep 24, 2025 - 15:37 UTC

Identified

We have identified an issue in our EU region that is negatively impacting the document report TaT. A fix has been implemented, and we are currently monitoring its effectiveness.

Posted Sep 24, 2025 - 14:57 UTC

This incident affected: Europe (onfido.com) (Document Verification).