Check Processing Delays
Incident Report for Onfido
Postmortem

Summary

A mismatch between check volumes and handling capacity increased queue times for document checks requiring human-assisted review. This extended processing times over a 2-hour period, with delays of up to 20 minutes before verification completion at the peak.

Timeline

  • Monday 10th June 14:08 BST: We were alerted to volumes increasing for our manual review teams, resulting in some checks having extended processing times. The impact at this point was low, and self-contained.
  • Monday 10th June 15:52 BST: We were alerted again to a further increase in volumes, this time causing a further increase in verification processing times.
  • Monday 10th June 16:20 BST:

    • We updated the status page to highlight that turnaround times had reached 15 minutes.
    • We added capacity to our processing teams and turnaround times started to reduce.
    • Turnaround times came down from 15 minutes to below 5 minutes.
  • Monday 10th June 18:55 BST: We updated the status page to confirm that the issue was resolved.

Root Cause

As part of Onfido’s ‘Boost’ document configuration, any check with an overall sub-result other than ‘clear’ (including cases where there are higher suspicions of fraud) is pushed to a specialist team of fraud analysts for manual review.

Turnaround times stay stable only while check completion throughput remains equal to or greater than the flow of incoming verifications. In this instance, the issue was caused by a drop in capacity between 1pm and 2pm BST.

Low volumes were handled in this interval because of a lack of staffing availability, which caused volume to build up in the following intervals. Additional capacity was deployed to normalise the situation in later intervals. However, by that point there was already a backlog of ~15 minutes.
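The relationship between capacity and backlog described above can be illustrated with a minimal sketch. The function names, the 30-minute interval length, and all volume and capacity figures are hypothetical, chosen only to show how a temporary capacity drop produces a backlog that persists after capacity recovers. They are not Onfido's actual numbers or tooling.

```python
def simulate_backlog(arrivals, capacity):
    """Return the queue length at the end of each interval.

    arrivals[i] -- checks entering manual review during interval i
    capacity[i] -- checks the review team can complete during interval i
    """
    backlog = 0
    history = []
    for arrived, completed in zip(arrivals, capacity):
        # Work carried over plus new work, minus what the team can clear.
        backlog = max(0, backlog + arrived - completed)
        history.append(backlog)
    return history


def estimated_wait_minutes(backlog, capacity_per_interval, interval_minutes):
    """Rough time needed to drain `backlog` at the given capacity."""
    if capacity_per_interval <= 0:
        return float("inf")
    return backlog / capacity_per_interval * interval_minutes


# Hypothetical figures for six 30-minute intervals (illustration only).
arrivals = [100, 100, 100, 100, 100, 100]
capacity = [100, 70, 80, 100, 130, 130]

history = simulate_backlog(arrivals, capacity)
peak_wait = estimated_wait_minutes(max(history), 100, 30)
```

In this sketch the backlog keeps growing while capacity is below the arrival rate, holds steady once capacity merely matches arrivals, and only shrinks when capacity exceeds arrivals. This is why adding capacity later reduces turnaround times gradually rather than immediately.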

Remedies

  1. We have analysed peak periods and implemented increased capacity where there are troughs in processing times.
  2. As a result of this, we have already implemented more regular reporting in order to identify drops in throughput sooner.
  3. We are implementing more granular real-time reporting and alerts. This will ensure we can rapidly and proactively identify similar issues in future, limiting any impact on processing times.
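As a rough illustration of the kind of throughput alert described in the remedies above, the sketch below flags intervals in which completions have fallen behind incoming checks for several intervals in a row. The function name, the 0.9 ratio threshold, and the two-interval window are assumptions made for this example and do not describe Onfido's actual alerting rules.

```python
def throughput_alerts(arrivals, completions, min_ratio=0.9, consecutive=2):
    """Return the indices of intervals at which an alert would fire.

    An alert fires when completions / arrivals has stayed below
    `min_ratio` for `consecutive` intervals in a row.
    """
    alerts = []
    streak = 0
    for i, (arrived, completed) in enumerate(zip(arrivals, completions)):
        # With no incoming checks there is nothing to fall behind on.
        ratio = completed / arrived if arrived else 1.0
        streak = streak + 1 if ratio < min_ratio else 0
        if streak >= consecutive:
            alerts.append(i)
    return alerts


# Hypothetical figures (illustration only).
arrivals = [100, 100, 100, 100, 100, 100]
completions = [100, 70, 80, 100, 130, 130]

alert_intervals = throughput_alerts(arrivals, completions)
```

Requiring a sustained shortfall rather than a single slow interval trades a little detection speed for fewer false alarms. Lowering `consecutive` to 1 would fire sooner at the cost of more noise.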
Posted Jun 25, 2019 - 10:17 UTC

Resolved
This incident has been resolved. Post-mortem to follow once we have concluded our investigations.
Posted Jun 10, 2019 - 17:55 UTC
Identified
We have identified the underlying issue and are in the process of implementing a fix.

Next update will be at 19:00 UTC+1
Posted Jun 10, 2019 - 16:33 UTC
Update
We are still investigating the issue. Next update at 16:30 UTC+1.
Posted Jun 10, 2019 - 15:54 UTC
Investigating
We have identified an issue causing processing delays of up to 15 minutes. We are investigating and will provide an update at 15:45 UTC+1.
Posted Jun 10, 2019 - 15:20 UTC