Possible issues with API timeouts on check creation
Incident Report for Onfido
Postmortem

Summary

On August 7th, from 11:16 to 12:32 UTC, a number of Onfido accounts were unable to create checks. At 12:32 UTC the incident was resolved for the majority of accounts.
From 12:32 to 13:46 UTC, around 8% of accounts were still experiencing the issue. For these remaining accounts, the issue was fixed at 13:46 UTC.

Root Causes

Due to a human configuration error, all accounts were mistakenly configured to use our legacy approach to calculating available credit for creating checks. This legacy approach had been deprecated from active usage, but was still available as an option.
Newer accounts, set up after that approach had been deprecated, failed the available credit billing rule, resulting in 422 errors.
Older accounts that were set up before the deprecation continued to work, although some experienced timeouts on check creation, because the query logic used to apply this validation was poorly optimised for the subsequent growth of those accounts.
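
To illustrate the failure mode described above, the following is a minimal sketch in Python, not Onfido's actual code. All names (`Account`, `credit_mode`, `legacy_credit`, `create_check_status`) are hypothetical, and the sketch assumes that accounts created after the deprecation simply had no legacy credit data, which this report does not state. The status code 201 stands in for a successful check creation. The timeouts seen on older accounts are not modelled.

```python
from dataclasses import dataclass
from typing import Optional

LEGACY = "legacy"    # hypothetical name for the deprecated credit calculation
CURRENT = "current"  # hypothetical name for the calculation in active use


@dataclass
class Account:
    name: str
    credit_mode: str
    current_credit: int
    # Assumption: accounts set up after the deprecation have no legacy data.
    legacy_credit: Optional[int]


def create_check_status(account: Account) -> int:
    """Return the HTTP status a check creation request gets in this sketch."""
    if account.credit_mode == LEGACY:
        available = account.legacy_credit or 0
    else:
        available = account.current_credit
    # The billing rule rejects the request when no credit is available.
    return 201 if available > 0 else 422


older = Account("older", LEGACY, current_credit=10, legacy_credit=10)
newer = Account("newer", LEGACY, current_credit=10, legacy_credit=None)

print(create_check_status(older))  # 201
print(create_check_status(newer))  # 422

# Reverting the configuration restores check creation for the newer account.
newer.credit_mode = CURRENT
print(create_check_status(newer))  # 201
```

In this sketch the error only appears once the legacy option is selected for an account that never had legacy data, which is why removing the option entirely (see Remedies) would eliminate this class of misconfiguration.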

Timeline

(times in UTC)

  • 11:16 - Operator inadvertently applies a configuration change to all clients
  • 11:25 - Issue escalated to on-call engineer; investigation into the problem starts
  • 12:32 - Configuration changes reverted for all clients
  • 13:46 - Configuration option applied to a final subset of clients
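
The intervals implied by this timeline can be worked out with a short Python sketch. The timestamps are taken from the entries above; the function and variable names are illustrative only.

```python
from datetime import datetime


def minutes_between(start: str, end: str) -> int:
    """Whole minutes between two same-day HH:MM timestamps (UTC)."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


INCIDENT_START = "11:16"

print(minutes_between(INCIDENT_START, "11:25"))  # 9   (start to escalation)
print(minutes_between(INCIDENT_START, "12:32"))  # 76  (start to main revert)
print(minutes_between("12:32", "13:46"))         # 74  (revert to final fix)
print(minutes_between(INCIDENT_START, "13:46"))  # 150 (start to full resolution)
```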

Remedies

  • Complete the removal of the deprecated code path and configuration option
  • Review our current configuration restore mechanisms to ensure that a restore can be executed rapidly when needed
Posted Aug 18, 2020 - 17:14 UTC

Resolved
All check creation through the API is currently working without any issue.

A detailed postmortem will follow once we've concluded our investigation.
Posted Aug 07, 2020 - 14:12 UTC
Monitoring
A fix has been implemented for this issue. We're observing timeouts and validation error rates returning to normal.

We're continuing to monitor and investigate the root cause.
Posted Aug 07, 2020 - 12:43 UTC
Investigating
Some customers have reported timeouts on check creation through the API in our European region (api.onfido.com).

We are currently investigating this issue and will update when the impact is verified.
Posted Aug 07, 2020 - 12:22 UTC
This incident affected: Europe (onfido.com) (API).