Increased error rate on Onfido API Europe region
Incident Report for Onfido
Postmortem

Summary

On February 17th, 2021, from 13:22 to 13:38 UTC, we experienced intermittent periods of increased latency that prevented new checks from being created correctly and slowed down requests for existing checks.

Root Causes

A maintenance task on our production databases added contention when inserting records into a key database. This contention prevented new checks from being created correctly and slowed down access to pre-existing checks. Once the effects of the task were reverted, the database returned to normal and checks could be created again.

Timeline

(times in UTC)

13:22 - A database maintenance task was executed
13:24 - On-call alerted by our monitoring system
13:25 - On-call started reverting the changes introduced by the task
13:37 - All changes reverted and the service fully recovered
13:38 - Incident closed

Remedies

  • We are working to add a process to make sure future maintenance tasks do not add contention
  • We are working to improve our database on-call runbook to speed up the resolution of database issues
Posted Mar 05, 2021 - 14:17 UTC

Resolved
This incident has been resolved. We have confirmed that the fix has resolved the underlying error and that delays no longer occur.
We will publish a postmortem in the coming working days explaining what happened.

Sorry for any inconvenience this has caused.
Posted Feb 17, 2021 - 14:05 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Feb 17, 2021 - 13:53 UTC
Identified
The issue has been identified. Current latency and error rate are almost back to normal values.

We will continue monitoring and will provide a new update in 10 minutes.
Posted Feb 17, 2021 - 13:50 UTC
Investigating
We are seeing an increased error rate and increased latency across all endpoints of the Onfido API in the European region.

We are investigating and will provide an update in 5 minutes.
Posted Feb 17, 2021 - 13:42 UTC
This incident affected: Europe (onfido.com) (API).