On 8 November 2023, from 16:17 UTC until 17:01 UTC, a small subset (~0.4%) of our customers' end-users using our Facial Similarity: Video product experienced a degradation of the Onfido service, particularly at video upload time. The majority of applicants were able to retry and succeeded on their second attempt. This problem was exclusive to our EU instance.
The issue was related to a recent feature for checking video integrity, which we had tested on increasing subsets of traffic before rolling it out fully (EU, US, CA). We observed CPU usage spikes when this feature was used to check video integrity, which impacted the performance of upload-related functionality. After we turned off the feature, CPU usage dropped and regular service performance was re-established.
We are going to revise the logic for checking video integrity and evaluate more efficient ways of performing the check that don’t cause this sort of issue. (ETA: Q4 2023)
We will additionally revise the error containment mechanism we have in place (circuit breaking) and evaluate the need for it. In this particular case, some ripple-effect early failures were reported because the upstream service for checking video integrity was deemed “down” when in fact it was suffering from transient failures. (ETA: Q4 2023)
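To illustrate the distinction between a service that is down and one suffering transient failures, the following is a minimal sketch of a circuit breaker that opens only when the recent failure rate is high, rather than on isolated errors. It is not our production implementation: the class name, parameters and thresholds are hypothetical, and it omits the half-open probing state that a full circuit breaker would normally have.

```python
from collections import deque
import time


class CircuitBreaker:
    """Hypothetical breaker: opens on a sustained failure rate, not isolated errors."""

    def __init__(self, window=20, min_calls=10, failure_rate=0.5,
                 cooldown_s=30.0, clock=time.monotonic):
        # Sliding window of recent outcomes: True = success, False = failure.
        self.results = deque(maxlen=window)
        self.min_calls = min_calls        # do not judge on too few samples
        self.failure_rate = failure_rate  # fraction of failures that opens the circuit
        self.cooldown_s = cooldown_s      # how long the circuit stays open
        self.clock = clock                # injectable for testing
        self.opened_at = None             # None means the circuit is closed

    def allow(self):
        """Return True if a call to the upstream service may be attempted."""
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown_s:
            # Simplification: after the cooldown, close fully and start afresh.
            self.opened_at = None
            self.results.clear()
            return True
        return False

    def record(self, success):
        """Record the outcome of a call and open the circuit if warranted."""
        self.results.append(success)
        if len(self.results) < self.min_calls:
            return
        failures = self.results.count(False)
        if failures / len(self.results) >= self.failure_rate:
            self.opened_at = self.clock()
```

With these assumed settings, a few scattered failures among successful calls leave the circuit closed, so requests are not failed early; only a sustained run of failures marks the upstream service as unavailable.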