Increase of withdrawn device intelligence reports

Incident Report for Onfido

Postmortem

Summary

All Device Intelligence reports created on 23 September 2025 between 06:59 and 08:06 UTC were withdrawn. As a consequence, the corresponding Studio tasks did not complete during the incident.

Once the root cause was identified and fixed, all withdrawn reports were reprocessed. The results were delivered through Webhooks or can be fetched through the Public API.

Root Causes

A faulty deployment of the service that processes the results for this report/task type was released at 06:59 UTC.

Timeline

06:59 UTC: Deployment of the service that processes device intelligence reports starts in production in all regions.

07:05 UTC: Deployment finishes with success

07:08 UTC: First responders are notified about a slight increase in error rate in Studio tasks

07:21 UTC: First responder starts looking into the issue, with no clear indication that the deployment was the cause of the error rate increase

07:50 UTC: We determine that 100% of device intelligence reports are being withdrawn, indicating a larger issue than just an error rate increase in Studio

08:06 UTC: Rollback of the service responsible for processing this type of report completes and the error rate recovers

09:05 UTC: Start preparing to reprocess withdrawn reports

09:44 UTC: Report reprocessing started

10:37 UTC: Report reprocessing finished
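
The intervals implied by this timeline can be worked out directly from the timestamps above. The following is a minimal sketch in Python; the variable and helper names are illustrative only and are not part of any Onfido tooling.

```python
from datetime import datetime


def at(hhmm: str) -> datetime:
    """Parse a timeline entry (HH:MM, UTC) on the day of the incident."""
    return datetime.strptime(f"2025-09-23 {hhmm}", "%Y-%m-%d %H:%M")


def minutes(start: datetime, end: datetime) -> int:
    """Whole minutes elapsed between two timeline entries."""
    return int((end - start).total_seconds() // 60)


# Timestamps copied from the timeline above.
deploy_start = at("06:59")
first_alert = at("07:08")
rollback_done = at("08:06")
reprocess_done = at("10:37")

time_to_alert = minutes(deploy_start, first_alert)           # 9 minutes
time_to_rollback = minutes(deploy_start, rollback_done)      # 67 minutes
time_to_reprocessed = minutes(deploy_start, reprocess_done)  # 218 minutes
```

In other words, roughly 9 minutes passed between the start of the deployment and the first notification, 67 minutes until the rollback completed, and 3 hours 38 minutes until all withdrawn reports had been reprocessed.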

Remedies

  • Improve the alerting in the service for this report type
  • Add additional e2e test coverage for the service
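
As an illustration of the kind of alerting the first remedy refers to, the sketch below fires when the share of withdrawn reports in a recent window crosses a threshold. It is a hypothetical example only: the class name, window size, threshold, and minimum sample count are assumptions, not a description of the alerting actually used for this service.

```python
from collections import deque


class WithdrawnRateAlert:
    """Hypothetical sliding-window check on the share of withdrawn reports."""

    def __init__(self, window: int = 50, threshold: float = 0.5, min_samples: int = 20):
        # Keep only the most recent `window` outcomes (True = withdrawn).
        self.outcomes = deque(maxlen=window)
        self.threshold = threshold
        self.min_samples = min_samples

    def record(self, withdrawn: bool) -> bool:
        """Record one report outcome; return True if the alert should fire."""
        self.outcomes.append(withdrawn)
        if len(self.outcomes) < self.min_samples:
            # Too few samples to judge; avoids firing on a single early failure.
            return False
        rate = sum(self.outcomes) / len(self.outcomes)
        return rate >= self.threshold


# Usage: with every report withdrawn, the alert fires once min_samples is reached.
alert = WithdrawnRateAlert()
fired = [alert.record(True) for _ in range(20)]
```

A check on the withdrawn rate itself, rather than on the downstream Studio error rate, would point responders at the affected report type directly.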
Posted Oct 06, 2025 - 08:44 UTC

Resolved

All withdrawn reports were re-run.

In-progress workflow runs should now have resumed.
Posted Sep 23, 2025 - 10:40 UTC

Monitoring

We have rolled back a faulty deployment that caused device intelligence reports to be withdrawn. We are currently re-running them.

Studio workflow runs that are in progress will be resumed once the device intelligence reports are re-run.

This disruption lasted for 1 hour between 07:04 UTC and 08:06 UTC.
Posted Sep 23, 2025 - 10:08 UTC
This incident affected: USA (us.onfido.com) (Device Intelligence) and Europe (onfido.com) (Device Intelligence).