On 11 February 2025, at around 14:12 UTC, a database change to a primary key index in the EU region impacted the creation and management of workflow runs via the Studio API and the SDK.
During a schema migration that changed a table's primary key, an index was inadvertently dropped, which degraded the performance of some critical operations that relied on it.
The incident lasted around 15 minutes, until the index for the previous primary key was re-introduced.
The rollout of this change failed to strictly follow our established internal change management procedures for database migrations.
13:35:35: progressive rollout of the schema migration started in the US region
14:09:30: unsuccessful attempt to manually abort the rollout of the migration after an increase in P50 endpoint latency was observed in that region
14:12:00: progressive rollout of the schema migration started executing in the EU region
14:19:49: first related 500 API error was recorded due to database query timeouts
14:23:24: monitoring alarm triggered due to a surge of 5XX HTTP errors in the Workflow API
14:30:58: incident was reported
14:32:00: index on the previous primary key started being re-created
14:35:29: API recovered
14:38:45: alarm recovered
14:50:47: incident closed
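
The intervals implied by the timeline can be checked with a short script. This is only a sketch: it assumes all timestamps fall on the same day and share one timezone, and the event keys (such as `first_500_error`) are labels invented here for illustration, not names from any internal tooling.

```python
from datetime import datetime, timedelta

# Timestamps copied from the timeline above; the keys are illustrative labels.
TIMELINE = {
    "eu_rollout_started": "14:12:00",
    "first_500_error": "14:19:49",
    "alarm_triggered": "14:23:24",
    "incident_reported": "14:30:58",
    "index_recreation_started": "14:32:00",
    "api_recovered": "14:35:29",
}


def elapsed(start: str, end: str) -> timedelta:
    """Return the time between two timeline events (assumes the same day)."""
    fmt = "%H:%M:%S"
    return datetime.strptime(TIMELINE[end], fmt) - datetime.strptime(TIMELINE[start], fmt)


# Customer-facing impact: first recorded 500 error until the API recovered.
impact = elapsed("first_500_error", "api_recovered")
# Detection delay: first recorded 500 error until the monitoring alarm fired.
detection = elapsed("first_500_error", "alarm_triggered")
# Mitigation time: index re-creation started until the API recovered.
mitigation = elapsed("index_recreation_started", "api_recovered")

print(f"impact:     {impact}")
print(f"detection:  {detection}")
print(f"mitigation: {mitigation}")
```

Measured from the first recorded 500 error to API recovery, the impact window is 15 minutes 40 seconds, consistent with the "around 15 minutes" stated in the summary.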