Learning from a failed deployment

This morning a deployment failed catastrophically. One of the scripts for upgrading the database caused several objects to be dropped unexpectedly.

We restored the database from backup, corrected the script, and repeated the deployment, which was successful. We now had to do a retrospective to learn what went wrong and how to avoid it in future.

We found that the database scripts generated by SSDT included statements to drop user objects. In this case, it deleted the user with db_owner role, which is used for deployment. This meant that subsequent statements could not be executed, and objects that had been dropped could not be created again.

The lapse in our process that allowed this to happen was us having too much confidence in the scripts that were generated. Nobody had verified that they did not contain any destructive statements.

The error had not happened on our development databases because our Windows accounts had sa role, and the security context allowed the scripts to execute even if the db_owner user was deleted. The lesson here was to stage-deploy under the same conditions in the development environment as in the production environment—a sensible approach that we ignored for convenience.

To avoid the error, we are changing our process to include a visual inspection of the database scripts before they are executed. We are also adding a canary database, which is a copy of the production database, on which the scripts can be tested as a final check.

Leave a comment

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.