Containerising SQL Server Databases in Devops

What is devops about? Automation and code-based configuration so that you get repeatability, removing manual steps which create mistakes, speeding up the pipe from development to production. One area that people tend to leave until last is the database - I don't blame them!

This article is about SQL Server but will probably translate fairly closely to other databases

The Default Process

For most people, there is an existing database long before automation came along. You hopefully have a development version, possibly some integration/system testing versions and then the production instance. Things are developed on the development database and then some manual or semi-automated process takes place to move these changes into the other instances. For schemas this isn't too bad if your changes are not too marked but otherwise you need something like the Red Gate tools to do schema or data compare.

This might work OK but it does present a number of issues:

Keeping databases consistent (what if you forget to update a specific instance?)
not scalable, if your number of instances increases, the amount of work goes up significantly
Not locked into versions since you might not be able to update a production database until later on when a new application is deployed. This is probably done manually so someone needs eyes on everything to make sure nothing breaks.
Does not work well with containerised applications where you might be sharing a connection to a central database. Running tests at the same time are likely to break very easily.
Does not present a clean way to create a new database, perhaps with default data for e.g. a pen-test or performance test
Does not provide a centralised way of seeing who changed what, when and why.

Non-containerised Automation

The first step towards containerisation is building out the elements that you need for automation. Once this is done, containerising is easier. Most of these can be done in parallel with business-as-usual but at some point, you will need to decide how things will switch over to the new way.

The first step is to convert your database into a code-based project. In SQL Server world, you can use the SQL Server Data Tools extensions to Visual Studio to create a database project. You can then use the comparison tool to import all of the objects from your current database into the project. They become SQL files so in terms of source-control, they behave like any other code change for tracking purposes.

There is some setup required since you will need to reference dacpacs for master, msdb etc. otherwise any references in your project to things like logins etc. will not work. You can get these from the ssdt tools, you will need to search for them but probably best to copy them into your project so that any external changes won't break anything.

You can use the same compare tool to look at the schemas of the other database instances, if you want, to make sure they are all up to date. The compare tool can be used in both directions so you can update a target database from the database project if you want.

Once this is done, the first stage of automation is creating some kind of executable script (in the format of your choosing) to run SqlPackage.exe in order to publish the schema to target databases. SqlPackage is clever enough to apply changes but beware that you should test what happens if you make a breaking change (like dropping a table) so that you know whether you would get an error or whether the command you use will just drop it and lose data!

Once you have that script, you can tie it into your usual Devops automation tool so that when it is run against each environment, it targets that database instance using devops variables. If the database is tied to a single app, you could include it as part of that app's deployment so the changes are always deployed together.

Handling the Data

The basic project handles the schema easily enough but when it comes to data, it gets a bit trickier. In most cases, you cannot easily add data to an existing database that already has data unless you can script it and include guards to ensure the data is not already present. Adding data to an empty database, however, is much more straight-forward but then you need to think about what is a good source of clean data for this process and what are you going to do to allow developers to modify this data without it being a pain in the neck.

We had some SQL scripts initially but they had 1000s of statements in them and they were largely unusable. Imagine setting up an account on the dev environment for testing and then having to try and script it into SQL statements!

A better idea is to have two development databases. One is a bit of a free for all. It might contain temporary indexes, tables, tests etc. but is NOT considered the authority for data or schema. Instead, the schema is purely defined by the database project. If you need to add something, you can play around in the development database but to get your change to production, it has to go into the project and get reviewed etc. along with any other code reviews.

Since data needs modifying over time, the project should build and update the schema on a second database, let's call it Database_Template. This database is pointed to by a different application that is used purely to setup data that is needed for either automated tests or otherwise for new environments so it can include all the "template data" like countries or locales and optionally a set of user data that gives people something to play with as soon as it is created. The job that updates the schema on this database then exports the schema and data into a bacpac which becomes a build artifact.

Deploying New Databases

Since the bacpac contains schema and data, it is no use for production databases but it can be used for 2 things 1) the basis for new environment databases and 2) the basis for containerised plain databases that can be used for automated testing or local debugging.

When deploying new databases, this bacpac can be the basis for a new deployment job on your devops system that can point the bacpac to the relevant server/database name and create a new database using the automation variables to ensure it doesn't attempt to overwrite something that is already there (it will usually fail in this instance but not necessarily obviously!)

Containerising the Database for Testing

This is the last stage of automated deployments for databases. Containerising the database at this point adds something over the basic advantages of automation and that is an easy way to create multiple instances of a database, potentially on-demand.

Where is this useful?

Developer wants a database instance they can mess around with without worrying about changes being persisted or breaking anything else that someone else is using
To run (potentially multiple parallel) automated tests in an isolated database
To quickly create temporary environments in a microservice environment e.g. an environment for pen-testing that is only needed for a week

How do we do it?

Firstly, Microsoft are really pushing SQL Server on Linux in Docker, the images are 10 times smaller than the windows ones so in most cases, you will be using SQL Server Linux. For the developer edition, the base image is:

FROM mcr.microsoft.com/mssql/server:2019-latest AS final

You need to specify the SA password using environment variables and bypass the EULA prompt:

ENV ACCEPT_EULA="Y"
ENV SA_PASSWORD="P@ssW0rd!"

Although it comes with sqlcmd installed, at the time of my writing, the base package did not include sqlpackage. We install this by including some utility packages (this is an Ubuntu 16.04 based image).

USER root
RUN apt-get update \
&& apt-get install apt-utils unzip -y

Then we download and unzip sqlpackage for linux followed by making it executable:

RUN wget -progress=bar:force -q -O sqlpackage.zip https://go.microsoft.com/fwlink/?linkid=2143497 \
&& unzip -qq sqlpackage.zip -d /opt/sqlpackage \
&& chmod +x /opt/sqlpackage/sqlpackage

Once this is done, you should copy in your source bacpac file. My docker file is in the same directory as the database project so I copy the bacpac out of the release folder. Strictly speaking, this bacpac is put there in an automated build before the docker build runs so I could read it from the root directory instead.

COPY SSDatabase/bin/Release/*.bacpac /tmp/

The next step is a little complicated since we are doing something slightly unusual, we are running a process (sql server) and then waiting for it to start running by piping the process to grep for the message Service Broker manager has started.

Once this has happened, we create a database on localhost (the docker container), which is needed before sqlpackage will import a bacpac. It will then import the bacpac, remove the source bacpacs to save space and then kill sql server.

I don't really understand the syntax but all of this needs to be a single command.

RUN ( /opt/mssql/bin/sqlservr & ) | grep -q "Service Broker manager has started" \
    && sleep 10 \
    && /opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P $SA_PASSWORD -Q "CREATE DATABASE SPC_Test; ALTER DATABASE SPC_Test SET RECOVERY SIMPLE;" -X1 -j -r1 \
    && /opt/sqlpackage/sqlpackage /a:Import /tsn:localhost /tdn:SPC_Test /tu:sa /tp:$SA_PASSWORD /sf:/tmp/FuncTest.bacpac \
   && rm /tmp/*.bacpac && pkill sqlservr

Once this has finished, we have an image that contains SQL Server with the database we just restored. We don't really care about the password security because this will only be used locally.

Using with Docker Compose

If you want to use this database image locally, for example to point a local service to the containerised database, the docker compose section for the database is easy:

services:
database:
image: registry.azurecr.io/your.database.image:latest

If we are running automated tests, we need to add a step to the app's dockerfile to make sure it waits for the database to start up. If it is just a normal web app, you don't need to, just note that it might take a few seconds (usually about 10) before the database is available:

ADD https://github.com/ufoscout/docker-compose-wait/releases/download/2.5.0/wait /wait
RUN /bin/bash -c 'ls -la /wait; chmod +x /wait; ls -la /wait'
CMD /wait && dotnet test --logger:trx --results-directory /var/temp /app/build/Test.Integration.dll

We tell it what to wait for with an environment variable in the docker compose file. You will also notice how we also use an environment variable to override the connection string in appsettings to point to the docker compose service:

version: '3.4'

services:
integration_test:
    environment:
      - WAIT_HOSTS=database:1433
      - ConnectionStrings__conn=Server=database;Database=SPC_Test;User Id=sa;Password=P@ssW0rd!;
    volumes:
        - ./TestResults:/var/temp

In the local version, you probably want to add a build section and get it to automatically build what you are running. In the build server version, you might want to build the project and push it first and then get docker compose to run the image. I had to do it this way because firstly, in Team City, only the Docker build step allows you to specify the target OS, whereas the docker compose one doesn't. Secondly, the build via docker compose would not work even though the docker build did!

Tidying Up

The last job is working out how to tie it all together. What to trigger automatically, and what to trigger manually. Should your devops lifecycle include the additional environments or do you want a separate project with separate lifecycles that could perhaps be used to send a branch build to a test environment separately than the master branch?