If you are serious about using Docker, one thing you must understand is how Docker tries to reduce image build times by caching the layered file system. This makes up for the fact that you are doing more than building an app: you are virtually building an entire OS, even if it is a cut-down one.

Without Docker, a CI build would probably 1) run dotnet restore, 2) run dotnet build in Debug, 3) run some tests and 4) publish in Release.

With Docker you have to:

1) Use a base build image that you might or might not have downloaded, and which might or might not be the latest version
2) Copy one or more csprojs into the container
3) Run dotnet restore on these csprojs
4) Copy the remainder of the build files
5) dotnet build the project
6) dotnet restore some test projects
7) Build the test projects
8) Run the test projects
9) Publish the app
10) Copy the published app into a smaller runtime image

In other words, lots more steps. Even if each is relatively quick, every Docker instruction carries overhead: create a layer for the command, add it to the cache for potential later use, then remove the intermediate container. If these steps each took only 1 second, the build would take at least 10 seconds, and of course the build and tests are likely to take longer than that.

Making good use of the cache comes down to two things:

1) Try to keep often-changed content as late in the Dockerfile as possible, so that the earlier layers can be reused from cache

2) Try to reduce what you copy into the container (and into the build context!), both to speed up the IO work and to reduce the chance of an unimportant change busting Docker's cache and making builds take ages (see the .dockerignore sketch just below)
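On the second point, the easiest win is a .dockerignore file in the root of the build context: anything it matches is never sent to the Docker daemon, so it can neither slow down the context transfer nor bust a COPY cache. A minimal sketch for a typical .NET Core repository might look like this (the exact entries depend on your layout):

# .dockerignore - keep the build context small (example entries)
**/bin/
**/obj/
.git/
.vs/
**/TestResults/
README.md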

I was looking at this today in one of my images, trying to work out why re-running the build with no changes on the CI server still took 1 minute 30 seconds, whereas running it locally with no changes took about a second.

Part of the difference is that the CI server runs tests and I wasn't doing that locally, so an entire stage was being skipped in the local build, but that still left a large gap.

The first thing you need to do is look through your Dockerfile and try to optimise it from basic principles.

For example, you might notice that the default Microsoft Dockerfile for .NET Core uses the runtime image as its first stage, even though that image isn't needed until the end. Why? Because the runtime is likely to change less often than the SDK, so it goes earlier.

Secondly, you might wonder why it copies the solution or csproj first and then copies everything else afterwards. Why not just copy everything to begin with? Simple: the csproj changes less often than the code files, so if you copy it first and run dotnet restore against it, you only need to repeat that step when the csproj itself changes. Restore is quite slow, so the whole step can be grabbed from cache next time! Both ideas are visible in the sketch below.
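Putting those two points together, the top of such a Dockerfile looks roughly like this (the image tags and the first project path are taken from my build log further down; the rest is the standard template shape, trimmed for illustration):

FROM mcr.microsoft.com/dotnet/core/aspnet:3.1-bionic AS base
WORKDIR /app
EXPOSE 80

FROM mcr.microsoft.com/dotnet/core/sdk:3.1-bionic AS build
WORKDIR /src
# Copy only the project file(s) first so the restore layer caches well
COPY ["Microservices.System/Microservices.System.csproj", "Microservices.System/"]
RUN dotnet restore "Microservices.System/Microservices.System.csproj"
# Only now copy the rest of the source; code changes no longer invalidate restore
COPY . .
RUN dotnet build "Microservices.System/Microservices.System.csproj" -c Release -o /app/build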

Once you have done what you think you need to, analyse the Docker build log for your slow build:


Sending build context to Docker daemon  1.753MB
Step 1/21 : FROM mcr.microsoft.com/dotnet/core/aspnet:3.1-bionic AS base
3.1-bionic: Pulling from dotnet/core/aspnet
Digest: sha256:b65d6b0ec41e8acb16616089884e35370180d382947c5be6fad695606df2dbac
Status: Image is up to date for mcr.microsoft.com/dotnet/core/aspnet:3.1-bionic
---> 8306c0c5fe66
Step 2/21 : WORKDIR /app
---> Using cache
---> caab0cb5431b
Step 3/21 : EXPOSE 80
---> Using cache
---> 6c45dee683be
Step 4/21 : FROM mcr.microsoft.com/dotnet/core/sdk:3.1-bionic AS build
3.1-bionic: Pulling from dotnet/core/sdk
Digest: sha256:99b8a3bde8e419e7b7673bdb031345abac4926d776da96f7ef44b6ac7012033b
Status: Image is up to date for mcr.microsoft.com/dotnet/core/sdk:3.1-bionic
---> 60f4cb98a1ac
Step 5/21 : WORKDIR /src
---> Using cache
---> 7383d7b2aee3
Step 6/21 : COPY ["Microservices.System/Microservices.System.csproj", "Microservices.System/"]
---> be89cfdbea62
Step 7/21 : COPY ["SharedLibs/Microservices.Security/Microservices.Shared.Security.csproj", "SharedLibs/Microservices.Security/"]
---> 06c77bdef311
Step 8/21 : COPY ["SharedLibs/Microservices.Shared/Microservices.Shared.csproj", "SharedLibs/Microservices.Shared/"]
---> e38881e4d0a0

The first part of the log is fine: Docker has pulled the image metadata and realised that the image is up to date. You can then see that right up until the first COPY command, every step is using cache (---> Using cache). After the csproj is copied, however, something has changed, and Docker has to rebuild every stage after this point.

The cache key is based on the instruction text in your Dockerfile, except for ADD and COPY, where it is also based on a checksum of the contents of the files being copied. This was confusing, because although I hadn't changed the csproj, Docker had decided it had changed.
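You can see this behaviour for yourself: the checksum covers file contents but not metadata such as timestamps, so only a genuine content change busts the cache. Something along these lines demonstrates it (the project path is from my build, substitute your own):

# Re-running an unchanged build hits cache on every step
docker build -t myapp .
# Touching a file changes its timestamp but not its checksum - still cached
touch Microservices.System/Microservices.System.csproj
docker build -t myapp .
# Changing the contents (even appending a blank line) busts the COPY step
# and everything after it
echo >> Microservices.System/Microservices.System.csproj
docker build -t myapp .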

After panicking about what TeamCity might be doing to the files to break Docker, I realised that I was the one changing the csproj: I was injecting the build version into it on the command line before running Docker. Ah!
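The injection step ran on the build agent before docker build and was something along these lines (the exact command and the BUILD_NUMBER variable are illustrative, not the real build script):

# Rewrite the version element in-place before docker build runs.
# The csproj's checksum now changes on every single build!
sed -i "s|<Version>.*</Version>|<Version>1.0.$BUILD_NUMBER</Version>|" Microservices.System/Microservices.System.csproj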

So, back to principle 1: how can I move this file, which changes for every build, as late as possible in the Dockerfile? Easy. I don't need to build and test the injected version number; it is only needed for deployment. In other words, I can inject the version number into a temporary file and not copy it into the image until the final publish stage, where I copy it over the top of the normal csproj! Magic! That unblocked the build a bit more, but it was still cache busting a little later in the build:

Step 12/21 : COPY . .
---> 96657756cc7c

You get COPY . . by default in the .NET Core Dockerfile, but every time this step ran, the cache broke again. Why? Eventually I realised that because I was writing the versioned project file into the root of the source directory, it got swept up by that copy, and because it contained the build number, it was different for every build, so the cache was broken again at this point. All I had to do was change COPY . . to some more specific copies. Yes, this is more of a pain, but you can copy entire directories with COPY somepath/* destpath, so it isn't too hard. The result looks something like the sketch below.
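Putting both fixes together, the later stages might look roughly like this (the directory names come from my project layout above; the versioned/ path for the stamped csproj is a placeholder):

# Targeted copies instead of COPY . . - stray files in the context root
# can no longer bust the cache
COPY Microservices.System/ Microservices.System/
COPY SharedLibs/ SharedLibs/
RUN dotnet build "Microservices.System/Microservices.System.csproj" -c Release -o /app/build

FROM build AS publish
# The version-stamped csproj only appears now; it changes every build,
# but publish always has to run anyway, so no useful cache is lost
COPY versioned/Microservices.System.csproj Microservices.System/Microservices.System.csproj
RUN dotnet publish "Microservices.System/Microservices.System.csproj" -c Release -o /app/publish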

After these small tweaks, I managed to get the 2m44s build down to 44 seconds, of which 7s is the unit tests, 15s is the publish step (which always has to run), 5 seconds is a pause to allow Azure to register the new versioned container, and 6 seconds is Octopus creating a new deployment. The build-for-test step is now an impressive 2 seconds when using cache, and the rest of the time is just TeamCity thinking about stuff!

It is worth making sure you understand where your Docker build times are going. If you get it wrong, it will be much slower than a normal build. If you get it right then it can be much faster since only changes that would affect the build will cause a rebuild!