TL;DR: It was an async deadlock of my own making and had nothing to do with DocumentDB!

I have been trying to use DocumentDB for session state. Why? Although it is not the recommended "Redis" way, it is resilient, supposedly fast and will save us a packet on the 3 cache worker roles we currently have to support.

So I wanted to see whether it was going to work. For reasons I could not understand, the Unit Tests in my NoSQL library all ran OK, but when I did the SAME thing in my web app, ReadDocumentAsync worked fine and CreateDocumentAsync would hang. It added the document OK but then just hung. Fortunately, when I searched for something more generic than "DocumentDB is hanging", I chanced upon a few articles that talk about the dreaded async deadlock in .NET.

Hopefully, you all know what async is. Some magic glue that Microsoft invented to increase performance in .Net applications, particularly when they are waiting for external things to happen. It is not really the same thing as multi-threading and also not quite what people think of when we talk about asynchronous coding. It can therefore be quite confusing and this confusion is where the deadlock comes from.

What does async do? When you call an async method, it returns a task, which is a kind of "handle" on the operation in progress. It allows you to carry on and do something else if you want, OR you might just want to wait there for it to finish. Let us first look at the correct way to await an async method:

var returnValue = await MyMethodAsync();

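NOW, this is the thing. Under the covers, if the task has not finished when you await it, the method returns to its caller and the thread is free to do other work (in a web app it goes back to the thread pool). The rest of the method is saved as a continuation. Once the task has finished, the framework tries to resume that continuation on the context that was captured at the await: in a UI app that is the UI thread, and in classic ASP.NET it is the request context, which only lets one thread in at a time. So it is not strictly the SAME thread in every case, but the continuation does have to wait until that thread or context is free before control can continue.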

You could alternatively get a task and await later on:

var task = MyMethodAsync();
var returnValue = await task;
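Getting the task first and awaiting it later is mostly useful when you want to start several operations before waiting for any of them. Here is a minimal sketch of that idea; the MyMethodAsync below is a hypothetical stand-in that just delays and returns the value it was given, not a real library call.

```csharp
using System;
using System.Threading.Tasks;

static class AwaitLater
{
    // Hypothetical async method: pretends to do I/O for 100 ms.
    static async Task<int> MyMethodAsync(int value)
    {
        await Task.Delay(100);
        return value;
    }

    public static async Task<int> SumAsync()
    {
        // Both operations are started here, before either is awaited,
        // so their delays overlap instead of running one after the other.
        Task<int> first = MyMethodAsync(1);
        Task<int> second = MyMethodAsync(2);

        // Await each task only when its value is actually needed.
        return await first + await second;
    }
}
```

Note that SumAsync itself has to be async to use await, which is exactly the "async all the way up" effect.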

So what's the problem? To use the "await" keyword, you must be in an async method and if you are not careful, you end up with all kinds of async methods starting at the lowest level and working their way all the way to the top. This can be confusing and seems to be over-the-top, especially when ReSharper keeps telling you that you have an async method that might not be using await!

So there are a couple of other ways to get at the outcome of an async method. The returned task has a Result property (which returns the value) and a Wait() method, which is used if there is no return value. These allow you to call an async method without using await and wait on the call there and then. These are synchronous calls. In other words, the thread blocks; it does NOT get released like it would if you called await.

Can you see the problem yet? IF you call the async method on the SAME thread that you then call Result or Wait() on, and that thread belongs to a UI or classic ASP.NET request context, you will probably deadlock: once the async work has finished, the rest of the method needs to get back onto the original context, but it can't because the thread that owns it is blocked on the call to Result/Wait().
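The trap can be reproduced outside a web app with a little scaffolding. This is only a sketch: QueueingContext is a made-up stand-in for a UI or classic ASP.NET context (it queues continuations until the owning thread pumps them), and MyMethodAsync is a hypothetical method. Wait is given a timeout so that the demo reports the stall instead of hanging forever.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical context: posted continuations sit in a queue and only run
// when the owning thread calls Pump().
sealed class QueueingContext : SynchronizationContext
{
    private readonly ConcurrentQueue<Action> _queue = new ConcurrentQueue<Action>();

    public override void Post(SendOrPostCallback d, object state)
    {
        _queue.Enqueue(() => d(state));
    }

    public void Pump()
    {
        while (_queue.TryDequeue(out Action work)) work();
    }
}

static class DeadlockDemo
{
    static async Task<int> MyMethodAsync()
    {
        await Task.Delay(50); // captures the current context
        return 42;            // this part is posted back to that context
    }

    // Returns true if the blocking wait stalled and the task only
    // completed once the owning thread was free to pump again.
    public static bool Demonstrate()
    {
        SynchronizationContext previous = SynchronizationContext.Current;
        var context = new QueueingContext();
        SynchronizationContext.SetSynchronizationContext(context);
        try
        {
            Task<int> task = MyMethodAsync();

            // Blocking here means nobody pumps the queue, so the
            // continuation can never run. Without the timeout this
            // would be the deadlock.
            bool stalled = !task.Wait(TimeSpan.FromMilliseconds(500));

            // Once the thread is unblocked, the continuation runs.
            context.Pump();
            return stalled && task.Result == 42;
        }
        finally
        {
            SynchronizationContext.SetSynchronizationContext(previous);
        }
    }
}
```

In a real UI or ASP.NET app there is no timeout and nothing to pump by hand, which is why the request simply hangs.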

So why was this working in my Unit Tests? Well, note my use of the word "probably". Obviously, if you do not wait for the task to complete, you will not deadlock, but that is likely to be wrong since you will probably need to handle errors/return values from async methods. The more likely reason is that a unit test runner or console app typically has no UI or request context to return to, so the rest of the method simply finishes on a thread pool thread and never needs the blocked one. Also, if the async call is fast enough to have completed by the time it is awaited, the method just carries on synchronously, and when the calling thread then calls Wait() or Result, the data is already there.
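To illustrate the "no context" case, here is a small sketch that blocks on Result with the synchronization context explicitly cleared, which is roughly the situation I assume a typical test runner or console app is in. MyMethodAsync is again a hypothetical stand-in.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class NoContextDemo
{
    // Hypothetical async method: pretends to do I/O for 50 ms.
    static async Task<int> MyMethodAsync()
    {
        await Task.Delay(50);
        return 42;
    }

    public static int BlockWithoutContext()
    {
        SynchronizationContext previous = SynchronizationContext.Current;

        // With no context set, the continuation after the await runs on a
        // thread pool thread, so it never needs the thread blocked below.
        SynchronizationContext.SetSynchronizationContext(null);
        try
        {
            return MyMethodAsync().Result;
        }
        finally
        {
            SynchronizationContext.SetSynchronizationContext(previous);
        }
    }
}
```

Checking SynchronizationContext.Current in the test and in the web app is a quick way to see whether the two environments really differ in this respect.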

So you can use async tasks and await all the way up to avoid this problem, but there is also another clever trick, available since Task.Run arrived in .NET 4.5: invoke your async method on another thread, not on the one you are calling from. It is as simple as this:

var task = Task.Run(() => MyMethodAsync());

which invokes the method on a thread from the thread pool. When your calling thread then waits and blocks using Wait() or Result, the async task will NOT need to wait for your thread: it will resume on a thread pool thread, finish, and signal your waiting thread to allow it to continue!
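Here is a sketch of that workaround wrapped up as a blocking helper. StuckContext is a made-up context that never runs anything posted to it, standing in for a UI or request thread that is blocked; MyMethodAsync and GetValueBlocking are hypothetical names.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical context that never runs posted work, imitating a context
// whose only thread is blocked.
sealed class StuckContext : SynchronizationContext
{
    public override void Post(SendOrPostCallback d, object state)
    {
        // Intentionally dropped: nothing posted here ever runs.
    }
}

static class SyncOverAsync
{
    // Hypothetical async method: pretends to do I/O for 50 ms.
    static async Task<int> MyMethodAsync()
    {
        await Task.Delay(50);
        return 42;
    }

    public static int GetValueBlocking()
    {
        SynchronizationContext previous = SynchronizationContext.Current;
        SynchronizationContext.SetSynchronizationContext(new StuckContext());
        try
        {
            // Task.Run starts MyMethodAsync on a thread pool thread, where
            // no synchronization context is set. Its await therefore does
            // not capture StuckContext, and the blocking Result is safe.
            return Task.Run(() => MyMethodAsync()).Result;
        }
        finally
        {
            SynchronizationContext.SetSynchronizationContext(previous);
        }
    }
}
```

The price is that you tie up two threads for the duration of the call (the blocked caller and the pool thread that starts the work), so it is a workaround rather than a replacement for awaiting properly.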

It is important to carefully consider your use of async. For instance, you want lots of spare threads when you are waiting for long-running tasks, but letting more people in when you are already choked in the backend might not improve performance and could make everything much slower. You should consider the mix of functionality your site is providing (if everyone does the same thing, you might not save anything). You should also generally not make calls to the database async unless it is replicated, since the database is usually the slowest point in the system and letting the web servers deluge it with even more concurrent requests will not make things faster, quite the opposite!

I'm still learning though.