Persistent volumes are a core Kubernetes primitive: they abstract storage away from pods so that it can be managed separately if required.
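As a rough sketch, a static PV and a claim that binds to it look something like the following. All names here are placeholders, and on AKS the azureDisk source would point at a managed disk you created up front:

```yaml
# Hypothetical static PV backed by a pre-created Azure managed disk.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: static-pv-example
spec:
  capacity:
    storage: 8Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  azureDisk:
    kind: Managed
    diskName: my-static-disk            # placeholder
    diskURI: /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Compute/disks/my-static-disk
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: static-pvc-example
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi
  volumeName: static-pv-example         # bind explicitly to the PV above
  storageClassName: ""                  # disable dynamic provisioning
```

Setting volumeName and an empty storageClassName on the claim is what stops the dynamic provisioner from creating a new volume instead of binding to the static one.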

The problem is that the result is quite confusing to newcomers, particularly once you add PowerShell and Azure Kubernetes Service (AKS) into the mix.

In my case, I was trying to create a static PV because one of my deployments was not working and I wanted to eliminate the elements that might be causing it to break. When listing my persistent volumes, I saw the following:

NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                    STORAGECLASS             REASON   AGE
pvc-3cbde98e-e437-11e9-9e4b-46da0f26f479   8Gi        RWO            Retain           Bound    default/data-service-queue-rabbitmq-1    managed-premium-retain            5h4m
pvc-6cc6dab3-e437-11e9-9e4b-46da0f26f479   8Gi        RWO            Retain           Bound    default/data-service-queue-rabbitmq-2    managed-premium-retain            5h3m
pvc-7971ac5f-e454-11e9-9e4b-46da0f26f479   8Gi        RWO            Retain           Bound    default/data-service-queue2-rabbitmq-0   managed-premium-retain            95m
pvc-a672e77e-e454-11e9-9e4b-46da0f26f479   8Gi        RWO            Retain           Bound    default/data-service-queue2-rabbitmq-1   managed-premium-retain            93m
pvc-c4d6a59f-e435-11e9-9e4b-46da0f26f479   8Gi        RWO            Delete           Bound    default/data-service-queue-rabbitmq-0    default                           5h15m
pvc-cbe7fbed-e454-11e9-9e4b-46da0f26f479   8Gi        RWO            Retain           Bound    default/data-service-queue2-rabbitmq-2   managed-premium-retain            92m

This was interesting, since I had deleted my first attempt at RabbitMQ hours ago and only expected three entries here.

This made sense once I considered that the storage class was set to retain the volume after the pods went away (it was designed for re-use), although I was very confused as to why the PVs were still listed as Bound to claims that no longer existed. I think this is part of the Kubernetes learning curve, and possibly a bug - there are plenty of similar reports on GitHub :-(
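For reference, a storage class behaving like the managed-premium-retain one above would look roughly like this (a sketch - the provisioner and parameters shown are what AKS used for premium managed disks with the in-tree driver at the time):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-premium-retain
provisioner: kubernetes.io/azure-disk   # in-tree Azure disk provisioner
parameters:
  storageaccounttype: Premium_LRS
  kind: Managed
reclaimPolicy: Retain                   # PVs (and the underlying disks) outlive their claims
```

With reclaimPolicy: Retain, deleting the pods and even the claims leaves the PV behind, which is exactly why the listing above still showed six volumes.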

Anyway, now that I realised I had dead PVs, I could just delete them, right? Nope. The first problem was that although their names contain pvc, they are NOT claims (PVCs) but volumes (PVs) created as a result of the claim - dynamically provisioned volumes are simply named after the claim that created them. Hmmm, not a great naming idea.
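In other words, the same pvc-prefixed name can show up under both resource types. A quick way to see which one you are actually looking at (commands assume a configured kubectl context):

```shell
# PVs are cluster-scoped volumes; PVCs are namespaced claims.
# A dynamically provisioned PV's name reappears in the PVC's VOLUME column.
kubectl get pv pvc-3cbde98e-e437-11e9-9e4b-46da0f26f479
kubectl get pvc --all-namespaces
```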

So anyway, once I realised they were PVs, I tried to delete one and it just hung. After a few minutes of searching, I found that other things might be broken too, and people were suggesting I override the finalizer on the PV to make it terminate (shudder). So I found the snippet that uses kubectl patch to set the finalizers to null and tried to run it:

kubectl patch pv pvc-3cbde98e-e437-11e9-9e4b-46da0f26f479 -p '{"metadata":{"finalizers":null}}'

The next error:

Error from server (BadRequest): invalid character 'm' looking for beginning of object key string

I assumed the problem was with the JSON, but I couldn't work out what was wrong: the same snippet was shown in various places, and I thought maybe it was out of date. Then I realised that most people are probably using bash, not PowerShell, and it clicked: I needed to escape the inner double quotes to make it work in PowerShell:

kubectl patch pv pvc-3cbde98e-e437-11e9-9e4b-46da0f26f479 -p '{\"metadata\":{\"finalizers\":null}}'
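That also explains the odd invalid character 'm' message: PowerShell consumes the unescaped double quotes before the argument ever reaches kubectl, so kubectl receives {metadata:{finalizers:null}} and its JSON parser stops at the first bare key - the m of metadata. A quick sketch, using python3's json.tool as a stand-in for kubectl's parser:

```shell
# What kubectl receives from PowerShell when the inner quotes are NOT
# escaped - the shell has eaten them, leaving bare (invalid) keys:
broken='{metadata:{finalizers:null}}'
# What it receives when they are escaped - valid JSON:
good='{"metadata":{"finalizers":null}}'

# json.tool rejects the bare keys, just as kubectl's parser does:
echo "$broken" | python3 -m json.tool || echo "not valid JSON"
echo "$good" | python3 -m json.tool
```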

I thought my journey was over, but then I realised that I have to run this patch AFTER I have initiated the delete. I also realised the PVCs were not removed automatically, and perhaps that is why the PVs (which I thought were PVCs) were not deleting but stuck in Terminating - an actual error message would have been more useful.
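The cleanup order that this adds up to is roughly the following sketch - the names are taken from my listing above, and the finalizer patch is a last resort, only for a PV already stuck in Terminating:

```shell
# 1. Delete the claim first (claims are namespaced):
kubectl delete pvc data-service-queue-rabbitmq-1 -n default

# 2. Then delete the PV (with a Retain policy it will not go away on its own):
kubectl delete pv pvc-3cbde98e-e437-11e9-9e4b-46da0f26f479

# 3. Only if the PV stays stuck in Terminating, clear its finalizers
#    (PowerShell form - inner double quotes escaped):
kubectl patch pv pvc-3cbde98e-e437-11e9-9e4b-46da0f26f479 -p '{\"metadata\":{\"finalizers\":null}}'
```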

I then found a long discussion about why PVCs are not deleted automatically; it was something to do with the data being valuable and users possibly wanting to re-attach it later. That sounds odd to me - I'm pretty sure you could keep the underlying PV around (Retain) and re-attach it via a new PVC - but what do I know?

What probably wasn't helping was that I was using a storage class I had created for a shared disk when deploying RabbitMQ, which actually wants three separate disks. Maybe I should have used a storage class with a Delete reclaim policy, which might have cleaned things up properly when the pods were destroyed.

Who knows?