Azure Key Vault works on one server but not another!
I have also seen this reported as "it works locally but not when deployed". Our problem was an API call with a JWT that returned 200 on staging but 401 in production with the same token.
Fortunately, the curse of the developer machine was not involved: the code worked fine on staging but not in production, two much cleaner environments.
The only difference should be that staging is a single server, whereas production is 6 servers behind Cloudflare and a load balancer. All the config is the same. A quick call from Postman (thank goodness it was an API issue!) directly to one of the production servers immediately ruled out Cloudflare and the load balancer as culprits.
You might not realise it, but logging is really important! We had Application Insights wired up but were not logging what was causing the 401. I also realised that both auth handlers returned the same "realm" value in the WWW-Authenticate header, so I changed one to make it obvious that we were definitely failing in the JWT handler.
I also added a simple call to appInsights.TrackException() whenever the token fails validation for any reason. I wouldn't normally log this because it can generate noise, but why not?
I then saw a very scary exception logged in App Insights for production:
Internal.Cryptography.CryptoThrowHelper+WindowsCryptographicException
The system cannot find the file specified.
Inner exception Internal.Cryptography.CryptoThrowHelper+WindowsCryptographicException handled at System.Threading.Tasks.Task.ThrowIfExceptional:
at Internal.Cryptography.Pal.CertificatePal.FilterPFXStore
at Internal.Cryptography.Pal.CertificatePal.FromBlobOrFile
at System.Security.Cryptography.X509Certificates.X509Certificate..ctor
at System.Security.Cryptography.X509Certificates.X509Certificate2..ctor
at API.Core.Auth.JwtBearerTokenAuthenticationHandler+
Why so weird? Well, why is it trying to read a file at all, and why does it work in one environment and not another?
I hate stuff like this: internal implementation details, no obvious reason why a file is involved at all, and a bland error message on top. Fortunately, I could easily verify that my code was definitely retrieving the key from Azure and passing it to the X509Certificate2 constructor as the only parameter, so it was definitely a problem with the platform.
There were some similar errors reported online relating to deploying to Azure and certain permissions. This didn't seem relevant, but we compared the app pool identity settings on the staging and production systems and found, funnily enough, that staging (which was working) had "Load User Profile" set to false in the advanced app pool settings. This rang a bell because it was also the fix for the Azure deployments.
So apparently, even when loading a certificate from a blob, Windows goes to a local key store (for some reason!). If it doesn't have permission, it throws an error implying that a file could not be found, without telling you which file or that it lacks permission, either of which might have helped. By not loading the user profile, you are effectively running as the local machine, which means you are allowed to access the local machine key store (something like that, anyway!).
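If you can't (or don't want to) change the app pool, the same "use the machine key store" idea can be expressed in code. This is a minimal sketch, not what we actually deployed. It assumes .NET 6 or later with top-level statements. A throwaway self-signed certificate (the hypothetical "CN=jwt-signing-demo") stands in for the PFX bytes pulled from Key Vault. The important bit is passing X509KeyStorageFlags.MachineKeySet to the X509Certificate2 constructor, so the private key is imported into the local machine key store rather than the current user's. Whether that avoids the error for you depends on the app pool identity being allowed to use the machine key store, so treat it as something to try rather than a guaranteed fix.

```csharp
using System;
using System.Security.Cryptography;
using System.Security.Cryptography.X509Certificates;

// Stand-in for the PFX bytes fetched from Key Vault: a throwaway self-signed
// certificate exported to PFX, so this sketch runs without any Azure access.
using RSA rsa = RSA.Create(2048);
var request = new CertificateRequest(
    "CN=jwt-signing-demo", rsa, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);
using X509Certificate2 selfSigned =
    request.CreateSelfSigned(DateTimeOffset.UtcNow, DateTimeOffset.UtcNow.AddDays(1));
byte[] pfxBytes = selfSigned.Export(X509ContentType.Pfx);

// What the original handler did: bytes only, so the default key set is used,
// which is usually the current user's key store.
// var cert = new X509Certificate2(pfxBytes);

// Alternative (an assumption, not our deployed fix): ask Windows to import the
// private key into the local machine key store instead of the user's.
using var cert = new X509Certificate2(
    pfxBytes, (string?)null, X509KeyStorageFlags.MachineKeySet);

Console.WriteLine($"Loaded {cert.Subject}, has private key: {cert.HasPrivateKey}");
```

On .NET 9 and later the compiler warns that these byte-array constructors are obsolete in favour of X509CertificateLoader, which takes the same storage flags. X509KeyStorageFlags.EphemeralKeySet, which keeps the private key in memory only, is another flag worth knowing about. Some older Windows APIs do not cope well with ephemeral keys, though.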