<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.2.2">Jekyll</generator><link href="https://blog.lukebriner.net/feed.xml" rel="self" type="application/atom+xml" /><link href="https://blog.lukebriner.net/" rel="alternate" type="text/html" /><updated>2026-01-21T15:35:18+00:00</updated><id>https://blog.lukebriner.net/feed.xml</id><title type="html">Computer Student</title><subtitle>I am mainly a dotnet developer who now spends a lot of time in DevOps and microservices. Conference speaker and general nice-guy!</subtitle><entry><title type="html">Setting up Amazon SES with SNS notifications</title><link href="https://blog.lukebriner.net/2026/01/21/ses-with-sns-notifications.html" rel="alternate" type="text/html" title="Setting up Amazon SES with SNS notifications" /><published>2026-01-21T15:22:01+00:00</published><updated>2026-01-21T15:22:01+00:00</updated><id>https://blog.lukebriner.net/2026/01/21/ses-with-sns-notifications</id><content type="html" xml:base="https://blog.lukebriner.net/2026/01/21/ses-with-sns-notifications.html">&lt;h2 id=&quot;amazon-ses&quot;&gt;Amazon SES&lt;/h2&gt;
&lt;p&gt;When you are sending lots of emails, managing reputation is an ongoing and almost unwinnable game. Despite Microsoft having reputation tools that show our servers as delivering very low levels of spam, their “heuristics” will still mark servers as “yellow” and they are often blocked for random amounts of time. To be honest, life is too short and MS are completely unwilling to engage on what they are doing and why, other than offering a “whitepaper” from 2007! A multi-billion dollar company cannot get one of their team to spend a couple of hours updating it.&lt;/p&gt;

&lt;p&gt;So we decided to use Amazon SES for a number of reasons:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Pricing is very reasonable&lt;/li&gt;
  &lt;li&gt;They have some really great features like “tenants” to separate customer reputations from each other&lt;/li&gt;
  &lt;li&gt;Lots of functionality in the form of APIs&lt;/li&gt;
  &lt;li&gt;Automatic notifications of issues like bounces or “sending paused” which we can hook into our own systems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is the last of these that took up too much of my time today!&lt;/p&gt;

&lt;h2 id=&quot;amazon-sns&quot;&gt;Amazon SNS&lt;/h2&gt;
&lt;p&gt;Simple Notification Service. Like most AWS products it is not simple but it is reasonably priced, especially for webhooks, which, let’s be honest, are not expensive. But it does take some fiddling. It defaults the payload to text/plain, which makes sense for plain notifications where the message is just text, but not so much for SES notifications, which are actually JSON blobs inside the message text (the Message property of the SNS envelope).&lt;/p&gt;
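&lt;p&gt;To make the “JSON inside text” point concrete, here is a minimal sketch in Python (chosen only for brevity) of decoding such a webhook body in two passes. The sample body is hypothetical and trimmed to the fields used here, and the function name is my own. Event publishing via a configuration set uses an eventType field while the older feedback notifications use notificationType, so the sketch accepts either.&lt;/p&gt;

```python
import json

# Hypothetical body as SNS might POST it, trimmed to the fields used here.
# The "Message" value is itself JSON, serialised into a string.
SAMPLE_BODY = json.dumps({
    "Type": "Notification",
    "Message": json.dumps({
        "eventType": "Bounce",
        "mail": {"messageId": "example-message-id"},
    }),
})


def parse_ses_event(body):
    """Return (event kind, SES message id), or None if this is not a notification."""
    envelope = json.loads(body)  # first pass: the SNS envelope
    if envelope.get("Type") != "Notification":
        return None  # e.g. a subscription confirmation, whose Message is plain text
    event = json.loads(envelope["Message"])  # second pass: the SES payload
    kind = event.get("eventType") or event.get("notificationType")
    return kind, event["mail"]["messageId"]


print(parse_ses_event(SAMPLE_BODY))
```

&lt;p&gt;The point is simply that the body has to be parsed as JSON regardless of the text/plain content type, and then the Message string has to be parsed again.&lt;/p&gt;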

&lt;p&gt;The problem was, I couldn’t get SES to send notifications at all. I could manually publish a message, which was sent fine. I looked through all the docs I could find, even copied an example from another region that worked but mine still didn’t.&lt;/p&gt;

&lt;p&gt;The first thing I noticed was that I had the default access policy, which might have worked but I suspect not. Even though my colleague said he didn’t touch his, mine was set to the default whereas his wasn’t. This is documented (https://docs.aws.amazon.com/ses/latest/dg/configure-sns-notifications.html) but I wonder whether, if you create the topic from SES, it sets the policy for you, whereas if you do what I did and create the topic up-front, you have to do this bit yourself (which is fine).&lt;/p&gt;
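&lt;p&gt;For reference, a rough sketch of the kind of topic access policy that lets SES publish, built as a Python dictionary so it can be printed as JSON. All ARNs and the account ID are placeholders, the function name is hypothetical, and the exact condition keys you need should be checked against the AWS page linked above rather than taken from here.&lt;/p&gt;

```python
import json


def ses_publish_policy(topic_arn, account_id, source_arn):
    """Build an SNS access policy that lets the SES service publish to one topic."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowSESPublish",
                "Effect": "Allow",
                "Principal": {"Service": "ses.amazonaws.com"},
                "Action": "SNS:Publish",
                "Resource": topic_arn,
                # Restrict publishing to our own account and one SES resource.
                "Condition": {
                    "StringEquals": {
                        "AWS:SourceAccount": account_id,
                        "AWS:SourceArn": source_arn,
                    }
                },
            }
        ],
    }


# Placeholder values only; substitute your own region, account and names.
policy = ses_publish_policy(
    "arn:aws:sns:eu-west-1:111122223333:ses-events",
    "111122223333",
    "arn:aws:ses:eu-west-1:111122223333:configuration-set/Default",
)
print(json.dumps(policy, indent=2))
```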

&lt;p&gt;The second part I had got wrong (and which was different from my colleague’s, hence why his was working) is that it is NOT enough to link the configuration set to the tenant and just specify the tenant when you send. This sends the email OK because the configuration set is linked but it does NOT generate the notifications unless you also specify the configuration set. For example in the API call:&lt;/p&gt;
&lt;div class=&quot;language-c# highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rawRequest&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SendEmailRequest&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;FromEmailAddress&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;from&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;ToString&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Destination&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Destination&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;ToAddresses&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;List&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;request&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MailTo&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;TrimEmailAddress&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Content&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;EmailContent&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;Simple&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Amazon&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SimpleEmailV2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Message&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;Subject&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Content&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Data&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;request&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Subject&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;TenantName&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;$&quot;customer-&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;actualParentId&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;ConfigurationSetName&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;Default&quot;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
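&lt;p&gt;Since leaving out the configuration set still sends the email and only the notifications go missing, one option is to fail loudly at the sending end instead. A minimal sketch, in Python for brevity (the same idea in C# is a dictionary lookup before building the request above). The lookup table, tenant names and function name are all hypothetical.&lt;/p&gt;

```python
# Hypothetical lookup kept at the sending end: tenant name to configuration set name.
CONFIGURATION_SETS = {
    "customer-1001": "Default",
    "customer-1002": "HighVolume",
}


def send_options(tenant_name, configuration_sets=CONFIGURATION_SETS):
    """Return the tenant and configuration set fields for a send, or raise if unknown."""
    configuration_set = configuration_sets.get(tenant_name)
    if configuration_set is None:
        # SES would still send the email, so the only place to catch this is here.
        raise ValueError(f"no configuration set recorded for tenant {tenant_name!r}")
    return {"TenantName": tenant_name, "ConfigurationSetName": configuration_set}


print(send_options("customer-1001"))
```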

&lt;p&gt;The problem with this, apart from the fact that there are no errors and the emails are still sent, is that you need to know, in advance, which configuration set name to use for which tenant. This is OK if you only have 1 but if you want to specify different delivery policies, different reputation or validation options etc. you will need to create more than 1 and then keep a record of these at your sending end. Again, not the end of the world but it violates the principle of least surprise!&lt;/p&gt;</content><author><name>Luke Briner</name></author><summary type="html">Amazon SES When you are sending lots of emails, managing reputation is an ongoing and almost unwinnable game. Despite Microsoft having reputation tools that show our servers as delivering very low levels of spam, their “heuristics” will still mark servers as “yellow” and they are often blocked for random amounts of time. To be honest, life is too short and MS are completely unwilling to engage on what they are doing and why, other than offering a “whitepaper” from 2007! A multi-billion dollar company cannot get one of their team to spend a couple of hours updating it.</summary></entry><entry><title type="html">OPNSense error The backup firewall is not accessible (check user credentials)</title><link href="https://blog.lukebriner.net/2026/01/12/opnsense-hs-error.html" rel="alternate" type="text/html" title="OPNSense error The backup firewall is not accessible (check user credentials)" /><published>2026-01-12T20:46:01+00:00</published><updated>2026-01-12T20:46:01+00:00</updated><id>https://blog.lukebriner.net/2026/01/12/opnsense-hs-error</id><content type="html" xml:base="https://blog.lukebriner.net/2026/01/12/opnsense-hs-error.html">&lt;h2 id=&quot;setting-up-opnsense-ha&quot;&gt;Setting up OPNSense HA&lt;/h2&gt;
&lt;p&gt;I watched a video and it “just worked” but on my system, I just kept getting “The backup firewall is not accessible (check user credentials)” from the primary end.&lt;/p&gt;

&lt;p&gt;Obviously, if you have the firewall locked down, you need to make sure that the firewall has ALLOW rules for things like pfsync and http/s but, if you like, you can allow everything to make it easier.&lt;/p&gt;

&lt;p&gt;Still no dice!&lt;/p&gt;

&lt;p&gt;The problem eventually turned out to be that the MTU setting for the interfaces was NOT picked up from the virtual NICs attached to them, so any large packets (e.g. an https handshake) were being dropped by the network, causing the misleading error message.&lt;/p&gt;
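&lt;p&gt;One way to confirm this kind of problem, as a sketch and not something taken from the OPNSense docs, is to ping across the link with fragmentation disallowed and a payload sized to the MTU you expect (ping -D -s SIZE on FreeBSD, ping -M do -s SIZE on Linux). The payload is the MTU minus the IPv4 and ICMP headers; the MTU values below are just examples.&lt;/p&gt;

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header


def largest_ping_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


# Example MTU values only; use whatever your network is supposed to carry.
for mtu in (1500, 1450, 1400):
    print(f"MTU {mtu}: ping payload up to {largest_ping_payload(mtu)} bytes")
```

&lt;p&gt;If the largest payload that gets through is smaller than the interface MTU implies, the interface setting is too high for the path.&lt;/p&gt;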

&lt;p&gt;All I had to do was go into the interfaces and set the MTU manually but it was annoying that this was another thing that isn’t documented anywhere because I guess it isn’t OPNSense’s problem and to be fair to them, the best they could do is say, “don’t forget to check MTU if you are on a vLan” but anyway….&lt;/p&gt;</content><author><name>Luke Briner</name></author><summary type="html">Setting up OPNSense HA I watched a video and it “just worked” but on my system, I just kept getting “The backup firewall is not accessible (check user credentials)” from the primary end.</summary></entry><entry><title type="html">Being careful with AI and ChatGPT</title><link href="https://blog.lukebriner.net/2026/01/12/being-careful-of-ai.html" rel="alternate" type="text/html" title="Being careful with AI and ChatGPT" /><published>2026-01-12T20:03:01+00:00</published><updated>2026-01-12T20:03:01+00:00</updated><id>https://blog.lukebriner.net/2026/01/12/being-careful-of-ai</id><content type="html" xml:base="https://blog.lukebriner.net/2026/01/12/being-careful-of-ai.html">&lt;p&gt;Happy New Year everyone!&lt;/p&gt;

&lt;h2 id=&quot;ai-is-definitely-here-to-stay&quot;&gt;AI is definitely here to stay&lt;/h2&gt;
&lt;p&gt;Let’s get something out of the way: AI is not going anywhere! For the people who are still complaining about “hallucinations” and what it is doing to jobs, I’m sorry but you just need to get over it. It is too useful and, like the wheel, the engine and electricity, the worms don’t go back into the jar.&lt;/p&gt;

&lt;p&gt;That said, we need to understand better how to use AI well for coding/infrastructure assistance, otherwise we might go to an early grave, because there are definitely some major frustrations for me. I found this out recently with a foray into Proxmox + OPNSense + Ceph + Talos Kubernetes + Hetzner!&lt;/p&gt;

&lt;p&gt;Yes: lots of new things to learn and lots of knobs to twiddle but it should have been simple enough and who better to turn to than my old friend ChatGPT?&lt;/p&gt;

&lt;p&gt;I will use the term AI when I mean ChatGPT specifically, “just because”.&lt;/p&gt;

&lt;h2 id=&quot;rtfm&quot;&gt;RTFM&lt;/h2&gt;
&lt;p&gt;The truth is that most documentation for software/systems etc. is dreadful. The good stuff is usually just about good &lt;em&gt;enough&lt;/em&gt; but some of it is patchy; some of it out-of-date; some definitely leaves out a lot of the important details (e.g. it has a Quickstart and nothing much else) and as soon as you hit an unexpected hurdle, you are immediately screwed unless people on the forums are helpful. The fact is, I feel bad asking what seems to be a noob question that has probably been asked many times before but no-one on the project is keen enough on the docs to make sure that all “stupid questions” get rolled into the docs.&lt;/p&gt;

&lt;h2 id=&quot;back-to-chatgpt&quot;&gt;Back to ChatGPT&lt;/h2&gt;
&lt;p&gt;So instead of trusting that the docs have my back, I suspect like a lot of people, I turn to ChatGPT to summarise best-practice and most importantly: step-by-step instructions.&lt;/p&gt;

&lt;p&gt;Like a lot of you, I understand a fair amount about hardware and software, about network layers, about virtual networking, firewalls and whatever else but that doesn’t mean I want to just stand up 3 Proxmox Nodes and try and “just get it to work”, I want to make sure that I get it production ready.&lt;/p&gt;

&lt;p&gt;YouTube videos are &lt;em&gt;almost&lt;/em&gt; always produced by hobbyists who say things like “these are just virtual servers” or “I will only use one node but you might use more”. In other words, they are not useful for what I actually need in production: how to make sure I don’t get locked out by the firewall, how to make sure I use the right settings for my VMs and so on. So, back to ChatGPT.&lt;/p&gt;

&lt;h2 id=&quot;problem-1-you-dont-have-enough-context&quot;&gt;Problem 1: You don’t have enough context&lt;/h2&gt;
&lt;p&gt;When you ask ChatGPT questions, there is a hidden System Prompt which is added to your question and which says things like, “You are a helpful AI answering questions…you must not be rude…you must answer confidently in a kind tone…”, which all forms the context. You might wonder why this isn’t just hard-coded into the model and the answer is probably that it would be too complicated and would make your model a bit limited. Anyway, when you ask something like “What do I need to know about production Proxmox clusters?” you might or might not notice that you are not giving enough context. If you asked a human that question, they are likely to ask, “What do you mean specifically?” but ChatGPT doesn’t do that because it doesn’t understand the context, it doesn’t understand Proxmox or anything else, it just infers the most statistically likely response and unless you give it almost nothing, it will happily churn out a response that might or might not be very helpful.&lt;/p&gt;

&lt;p&gt;Of course, if you know it isn’t helpful, you are likely to come back with something more specific: “I meant specifically how many nodes I need and are there any default settings that need changing for production”.&lt;/p&gt;

&lt;p&gt;The catch is that when you don’t realise that you haven’t given enough context and don’t know if the answer is good or not, you fall down the rabbit hole and can end up borking your cluster (yes I did and had to start again!).&lt;/p&gt;

&lt;h2 id=&quot;problem-2-ai-doesnt-distinguish-between-versions-very-well&quot;&gt;Problem 2: AI doesn’t distinguish between versions very well&lt;/h2&gt;
&lt;p&gt;Unless you explicitly tell it the versions of software you are using and sometimes even if you do, the answer you get back might not be correct. Again, if you notice it, you can reply that it doesn’t sound right and it might work out that you are on another version but again, if you don’t notice, it might be confusing at best and at worst lead you down the rabbit hole.&lt;/p&gt;

&lt;p&gt;The thinking mode is better, of course, since it will search the web if it knows that the score of a response is low, but it does, of course, rely on online documentation to build a sensible response and if the docs are not great, the response from AI might also be…not great. It’s horrible being told something isn’t possible, attempting some gnarly workarounds only to find out later it is possible so be careful with that and tell it what versions you are using.&lt;/p&gt;

&lt;h2 id=&quot;problem-3-it-doesnt-remember-everything-in-the-conversation&quot;&gt;Problem 3: It doesn’t remember everything in the conversation&lt;/h2&gt;
&lt;p&gt;If you are talking to a person about Proxmox and you carry on for a while without mentioning it, the person will remember that you are still talking about Proxmox. ChatGPT doesn’t because it only has a certain sized context window, which although big, is not massive and for some of these long conversations, it can lose important context and then your answer about e.g. “creating a VM” becomes generic and not Proxmox-specific.&lt;/p&gt;

&lt;p&gt;I try and get in the habit of starting new conversations regularly and starting from a clean prompt to summarise where I am since most of the previous context will become irrelevant.&lt;/p&gt;

&lt;h2 id=&quot;problem-4-it-cant-tell-the-difference-between-easy-and-correct&quot;&gt;Problem 4: It can’t tell the difference between “easy” and “correct”&lt;/h2&gt;
&lt;p&gt;I was asking how to ssh into an OPNSense VM since it didn’t seem to work. ChatGPT sent me down a hole about enabling and starting the sshd service (which didn’t work) but each rebuttal digs a deeper hole with more esoteric and potentially harmful instructions like adding symlinks, finding the executable, creating a service file etc. all of which &lt;em&gt;might&lt;/em&gt; have been correct, but fortunately, I spotted the seemingly over-complicated answer, did a Google search instead and found out there is an option in the GUI to enable SSH. Dead quick and simple but for whatever reason, ChatGPT didn’t know.&lt;/p&gt;

&lt;p&gt;Another scenario, which ended up ruining my Ceph cluster and making me fear that I was going to need to reinstall Proxmox, involved over an hour of networking nonsense when the underlying issue was that a vLan had the wrong IP in it and the firewall didn’t include a new subnet I was using. An experienced engineer would have asked some better questions about “did it ever work” etc. and then is likely to have said, “try the firewall first”. ChatGPT started coaching me on multicast floods and route tables, which was quite interesting but it felt like it should have been easier to get to the truth more quickly.&lt;/p&gt;

&lt;h2 id=&quot;what-i-would-prefer-instead&quot;&gt;What I would prefer instead&lt;/h2&gt;
&lt;p&gt;I would LOVE it if people who maintain open-source software created AI-powered help documentation. Rook, for example, has a load of helper scripts but no obvious route from the home page to, “This is exactly what you need to do to set up a rook cluster with external Ceph”. This was another thing that AI made into lots of manual steps, when all I needed was to run export-cluster.py and import-cluster.py pretty much! The Google search AI was much better and gave me about 5 instructions compared to the 30 that I was given by ChatGPT (and which didn’t work anyway).&lt;/p&gt;

&lt;p&gt;Lastly, a big shout-out to the amazing people who build and maintain tools like ss, ip, grep and others, which are so powerful for debugging (once you learn the magic switches!) and mean that, at least on Linux, debugging things like ARP and route tables is child’s play!&lt;/p&gt;

&lt;p&gt;So more on the Proxmox stuff later. I am still trying to get OPNSense HA working (it semi-works :-) and then I might knock up some more Production-quality videos about setting everything up and the pains I endured in the process!&lt;/p&gt;</content><author><name>Luke Briner</name></author><summary type="html">Happy New Year everyone!</summary></entry><entry><title type="html">What the Cloudflare Outage of November 18th 2025 shows us</title><link href="https://blog.lukebriner.net/2025/11/19/what-cloudflare-outage-shows-us.html" rel="alternate" type="text/html" title="What the Cloudflare Outage of November 18th 2025 shows us" /><published>2025-11-19T10:14:01+00:00</published><updated>2025-11-19T10:14:01+00:00</updated><id>https://blog.lukebriner.net/2025/11/19/what-cloudflare-outage-shows-us</id><content type="html" xml:base="https://blog.lukebriner.net/2025/11/19/what-cloudflare-outage-shows-us.html">&lt;h2 id=&quot;what-happened-at-cloudflare-on-novemeber-18th&quot;&gt;What Happened at Cloudflare on November 18th?&lt;/h2&gt;
&lt;p&gt;Cloudflare, an international network/cloud-based web application proxy/firewall/DDoS protector that protects many thousands of large websites globally, became unavailable for over 3 hours, while the long tail of recovery took another 3!&lt;/p&gt;

&lt;p&gt;This was Cloudflare’s longest outage since 2019. Since Cloudflare protects many large and famous websites, all of these were unavailable for most or all of the window from 11:20 to roughly 17:10 UTC, and this was &lt;a href=&quot;https://www.bbc.co.uk/news/articles/c629pny4gl7o&quot;&gt;national news&lt;/a&gt; in most countries.&lt;/p&gt;

&lt;p&gt;What actually happened? As usual, a chain of events.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;A change was made (deliberately) to a set of database permissions&lt;/li&gt;
  &lt;li&gt;An unintended effect of this was to roughly double the size of a generated configuration file&lt;/li&gt;
  &lt;li&gt;This file was pushed out over a number of minutes to the systems that use the file&lt;/li&gt;
  &lt;li&gt;These systems are deliberately memory constrained and the file was too large to open, so they crashed, since that scenario was not expected/handled cleanly&lt;/li&gt;
  &lt;li&gt;It was not clear what was causing the problem, partly because a) recent DDoS attacks meant this might be another one, b) the staggered deployment meant that some systems were OK and others were not, and c) the Cloudflare status page, which was independent of all of this, was having unrelated issues, making it look even more like a DDoS attack. All of this led to the long time to diagnose&lt;/li&gt;
  &lt;li&gt;Once the problem was identified, an older/smaller file was manually pushed into the deployment queue and a number of systems had to be restarted&lt;/li&gt;
  &lt;li&gt;As with all large systems, backpressure builds up from batch processes that are sitting retrying to access things, so even after the system is up again, it then has a deluge of work to do, which often reduces performance for a few hours as things catch up.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To make things worse, customers would not usually be able to log in to the Cloudflare Dashboard to make any changes, e.g. to DNS, to work around the problem, since the Turnstile “captcha” on the login page was taken out by the breakage! Ouch.&lt;/p&gt;

&lt;p&gt;Most of us sat there weeping, hoping it would all be fixed soon.&lt;/p&gt;

&lt;p&gt;Cloudflare, to their credit, published a full and transparent post-mortem on the incident &lt;a href=&quot;https://blog.cloudflare.com/18-november-2025-outage/&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;what-didnt-go-wrong&quot;&gt;What &lt;em&gt;didn’t&lt;/em&gt; go wrong?&lt;/h2&gt;
&lt;p&gt;It’s very easy for those of us who don’t manage critical infrastructure of this size and complexity to come out with, “they obviously should have done….”, which is very patronising to a group of very talented staff. I read an article once about them finding a bug in a CPU! Who has ever done that?&lt;/p&gt;

&lt;p&gt;Anyway, let us knock down a few of the trite accusations levelled at Cloudflare:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;How does a company with this much money/staff/skill make an error like this? Answer: None of those things prevent mistakes/bugs/issues, they just reduce their chance. The complexity of their system already magnifies their risks way beyond what most of us are managing.&lt;/li&gt;
  &lt;li&gt;My system has never gone down for this long! Answer: Congratulations. If your system is as complex as Cloudflare’s, go and get a job there, but I suspect it isn’t. Here at SmartSurvey, we have traditionally had amazing uptime but guess what? We are orders of magnitude less complex than Cloudflare.&lt;/li&gt;
  &lt;li&gt;Why didn’t they update the status page sooner? Answer: I don’t know how long it took, but I don’t remember it being very long. Also, there were other issues with the status page that wouldn’t have helped and those people who usually update it might well have been fighting other fires that started. Another problem with large businesses is you get so many reports of “this is not working”, it does actually take time to confirm the problem before you update the status page and the tickets themselves do not get looked at instantly.&lt;/li&gt;
  &lt;li&gt;How can a system as important as this go down for as long as this? Answer: This is a non-sequitur. A system being important doesn’t, in itself, make that system more reliable or make it any easier to make it reliable. The truth is that 99.9% of the time, Cloudflare provides very valuable functionality and services that didn’t really exist before the company was created (a few alternatives are available now but not to the same degree). If you could choose between 3 hours of downtime per year with DDoS protection and no DDoS protection at all, you would likely still accept the risk of downtime.&lt;/li&gt;
  &lt;li&gt;Why didn’t they have a rollback plan? Answer: They probably did but for that to work, the failure needs to be expected so you can watch out for it. If there is no obvious link between the update and the failure (remembering that Cloudflare probably update loads of things multiple times per day), then you aren’t going to know to roll back, especially if rolling back also undoes other critical updates and you don’t know what update broke it!&lt;/li&gt;
&lt;/ol&gt;

&lt;h2 id=&quot;what-did-go-wrong&quot;&gt;What &lt;em&gt;did&lt;/em&gt; go wrong?&lt;/h2&gt;
&lt;p&gt;When taken in its entirety, it sounds bad, “Code didn’t cope properly with large files” but, like many disasters, when broken down, each element is not that bad in its own right.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Deployed a database permissions change. These have to be done sometime and, on paper, the change was non-breaking. The issue wasn’t that the change was wrong but that a previous SQL query assumed something that was no longer true after the change. How would you spot that in testing? Not very easily. Even if you did run it in testing, nothing broke at this point. The generated file, if it is part of the test suite, would have been larger but still valid, so not really a screw-up here.&lt;/li&gt;
  &lt;li&gt;Ah but whoever wrote that original query should not have made assumptions! Good luck with that, we all make assumptions every day. We cannot predict the future so we have to work with what we know now. The truth is that neither side of this did anything majorly wrong but the result was wrong. It was, however, only wrong because of another assumption…&lt;/li&gt;
  &lt;li&gt;The bot detection agents are memory constrained for performance reasons. I assume this means we don’t let things randomly allocate whatever memory they want because that could cause performance issues that would be hard to debug. This meant that when the larger file was opened in code, there was not enough RAM and Rust threw a panic. Or rather, Rust returned a result which the Developer did a panic on. This is slightly worse in my opinion but there was also possibly a disconnect between the Developer who had to code “Open the file to read the contents” and someone else who decided that “the file is only 40MB, let’s set a RAM limit of 200MB”, which is plenty. Arguably, instead of a panic, the developer could have logged a specific error, which could have been aggregated by whatever Cloudflare monitor their system with, so that as soon as there were problems, they would have seen “Error in Bot services: Out of memory opening feature file”, which might have led to a quicker conclusion.&lt;/li&gt;
&lt;/ol&gt;
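&lt;p&gt;To illustrate the “log a specific error instead of panicking” idea from the last point, here is a minimal sketch. It is Python, not Rust, and it is certainly not Cloudflare’s code: the function, the logger name and the size limit are all hypothetical. It checks the file against an expected size, and on any failure logs a message that a monitoring system could aggregate and carries on with the last known good contents.&lt;/p&gt;

```python
import logging
import os
import tempfile

log = logging.getLogger("bot-service")  # hypothetical logger name
MAX_FEATURE_FILE_BYTES = 40 * 1024 * 1024  # hypothetical limit, echoing the 40MB above


def load_feature_file(path, last_known_good, max_bytes=MAX_FEATURE_FILE_BYTES):
    """Return the new file contents, or last_known_good if the file cannot be used."""
    try:
        size = os.path.getsize(path)
        if size > max_bytes:
            raise ValueError(f"file is {size} bytes, limit is {max_bytes}")
        with open(path, "rb") as handle:
            return handle.read()
    except (OSError, ValueError) as error:
        # A specific, aggregatable message instead of a crash.
        log.error("Bot service could not load feature file %s: %s", path, error)
        return last_known_good


# Demonstration with a tiny limit so that a 10-byte file counts as oversized.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789")
print(load_feature_file(tmp.name, last_known_good=b"old", max_bytes=5))
os.remove(tmp.name)
```

&lt;p&gt;Whether falling back to the previous file is the right behaviour depends on the system; the part that matters for diagnosis is the explicit error message.&lt;/p&gt;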

&lt;h2 id=&quot;what-did-we-learn&quot;&gt;What did we learn?&lt;/h2&gt;
&lt;p&gt;We learned that we don’t have the tooling to reliably link all of these things together. We don’t have a way to mark up a SQL query with “this assumes that only the default database is accessible” in a way that means if someone makes a permissions change, somehow, the assumption is surfaced to allow someone to action it. Nor do we have a way for someone to annotate that, “Feature files are never supposed to be over 40MB so it is OK to panic if we run out of memory” so that, at runtime, a large feature file triggers something that highlights the danger (a log might have solved that to be fair).&lt;/p&gt;

&lt;p&gt;Don’t assume something is a DDoS until you have verified that. This assumption caused some data to be interpreted incorrectly and made the incident take longer to resolve.&lt;/p&gt;

&lt;p&gt;Don’t make your control panel unavailable by using Turnstile on your main infrastructure ;-)&lt;/p&gt;

&lt;p&gt;For us as customers, consider what a 2/4/8 hour outage of Cloudflare means. Can you work around it even at reduced functionality? What if you cannot access Cloudflare to change settings over?&lt;/p&gt;

&lt;p&gt;We all need to have good responsiveness to updating status pages, which MUST be hosted on separate infrastructure to our own.&lt;/p&gt;

&lt;p&gt;Don’t use a backup DNS provider that uses Cloudflare! Yes, there are some.&lt;/p&gt;</content><author><name>Luke Briner</name></author><summary type="html">What Happened at Cloudflare on November 18th? Cloudflare, an international network/cloud-based web application proxy/firewall/DDoS protector that protects many 1000s of large websites globally became unavailable for over 3 hours, while the long-tail of recovery took another 3!</summary></entry><entry><title type="html">Secrets of effective Dev Interviews</title><link href="https://blog.lukebriner.net/2025/10/21/secrets-of-effective-dev-interviews.html" rel="alternate" type="text/html" title="Secrets of effective Dev Interviews" /><published>2025-10-21T19:39:01+00:00</published><updated>2025-10-21T19:39:01+00:00</updated><id>https://blog.lukebriner.net/2025/10/21/secrets-of-effective-dev-interviews</id><content type="html" xml:base="https://blog.lukebriner.net/2025/10/21/secrets-of-effective-dev-interviews.html">&lt;h2 id=&quot;the-age-old-tech-interview-argument&quot;&gt;The Age Old Tech Interview Argument&lt;/h2&gt;
&lt;p&gt;Go and find any online discussion about tech interviews and you will find two almost opposing views that cannot be reconciled easily. At one extreme, the recruiting company run some kind of coding or aptitude test. It might be one large test or exercise or a series of smaller ones. It might take a day or an hour. It is likely to involve something that some people cannot work out and they fail the interview claiming it is not fair. At the other extreme you get self-entitled Developers who think they are basically top 1% and if you want to recruit them, you should not test them, or maybe set a really low bar so they won’t get caught out but don’t worry, we will be just fine at your organisation!&lt;/p&gt;

&lt;p&gt;Clearly being too strict with technical testing might mean missing out on someone who would be really good for your business (although you cannot know that in advance). On the other hand, if you don’t test enough, you get people who win jobs because of over-confidence or even dishonesty and who end up providing limited value for probably a lot of salary.&lt;/p&gt;

&lt;h2 id=&quot;taking-the-pieces-apart&quot;&gt;Taking the Pieces Apart&lt;/h2&gt;
&lt;p&gt;Something which is surprisingly lacking in many of these heated debates is the acknowledgement that there are many variables involved in recruitment and you cannot simplify it down to a single formula.&lt;/p&gt;

&lt;p&gt;For example, I suspect that Google has many 1000s of applications per day for their Development roles and many of these probably have computer science degrees, great skills and possibly some good experience. On paper, they all look good but Google only have space to recruit, say, 100 engineers per month. What do they do? Whatever they can do to filter out candidates. Messy or incomplete CVs. Badly answered filtering questions. Lack of desire to attend interviews or whatever. Taken in isolation, these are not good ways of filtering bad people out but from Google’s perspective, it is better than attempting to interview 1000s of people per day just in case one of those bad CVs with lots of typos and badly answered questions might be the next Guido van Rossum or Linus Torvalds.&lt;/p&gt;

&lt;p&gt;But that doesn’t matter, there is no point discussing these things if you are not in the same position as Google/Meta/Amazon etc. who are offering top 0.05% salaries to the very best engineers in the world.&lt;/p&gt;

&lt;p&gt;The second variable to understand is the disconnect between what you know as the candidate and what the recruiter knows about both you and the job. You cannot expect the recruiter to invest a significant amount of time in you just because you &lt;em&gt;know&lt;/em&gt; that you are worth employing. All they know about is what you have told them and what they have read in perhaps 5 minutes. That’s it. That thing you mentioned on your CV means nothing to them, they don’t know how hard it was to produce, how many hours you put in and, more importantly, how many hours a &lt;em&gt;good&lt;/em&gt; engineer would have needed to produce the same thing. When you tell them you are at Senior level, they don’t know that and, to be honest, you probably don’t either. I have been programming for 30+ years but where am I in the world of programmers between 1 and 100? No idea! I’ve been able to accomplish many things and have never been unable to produce what I have been asked but on the other hand, maybe my tasks have never been hard enough!? The recruiter has to do something to close the gap between what you might know and what they need to be sure of.&lt;/p&gt;

&lt;p&gt;A third variable is the knowledge that not all people are honest. Some are deliberately dishonest and will say anything for the role. Some are deluded and simply think they are really good. Some people genuinely think they are better than they are and simply lack the awareness to know that. I spoke to one guy who was 2 years out of college and who was applying for a Senior Developer role. I asked him some of the simple “warm up” questions that help people feel comfortable answering questions like, “What is the difference between overloading and overriding?” and “What is the difference between a class and a struct in C#?”. These aren’t noob questions but they are certainly something a Senior Developer would know or at least have a good stab at. This guy didn’t know the answer to any of them! I don’t usually finish interviews early but I did in this case and tried to politely point out that I thought he was being unrealistic to consider himself a Senior Developer when so young and inexperienced and most importantly, not actually knowing anything. He argued with me about that :-)&lt;/p&gt;

&lt;p&gt;A fourth variable is that roles are very different and have different needs. If I need someone to work in the Cloudflare networking team, they are clearly going to need to be &lt;em&gt;very&lt;/em&gt; familiar with the Linux networking stack and it shouldn’t come as a surprise if they are tested specifically on that. They might also be applying for a highly paid role, one with good perks, so they might accept the effort required to do so. Another business is paying bottom dollar for people working on simple internal CRUD applications: as long as you can spell your name and know Visual Studio, you are in! Clearly, the approach for this company is likely to be very different. Likewise, if you are going to be part of a large team, the risk is much lower: if you cannot make the grade and have to be let go after 2 months, it won’t be the end of the world.&lt;/p&gt;

&lt;h2 id=&quot;getting-to-the-sweet-spot&quot;&gt;Getting to the Sweet Spot&lt;/h2&gt;
&lt;p&gt;So, where is the sweet spot in effective interviews? How do we balance the need to verify what people claim about their abilities but also not putting too many high barriers between a potential engineer and the role we want them to fill?&lt;/p&gt;

&lt;p&gt;Firstly, screening is really key to avoid taking up your precious time as a Developer or Manager in recruitment. It is depressing interviewing completely the wrong person because you want to be polite and not end the call after 5 minutes even if you already know they are entirely unsuitable. We advertised a Development role once and someone who applied worked in a mobile phone shop but was interested in becoming a programmer despite the job role clearly saying that we needed experience!&lt;/p&gt;

&lt;p&gt;There are a number of screening techniques that depend on the platform you are using. The simplest is just filtering questions like, “Do you have a minimum of 5 years commercial Development experience?”. If they say no to any of the questions (or yes if they are worded the other way round!) then you automatically tell them that they are not suitable and they have only wasted their own time. What if they lie? Well you can’t stop people from lying but there are still checks you can make, for example, their CV might only go back 3 years. It might be a mistake on their part but if I saw that, I would immediately reject them, assuming they are not being truthful. If they say, “yes” and then in their covering letter say, “I have worked for 3 years in a company but before that I used to write my own software and sell it to my friends”, great! You can take a view on that but they have earned your trust.&lt;/p&gt;

&lt;p&gt;Talking of which, a covering letter, to me, is a must. Firstly, I want people to realise they need to work. If someone can’t be bothered to spend 5 minutes writing me a letter about why they want to work at my company, do they really have a good work ethic? If on the other hand, they are not doing it because they are just blanket applying for jobs, I am also not interested. My company has a culture, a passion, a motivation etc. and I don’t want someone working here just because I am paying them. I want them to work here because they want to work here (and they add value). It doesn’t have to be long but it does give a chance for someone to highlight anything that is not clear from the CV e.g. “The 6 month gap was when I returned to India to look after my dad” and actually gives them an advantage over people who don’t want to write a letter.&lt;/p&gt;

&lt;p&gt;You can get AI screening but to be honest, I don’t think CVs are good enough most of the time for AI to be effective so maybe use it to highlight stuff quickly like, “Shows significant experience with C#” but don’t use it as an automatic means to rejection.&lt;/p&gt;

&lt;p&gt;You can then use in-house or third-party recruiters to do other types of screening, again, based on your company’s needs. Some businesses need a specific mindset and require a psychological test. I did one of these once and then didn’t get the job. I don’t know why and I think they should have given me some feedback even if it was something I didn’t want to hear like, “your profile has you as a risk taker and we felt this wasn’t the right fit for this role” but don’t use these for the sake of it. They are subtle and require training to understand but if you need to, you can.&lt;/p&gt;

&lt;p&gt;Other types of screening might be as simple as, can they communicate clearly. Did they dress respectfully and turn up on time (or at least ask what they should wear). Do they know anything about the company whose time they are now taking up? When people say, “no”, they get pushed very close to the exit in my mind, again, because if you are offering yourself for a role with all of the cost and time but don’t know anything about the business, do you even know if you want to work here? These types of screenings can be done by anyone, and ideally can be done by different types of people. Why not have someone who is not a Developer seeing whether they like a candidate and can have a conversation with them.&lt;/p&gt;

&lt;h2 id=&quot;technical-tests&quot;&gt;Technical tests&lt;/h2&gt;
&lt;p&gt;I have previously used technical screening tests. These are simple online tests that the candidate does in their own time but take up to 1 hour. It might be something like “write some code that counts the number of even numbers in this array”. You might think these would be pointless but I have had people who either cannot do this or inexplicably take like 10 minutes to do something simple like count the number of words in a string (literally &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;return myString.Split(&apos; &apos;).Length;&lt;/code&gt;). This can tell you a lot. The good people finish them in 3 minutes. Could they cheat? Sure but you can play back the recording of them interacting with it and see them possibly fiddling constantly with code and not getting it. This puts some people off but you know what? A lot of what we do is online and a lot of what we do involves written instructions. If you cannot handle that, sorry but no thanks.&lt;/p&gt;
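&lt;p&gt;To give a feel for how small these screening exercises are, here is a rough Python sketch of the two tasks mentioned above. The function names are made up for illustration and this is only one of many acceptable answers. Note that Python’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;split()&lt;/code&gt; with no argument collapses repeated whitespace, which is slightly more forgiving than splitting on a single space.&lt;/p&gt;

```python
def count_even(numbers):
    """Count how many values in the list are even."""
    return sum(1 for n in numbers if n % 2 == 0)


def count_words(text):
    """Count whitespace-separated words in a string."""
    return len(text.split())


print(count_even([1, 2, 3, 4, 6]))
print(count_words("the quick brown fox"))
```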

&lt;p&gt;The real secret of the effective technical test is to ask about 5 quite high-level and open-ended questions about 5 different areas of the work that person will be doing. If they are genuinely experienced, they already know what these things mean so you don’t need to explain them, just to see how deep their understanding goes, how quickly they can recall what they know and how clearly they can articulate it.&lt;/p&gt;

&lt;p&gt;When I do these interviews, it only takes me 30 minutes and I know whether they are the right level or not. I am not looking to tell a 98 from a 99 out of 100. Just to know that you have said you are experienced and you understand stuff and you have proved it.&lt;/p&gt;

&lt;h2 id=&quot;example-technical-question&quot;&gt;Example Technical Question&lt;/h2&gt;
&lt;p&gt;So for example: “Imagine you are tasked with finding out why a database query is running slowly. What is your approach to working this out and what tools do you use to identify the problem?”. Obviously you might make allowances for the fact they have only used PostgreSQL and not SQL Server so the exact answer might vary but still, you can tell how much they know from their answer.&lt;/p&gt;

&lt;h3 id=&quot;bad-answers-i-have-heard-these&quot;&gt;Bad Answers (I have heard these!)&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;“I don’t know, I never worked on databases”. Yes you did because you are a web developer and you should have learned about them to be effective.&lt;/li&gt;
  &lt;li&gt;“I would look at the code and try and run through it and verify”. No. I already told you it was the query that was slow. If that wasn’t clear, you should have asked me whether you can assume it is definitely the query.&lt;/li&gt;
  &lt;li&gt;“It is probably slow because of indexes”. Nope. That is a massive presumption and, by not having any process to speak of, you clearly do not understand how to debug a slow query.&lt;/li&gt;
  &lt;li&gt;“Entity Framework could be pulling back related tables”. Yes I know. How does that answer the question?&lt;/li&gt;
  &lt;li&gt;“If it was slow I would just add some indexes”. This is sometimes the solution but how would you know whether these are needed and how would you know which ones to add? Would you just eat up disk space in the hope that you might just do something that works - for now anyway?&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;minimum-knowledge-required&quot;&gt;Minimum knowledge required&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;“I would find out the actual query being run”. Great, that is indeed the first step. Although if you said this, I would ask how!&lt;/li&gt;
  &lt;li&gt;“I would use the query analyzer to see what is slow”. Also important and correct. The follow-up here would be, what would you actually see in the query analyzer that would highlight something is slow?&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;the-full-answer&quot;&gt;The Full Answer&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;“I would use the SQL profiler to isolate the specific query that is slow. This will also confirm that it is indeed one query and not possibly a number of queries that we might not be aware of that are all a bit slow and which add up to the overall performance issue”&lt;/li&gt;
  &lt;li&gt;“I would then run the query in the query analyzer where it will show a percentage cost on each element of the query, enabling me to identify which part of the query is likely causing the problem. This will also show which columns are being filtered on and which are being selected from, which might give me a clue as to whether there is an index problem”&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;extra-credit&quot;&gt;Extra credit&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;“The SQL profiler shows levels of IO so if these are high, it is likely that a table scan(s) is being carried out”&lt;/li&gt;
  &lt;li&gt;“The SQL profiler sometimes shows that we are selecting more columns than it appears from the Linq-to-SQL/ActiveRecord query, which could highlight why it won’t use an index that is already present”&lt;/li&gt;
  &lt;li&gt;“In most cases, a slow part of a query is where a scan is taking place instead of a seek or where a lot of key lookups are being performed. Using the filtered and selected columns, I would evaluate existing indexes to see whether a small modification could provide what we need without using up more disk space and performance or whether another index might be needed”&lt;/li&gt;
  &lt;li&gt;“It is sometimes possible to use a filtered index to significantly reduce disk space required but still get the benefits of an index for query speed”&lt;/li&gt;
  &lt;li&gt;“Table statistics can sometimes affect the ability of the query optimizer to select the correct execution plan so we might need to look at adding additional statistics or being more clever with data distribution, adding redundant where terms etc. to help it out”&lt;/li&gt;
&lt;/ul&gt;
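&lt;p&gt;The “look at the plan first, then decide” workflow in these answers can be sketched in a few lines. This is only an illustration using SQLite from Python rather than SQL Server and its profiler/query analyzer, and the table, index and helper names are all made up. The point is simply that the plan tells you a scan is happening before you reach for an index.&lt;/p&gt;

```python
import sqlite3


def plan(conn, sql):
    """Return the readable plan steps SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]  # column 3 holds the detail text


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 500, i * 1.5) for i in range(10_000)],
)

# Step 1: isolate the actual query that is slow (hypothetical example).
slow_query = "SELECT id, total FROM orders WHERE customer_id = 42"

# Step 2: inspect the plan. With no suitable index this is a full scan.
before = plan(conn, slow_query)

# Step 3: only now add an index on the filtered column and check again.
conn.execute("CREATE INDEX ix_orders_customer ON orders (customer_id)")
after = plan(conn, slow_query)

print(before)
print(after)
```

&lt;p&gt;The exact wording of the plan text varies between SQLite versions, but the first plan should mention a scan and the second should name the new index.&lt;/p&gt;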

&lt;p&gt;So what starts as a simple open-ended technical question can easily determine whether the applicant really knows their stuff, or whether they are reaching for terms they might have heard but don’t really understand.&lt;/p&gt;

&lt;p&gt;If you are worried about candidates answering questions with AI, it is harder than you think for people to do that naturally at the right speed and without making their face look like they are reading something or typing.&lt;/p&gt;

&lt;p&gt;This one question wouldn’t determine whether someone gets through this stage but there might be a question on .Net or on Visual Studio or on your approach to work etc. each of them open-ended, each of them an opportunity to demonstrate a lack of deep expertise or someone who really knows what they are talking about, who can ask sensible follow-up questions and who knows when they have been talking too long and asks, “is that enough or do you want me to explain any more”!&lt;/p&gt;

&lt;h2 id=&quot;self-awareness&quot;&gt;Self Awareness&lt;/h2&gt;
&lt;p&gt;Something I also used to do before my HR team harmonized the process was to ask the candidate, before they arrived at the interview, to rate themselves from 0 to 5 across a number of relevant technologies like Cloud, Databases, .Net, HTML, Vue JS etc. I tell them what 0 to 5 mean. This serves 2 purposes.&lt;/p&gt;

&lt;p&gt;Firstly, if they rate themselves as e.g. a 1 in databases, what’s the point in asking about them in the interview?&lt;/p&gt;

&lt;p&gt;Secondly, I am less worried about whether they are a 4 or a 5 in something but if they say they are a 5 (knows the whole domain i.e. an expert) and I ask them a tough but not impossible question, they should be able to come out with a pretty-good answer even if not perfect. If I ask them the question above and they can only answer it in a really basic way, I can immediately know that a) they are not much of an expert and b) they think they are!&lt;/p&gt;

&lt;p&gt;When using this method, which I didn’t do for Juniors, I would have a rough expectation that Seniors might have up to 6 ‘4’s and ‘5’s but likely no more than 2 ‘5’s. If they thought they had a lot more than that, it is also a signal. I have had people just assume that 20 years experience = a 5 in CSS and when you ask them about flexbox, they don’t really know anything about it. That puts you right down in the 2 bracket but you assumed you were a 5?&lt;/p&gt;

&lt;p&gt;What I like about this system, and I will try and bring it back, is that it places some responsibility on the candidate to propose who they are and then the interview is really just confirming that. If they can’t answer the questions then you are rejecting them on the basis of, “well you said you were a 5 and couldn’t answer these questions confidently”, which seems fairer than the alternative of, “I asked you some random questions and you didn’t happen to know the answers so you don’t get the job”, which is how people often relate the experience when they don’t make the grade in their Google application.&lt;/p&gt;

&lt;p&gt;It is not an exact science of course and I would never look for perfect knowledge but most of the time people are either clearly in the right ballpark (i.e. they are self-aware) or obviously nowhere near it.&lt;/p&gt;

&lt;h2 id=&quot;the-take-home-test&quot;&gt;The Take Home Test&lt;/h2&gt;
&lt;p&gt;I have only done this once for a specific candidate who had been a Development Manager, hadn’t enjoyed it that much and wanted to go back to being a Senior Developer. The Management experience can be really helpful because it shows business awareness, probably organisation skills, self-motivating etc. but I wasn’t convinced that he still had the coding chops because he had not been programming for about 5 years and his answers to my technical questions were not massively convincing. I proposed the take-home test.&lt;/p&gt;

&lt;p&gt;This is a reasonable piece of work based on something they are likely to do at your company. I created a separate backlog and git repo, which I gave him access to, and set him the task, telling him it should take no longer than 1 day to code but that he had a week to complete it.&lt;/p&gt;

&lt;p&gt;We also paid him, I can’t remember how much but a normal kind of contract rate. Why? Because why should he do it otherwise for free?&lt;/p&gt;

&lt;p&gt;I was able to judge not only the final output in terms of code quality etc. (it wasn’t great) but also, did he ask me any questions (he did not!), did he make small commits and ask me to review them (he also did not). So I was able to determine that not only was his code not of a suitable level but he didn’t make up for it in his ability to collaborate.&lt;/p&gt;

&lt;p&gt;I had a friendly conversation afterwards and pointed out the ways in which the code did not meet the standards I was looking for. How he had missed that some of the functionality already existed in libraries he could have used, how he lacked Unit Tests and possibly because of that, there were a number of edge cases that were missing.&lt;/p&gt;

&lt;p&gt;He thanked me for the opportunity and for paying him for it and for the chance to get an objective feel for how rusty he was!&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;So the secret?&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Screening effectively&lt;/li&gt;
  &lt;li&gt;Asking a handful of different open-ended technical questions that allow someone to demonstrate their expertise and communication skills&lt;/li&gt;
  &lt;li&gt;(Optional) Get them to rank themselves both as a skills indicator and also to find out self-awareness&lt;/li&gt;
  &lt;li&gt;(Optional) Take home test if you really want to see what they will actually do on the job.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Good luck. Recruitment is hard!&lt;/p&gt;</content><author><name>Luke Briner</name></author><summary type="html">The Age Old Tech Interview Argument You go and find any online discussion about tech interviews and you will find two almost opposing views that cannot be reconciled easily. On the one extreme, the recruiting company run some kind of coding or aptitude test. It might be one large test or exercise or a series of smaller ones. It might take a day or an hour. It is likely to involve something that some people cannot work out and they fail the interview claiming it is not fair. On the other extreme you get self-entitled Developers who think they are basically top 1% and if you want to recruit them, you should not test them, or maybe set a really low bar so they won’t get caught out but don’t worry, we will be just fine at your organisation!</summary></entry><entry><title type="html">SQL Server Filtered Indexes</title><link href="https://blog.lukebriner.net/2025/09/29/sql-server-filtered-indexes.html" rel="alternate" type="text/html" title="SQL Server Filtered Indexes" /><published>2025-09-29T12:47:01+00:00</published><updated>2025-09-29T12:47:01+00:00</updated><id>https://blog.lukebriner.net/2025/09/29/sql-server-filtered-indexes</id><content type="html" xml:base="https://blog.lukebriner.net/2025/09/29/sql-server-filtered-indexes.html">&lt;h2 id=&quot;sql-server-filtered-indexes&quot;&gt;SQL Server Filtered Indexes&lt;/h2&gt;
&lt;p&gt;Filtered Indexes are a really important tool for the Developer/DBA so let’s take a step back and understand how and why they solve the problem they do.&lt;/p&gt;

&lt;h2 id=&quot;why-do-we-need-database-indexes&quot;&gt;Why do we need Database Indexes?&lt;/h2&gt;
&lt;p&gt;I did a whole &lt;a href=&quot;https://www.youtube.com/watch?v=Pw173bq6mWA&quot;&gt;YouTube video&lt;/a&gt; on this for SQL Server a staggering 5 years ago (it seems more recent).&lt;/p&gt;

&lt;p&gt;Anyway, I won’t go over all of that except to say that a Clustered Index in SQL Server (and other databases) defines the order that the data is stored on-disk. By default, the column(s) of the table’s “primary key” are what define the Clustered Index.&lt;/p&gt;

&lt;p&gt;What this means is that if you are searching based on the primary key column e.g. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT something FROM tablename WHERE primarykeycolumn = 234&lt;/code&gt;, something very common in CRUD applications, it will be very fast because SQL Server can use a binary-type search algorithm to find the correct row in storage.&lt;/p&gt;

&lt;p&gt;How? Imagine you have a phone directory, something that those of us of a certain age will remember. It is phone numbers ordered by either a person’s surname (Personal phone book) or by Business name (Business phone book). If you have ever used one of these or a dictionary or something else, since you know the order of entries, when you are searching by whatever you have ordered by e.g. a surname, you don’t start at page 1 and flick through the whole thing, if you were looking for, say, “Jones” you might open about 25% of the way through and depending on where you landed, flick a few pages either way and after no more than maybe 5 attempts, you will be on the correct page and find what you need! Great. This is a SEEK.&lt;/p&gt;

&lt;p&gt;What if you were looking for the &lt;em&gt;name&lt;/em&gt; of the person whose phone number is 01234 567890? It would be very difficult because of the ordering. You would literally start at the beginning and go through every single entry until you find the result (or not if it is not guaranteed to be in the book). This is a SCAN.&lt;/p&gt;

&lt;p&gt;You can imagine that a SCAN is &lt;em&gt;much&lt;/em&gt; slower than a SEEK so we need to avoid them and that is why we need indexes.&lt;/p&gt;
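&lt;p&gt;To put rough numbers on the phone book analogy, here is a small Python sketch that counts how many entries get looked at in each case. It is not SQL Server: the function names and the made-up book are assumptions for illustration, and SQL Server walks a B-tree rather than doing a plain binary search over a list, but the idea of halving the search space each time is similar.&lt;/p&gt;

```python
def scan(entries, target):
    """SCAN: check every entry from the start. Returns (index, probes)."""
    probes = 0
    for i, name in enumerate(entries):
        probes += 1
        if name == target:
            return i, probes
    return -1, probes


def seek(entries, target):
    """SEEK: binary search over sorted entries. Returns (index, probes)."""
    lo, hi = 0, len(entries)
    probes = 0
    while hi > lo:
        mid = (lo + hi) // 2
        probes += 1
        if entries[mid] == target:
            return mid, probes
        if target > entries[mid]:
            lo = mid + 1   # target is in the upper half
        else:
            hi = mid       # target is in the lower half
    return -1, probes


# A sorted "phone book" of 100,000 made-up names.
book = [f"name{i:06d}" for i in range(100_000)]
target = "name087654"

print(scan(book, target))  # probes grow with how far into the book the entry is
print(seek(book, target))  # probes stay at roughly log2 of the book size
```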

&lt;h2 id=&quot;how-does-the-index-work&quot;&gt;How does the index work?&lt;/h2&gt;
&lt;p&gt;Imagine that as well as the phone book ordered by surname, you had another one that was ordered by phone number? Well ignoring the fact that it would only cover a few telephone areas, it would allow you to find a name from a phone number using a SEEK so you get what you need - a fast lookup. At what cost though?&lt;/p&gt;

&lt;p&gt;Imagine the two versions of the phone book: you will need twice as much paper to print them and twice as much shelf space to store them, which is exactly the same problem you get with a database index (or not quite: more on that later!). If I want a second index on my table ordered by something else, to allow faster lookup by something other than the primary key, it will cost me double the disk space by default.&lt;/p&gt;

&lt;p&gt;This might be a cost worth paying but what if your table is 300GB in size? Do you really want to add another 300GB for an index?&lt;/p&gt;

&lt;h2 id=&quot;options-to-avoid-the-size&quot;&gt;Options to avoid the size&lt;/h2&gt;
&lt;p&gt;Why not just set the clustered index to order by something else? Well, this fixes lookups by whatever you want to order it by but it still won’t allow you to quickly search for things by more than 1 lookup key. You should also have a clustered index where new items are appended and not inserted. If you ordered by e.g. Surname and you added a new row “Jones”, it would need to find an empty row in the correct page of the table and if it couldn’t find space, it would need to split a page, copy half of the data to the new page to create space and then add the new row. That is mostly why people use an auto-increment integer or sometimes a UUID that is ordered by time so new ones are always higher up the table than older ones.&lt;/p&gt;

&lt;p&gt;The “correct” way to avoid the size issues of indexes is to reduce the columns that are indexed and/or returned from the index. Imagine that instead of all the data for a person, the “lookup by phone number” index just contained a reference number and that reference number could be used to find the data in the original phone book? The new index could potentially be much smaller.&lt;/p&gt;

&lt;p&gt;In fact, most database tables have more than 3 or 4 columns, so if your table search function looks up something by “lookup column” and only needs to return a handful of other columns, you can index on “lookup column” and then &lt;em&gt;include&lt;/em&gt; the columns you need to return. This could be much smaller than the clustered index, maybe only 10-20% on a table with lots of columns. 30GB extra is much more desirable than 300GB extra.&lt;/p&gt;

&lt;p&gt;Indexed columns are those that appear in the WHERE clause of your SQL so if you had &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SELECT country FROM users WHERE name = &apos;luke&apos; AND age = 35&lt;/code&gt; then both &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;name&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;age&lt;/code&gt; would need to be indexed columns but &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;country&lt;/code&gt; could be included in the index instead.&lt;/p&gt;
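&lt;p&gt;Here is a hedged sketch of that idea using SQLite from Python, since it runs anywhere. SQLite has no &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INCLUDE&lt;/code&gt; clause, so the sketch approximates an included column by appending &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;country&lt;/code&gt; as a trailing key column; in SQL Server you would list it under INCLUDE instead. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bio&lt;/code&gt; column, the index name and the helper are made up for illustration.&lt;/p&gt;

```python
import sqlite3


def plan(conn, sql):
    """Return the readable plan steps SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]  # column 3 holds the detail text


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, "
    "country TEXT, bio TEXT)"
)

# name and age are the lookup columns; country rides along so the index
# alone can answer the query without touching the table.
conn.execute("CREATE INDEX ix_users_name_age ON users (name, age, country)")

# Everything this query needs is in the index, so it is "covering".
narrow = plan(conn, "SELECT country FROM users WHERE name = 'luke' AND age = 35")

# bio is not in the index, so each match needs an extra trip to the table
# (the equivalent of a Key Lookup in SQL Server).
wide = plan(conn, "SELECT country, bio FROM users WHERE name = 'luke' AND age = 35")

print(narrow)
print(wide)
```

&lt;p&gt;The first plan should report a covering index and the second should use the same index without the covering label, which is the same trade-off between index size and extra lookups described above.&lt;/p&gt;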

&lt;h2 id=&quot;why-filtered-indexes-then&quot;&gt;Why Filtered Indexes then?&lt;/h2&gt;
&lt;p&gt;In the above example, you could possibly get quite a small index, but if you need to return most or all of the columns in the query then that is where it can fall down. If your index is fully covering, it will be used but will be really large. If it is NOT covering, SQL Server will work out whether to use the index to get the primary key and then a “Key Lookup” to get the missing columns from the clustered index or, if it thinks there will be too many key lookups, it will simply ignore the index and scan the clustered index!&lt;/p&gt;

&lt;p&gt;Secondly, if you search by a number of different columns, then you might need indexes for all of them so your 30GB index becomes 10 x 30GB indexes and you are back to eating disk space!&lt;/p&gt;

&lt;p&gt;This is where a &lt;a href=&quot;https://learn.microsoft.com/en-us/sql/relational-databases/indexes/create-filtered-indexes?view=sql-server-ver17&quot;&gt;filtered index&lt;/a&gt; &lt;em&gt;could&lt;/em&gt; be the solution. The filtered index is like a normal index but in addition to setting indexed and included columns, you can define a filter statement like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Active = 1&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Age &amp;gt; 20 AND Disabled = 0&lt;/code&gt; and if your query contains part of the where clause that matches this (in any order) then the query planner can use your filtered index. It will still be subject to normal optimisation: if it doesn’t return all required columns, it might cause key lookups or be ignored. The space savings, however, could be significant if you have a table where you are querying only a small number of rows using static comparison functions like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Age &amp;gt; 20&lt;/code&gt; rather than something dynamic like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;name = &apos;luke&apos;&lt;/code&gt;, which is not what a filtered index is for.&lt;/p&gt;

&lt;p&gt;I added one to a table of background jobs to process, most of which are already set to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Done = 1&lt;/code&gt; so my filtered index is simply based on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Done = 0&lt;/code&gt;, which would normally only return maybe max 10 rows. On my 3MB table, it only took 16KB of space, which is probably the smallest it could be anyway and that was an index that was including most of the table columns so that it did not require a key lookup to be performed.&lt;/p&gt;
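&lt;p&gt;A runnable approximation of that background jobs setup, using SQLite’s partial indexes (its closest equivalent to a SQL Server filtered index) from Python. The table layout, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;queue&lt;/code&gt; column and the index name are assumptions for illustration, not my real schema. Only the rows matching the filter get an entry in the index, which is where the space saving comes from.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, queue TEXT, done INTEGER NOT NULL)"
)

# 10,000 jobs, of which only the last 10 are still waiting to be processed.
conn.executemany(
    "INSERT INTO jobs (queue, done) VALUES (?, ?)",
    [("email", 0 if i >= 9_990 else 1) for i in range(10_000)],
)

# The index only contains rows where done = 0, so it stays tiny however
# many finished jobs pile up in the table.
conn.execute("CREATE INDEX ix_jobs_pending ON jobs (queue) WHERE done = 0")

# The filter is written as a literal so the planner can match it to the index.
pending_sql = "SELECT id FROM jobs WHERE done = 0 AND queue = 'email'"

detail = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + pending_sql)]
pending = conn.execute(pending_sql).fetchall()
total = conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]

print(detail)
print(len(pending), "pending of", total)
```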

&lt;p&gt;So for scenarios where it fits, it is an amazing tool to speed up queries without eating up space. The MS page lists these as suitable scenarios:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;When the values in a column are mostly NULL and the query selects only from the non-NULL values. You can create a filtered index for the non-NULL data rows.&lt;/li&gt;
  &lt;li&gt;When rows in a table are marked as processed by a recurring workflow or queue process. Over time, most rows in the table will be marked as processed. A filtered index on rows that aren’t yet processed would benefit the recurring query that looks for rows that aren’t yet processed.&lt;/li&gt;
  &lt;li&gt;When a table has heterogeneous data rows (lots of different values). You can create a filtered index for one or more categories of data. This can improve the performance of queries on these data rows by narrowing the focus of a query to a specific area of the table. Again, the resulting index will be smaller and cost less to maintain than a full-table nonclustered index.&lt;/li&gt;
&lt;/ul&gt;</content><author><name>Luke Briner</name></author><summary type="html">SQL Server Filtered Indexes Filtered Indexes are a really important tool for the Developer/DBA so let’s take a step back and understand how and why they solve the problem they do.</summary></entry><entry><title type="html">The SQL Core Licensing problem</title><link href="https://blog.lukebriner.net/2025/08/26/sql-server-core-problem.html" rel="alternate" type="text/html" title="The SQL Core Licensing problem" /><published>2025-08-26T11:38:01+00:00</published><updated>2025-08-26T11:38:01+00:00</updated><id>https://blog.lukebriner.net/2025/08/26/sql-server-core-problem</id><content type="html" xml:base="https://blog.lukebriner.net/2025/08/26/sql-server-core-problem.html">&lt;h2 id=&quot;what-is-the-sql-server-core-licensing-problem&quot;&gt;What is the SQL Server Core Licensing problem?&lt;/h2&gt;
&lt;p&gt;SQL Server licensing has for a long time been based on the number of physical cores on the host machine, which historically worked out OK I guess. Back when 4 cores was “wow” and 8 cores was “WOWWWW”, if you could afford such machines, you could definitely afford to grease Microsoft’s palm with a few hundred pounds per month for licensing SQL Server.&lt;/p&gt;

&lt;p&gt;However, things have changed now and this can present a problem for those of us who want or need to use SQL Server Enterprise (multiple databases in availability groups and more than 1 standby replica). Today, the price to license 4 cores is approximately $10K per year. Yes, on your server and yes, only for 4 cores. You want 8 cores? $20K. So why would anyone want to do this? Quite simply, SQL Server was one of the most reliable database engines before PostgreSQL was heard of and before MySQL/MariaDB was considered suitable for business (of course, you could always use it if you knew how to debug it, get support or whatever), so SQL Server ate the market for people who didn’t want to pay “Oracle” money and it got sticky. At SmartSurvey, we use SQL Server and, like many organisations, have so many stored procs with some complex logic that converting these to use e.g. PostgreSQL would be time-consuming, costly and risky, whether you did it with AI or contracted it out to another company.&lt;/p&gt;

&lt;h2 id=&quot;why-is-it-a-problem-now&quot;&gt;Why is it a problem now?&lt;/h2&gt;
&lt;p&gt;The problem now is that if you want to use SQL Server on only, e.g. 4 cores, and you want a new server, good luck! A bit like you don’t get computers with 2GB RAM any more, cores are so cheap that most modern CPUs have at least 8 and usually many more. Your physical server might cost a few thousand dollars but your licensing gets very expensive very quickly.&lt;/p&gt;

&lt;p&gt;On cloud, it can be even worse. If you want a lot of RAM for SQL Server to use for cache, how many cores does that come with? If you look at Azure’s pricing page (AWS and others are similar), a memory optimized Eadsv6 instance with 4 cores only comes with 32GB RAM. If you wanted e.g. 4 cores and 128GB RAM, tough luck! This makes sense though: you are buying a slice of a host VM that probably has something like 128 cores and 1TB RAM, so you get the relevant percentage of both even if you don’t want the cores because “SQL Server Licensing”.&lt;/p&gt;

&lt;p&gt;Now this page is slightly misleading because actually you &lt;em&gt;can&lt;/em&gt; get more RAM for your cores if you go into the Azure portal and create a VM (I guess the pricing page might get too big if they had every option in it?). In the portal, you can get 4 cores with 64GB RAM or if you are happy to use a v5 Azure image, even 4 cores with 128GB RAM but still, if you wanted e.g. 4 cores and 256GB RAM, you would have to pay for 8 cores and double your SQL Server Licensing to get the extra RAM!&lt;/p&gt;
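&lt;p&gt;To put rough numbers on that, here is a small sketch using only the approximate figure quoted earlier in this post ($10K per year for every 4 cores). Rounding up to whole 4-core blocks is my simplification for illustration, not a statement of how Microsoft actually sells licences.&lt;/p&gt;

```python
import math

# Assumption taken from this post: roughly $10K per year for every 4 cores licensed.
PRICE_PER_FOUR_CORES = 10_000

def yearly_licence_cost(cores):
    """Approximate yearly licence cost, rounding up to whole 4-core blocks."""
    return math.ceil(cores / 4) * PRICE_PER_FOUR_CORES

wanted = yearly_licence_cost(4)   # the cores the workload actually needs
forced = yearly_licence_cost(8)   # the cores you must rent to get the extra RAM
print(wanted, forced, forced - wanted)
```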

&lt;p&gt;So it might be possible in the short term to get a memory-optimised VM on Azure/AWS/wherever with the correct blend of cores and RAM, but if you then need to go above that, you are back to the problem described above and will need to migrate, which will entail creating a new instance, migrating the data and then switching over.&lt;/p&gt;

&lt;h2 id=&quot;just-limit-the-cores-used&quot;&gt;Just limit the cores used&lt;/h2&gt;
&lt;p&gt;In SQL Server, you can actually tell the database to only use up to e.g. 4 cores even on an 8 core machine, and I assume that works OK BUT….the licensing is very clearly based on the physical cores on the host, not on the number you limit it to on SQL Server, which is quite extraordinary but then SQL Server must be an enormous cash cow for Microsoft (“Server products and cloud services” is $98B for 2024. Of that, around a quarter is estimated to be database products).&lt;/p&gt;

&lt;p&gt;So what options do you have?&lt;/p&gt;

&lt;p&gt;One option is to put more databases on the same host so that instead of e.g. two 4 core licences on two servers, you could pay for one 8 core licence for the same amount of money, but then you are sharing resources so that might not be the best option for a production workload.&lt;/p&gt;

&lt;p&gt;The second most obvious is to run SQL Server as a virtual machine. But, and this is also a requirement, the virtual machine cannot just use e.g. “4 virtual cores”, it must be pinned to 4 cores and not be allowed to use time slots on other cores even if it is supposed to add up to “4 virtual cores” because you might get like 1% extra performance and Microsoft would like to charge you for that. This is possible.&lt;/p&gt;

&lt;h2 id=&quot;sql-server-as-a-vm&quot;&gt;SQL Server as a VM&lt;/h2&gt;
&lt;p&gt;If you have a bare-metal server, it is easy enough to install &lt;a href=&quot;https://www.proxmox.com/en/&quot;&gt;Proxmox&lt;/a&gt;, VMware or even, God forbid, Hyper-V and then install SQL Server as a virtual machine as long as you can pin the cores. It is a bit more work than you would like, but you get to pin the CPUs and can still pass through almost all of the host RAM to get the blend that you want. The networking and stuff is not too complicated and you solve your problem (although it would be nice if you were just allowed to pin SQL Server to a number of cores for licensing purposes).&lt;/p&gt;

&lt;p&gt;On the cloud, it is harder and for a good reason, not for price-gouging. On the cloud, there is already a Hypervisor that you do not have access to (even if you pay for the use of 100% of its resources). This is obvious because the cloud provider needs to be able to control that machine and when it gets destroyed, re-provisioned etc. This means that everything you see and interact with is a virtual machine even if you are using all of the resources of the host, it is still a single VM on a single hypervisor. There is no obvious way around that, if you had access to the Hypervisor, how would Azure take it back once you stop paying for it? Netboot? No thanks.&lt;/p&gt;

&lt;p&gt;So you need a different approach using &lt;em&gt;nested virtualisation&lt;/em&gt;, which is a bit easier said than done because, of course, when you create a new VM, you can’t choose “Hyper-V” or “Proxmox” as the image to use, you have to choose something like Windows Server or Debian Linux, which is where Proxmox comes in!&lt;/p&gt;

&lt;h2 id=&quot;nested-virtualisation-on-azure-with-proxmox&quot;&gt;Nested virtualisation on Azure with Proxmox&lt;/h2&gt;
&lt;p&gt;I will write a separate post about performance concerns and the like but at this point, I will just describe roughly what you need to do. It is not officially supported but it is allowed so don’t worry!&lt;/p&gt;

&lt;p&gt;Because Proxmox is amazing and is based on the KVM kernel module, it can be installed on top of Debian instead of having to be installed from an ISO-type installer. This limits you somewhat because things like disk setup, which is usually part of the Proxmox installer, will need to be done manually before installing Proxmox, but there is an &lt;a href=&quot;https://pve.proxmox.com/wiki/Install_Proxmox_VE_on_Debian_12_Bookworm&quot;&gt;article&lt;/a&gt; on the Proxmox site that basically describes how to add the relevant packages. It really doesn’t take very long and then you open the browser, point it to the Proxmox GUI and you are away.&lt;/p&gt;

&lt;p&gt;There is all the usual configuration to open ports, get https certificates and whatever but it just works really. Once you are up and running, you need to create the VM for SQL Server probably running Windows Server (SQL Server for Linux is still a bit hit and miss for me), give it e.g. 4 cores and most of the RAM. You generally won’t be running any other VMs so just leave some RAM, maybe 2-4GB for the Proxmox host. You then need to lock down the VM to use 4 specified cores so that you meet the licensing requirements.&lt;/p&gt;

&lt;p&gt;The only real pain is that as a nested VM, Azure won’t be aware of it so you can’t e.g. directly bridge the VM to the Azure vlan. Instead, I just added port forwards for 1433 and 3389 from the host to the guest and treated the host almost entirely like the SQL Server itself for networking purposes. The port forwards add maybe max 0.1 to 1 ms to latency and apparently you can use HAProxy to do it even quicker but until you can run a repeatable performance test on it, that extra latency is probably going to be lost in the noise.&lt;/p&gt;

&lt;p&gt;If you have paid for a large host e.g. 16 cores and 256GB RAM and are only using 4 for your SQL Server VM then when you need to expand, you can easily re-configure the host VM to pass more cores to the VM with very little effort and no extra VM rental. Don’t forget to buy your extra SQL Server licence though ;-)&lt;/p&gt;</content><author><name>Luke Briner</name></author><summary type="html">What is the SQL Server Core Licensing problem? SQL Server licensing has for a long time been based on the number of physical cores on the host machine, which historically worked out OK I guess. Back when 4 cores was “wow” and 8 cores was “WOWWWW”, if you could afford such machines, you could definitely afford to grease Microsoft’s palm with a few hundred pounds per month for licensing SQL Server.</summary></entry><entry><title type="html">We need UX, not UI!</title><link href="https://blog.lukebriner.net/2025/07/23/ux-not-ui.html" rel="alternate" type="text/html" title="We need UX, not UI!" /><published>2025-07-23T13:10:01+00:00</published><updated>2025-07-23T13:10:01+00:00</updated><id>https://blog.lukebriner.net/2025/07/23/ux-not-ui</id><content type="html" xml:base="https://blog.lukebriner.net/2025/07/23/ux-not-ui.html">&lt;h2 id=&quot;my-experience-of-most-products-and-apps-is-very-poor&quot;&gt;My experience of most products and apps is very poor&lt;/h2&gt;
&lt;p&gt;Have you ever gone to a web application, maybe to register or buy something and it is just really, really, slick? You come away surprised how easy it was. You press only as many keys as you expect and get the response. For example, lots of Cable Companies have a page “are you in my area” where you enter a postcode and press [enter] and bang: “Congratulations, we do provide services in your area, click here to check pricing” or whatever. You know that you need to enter a postcode and then bang.&lt;/p&gt;

&lt;p&gt;Unfortunately, most sites are NOT like this, not even close. You end up pressing way more keys than you need to. The seemingly now defunct site webpagesthatsuck.com very famously in the mid-1990s showed various mistakes that were common in poor web sites like:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The home page is too long&lt;/li&gt;
  &lt;li&gt;“mystery meat” navigation&lt;/li&gt;
  &lt;li&gt;It doesn’t work in all major browsers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are also things that are less of an issue today like not using Flash but the point being that it was laughable that people made so many basic mistakes except…“when it is your own website”.&lt;/p&gt;

&lt;h2 id=&quot;examples-of-poor-design-patterns-in-2025&quot;&gt;Examples of poor design patterns in 2025&lt;/h2&gt;
&lt;ol&gt;
  &lt;li&gt;Use of a front-end framework that is not optimized and therefore makes 10s or 100s of API requests when it could have been done with a single back-end request. This leads not only to slow-loading but worse, it causes inexplicable errors if some of those backend calls don’t work but others do. Very rarely do those individual errors propagate in any meaningful way to the front-end. A page load error and the only option is to Refresh and do the whole thing again. Likewise, a broken API endpoint is missed for a long time because its effects are not important enough to be noticed leading to pollution of the console when trying to debug other errors.&lt;/li&gt;
  &lt;li&gt;Unhelpful input/validation messages that could have been avoided. e.g. “Please enter your vehicle registration with no spaces”. Why? Let them type spaces and you remove them in the back-end. Honestly, it would be quicker than typing out that hint. If you cannot do this, shame on your lack of skill/knowledge/choice of framework or Management. e.g. “Invalid code. The format should be two capital letters and 10 to 12 digits”. Again, people will often type in lower-case because that is easier. If lower-case is never valid, convert it to upper-case on the back-end and then return the result! Really. Do that. (This was on a VAT code lookup site for the EU. Which company writes these apps for the EU and doesn’t know how to do that?). e.g. “Invalid number, please check and re-enter”. Why? An accidental space at the start or end. Answer? You guessed it: trim it on the back-end before validating it.&lt;/li&gt;
  &lt;li&gt;Validation messages that don’t tell you what is wrong. e.g. “There is a problem with the address. Please check and re-enter”. And the problem is? No idea. Did the system attempt a lookup into a database and couldn’t find the company? Did it find something similar but different? Does it expect the county to be the full word instead of an abbreviation? Who knows. Lazy, lazy, lazy.&lt;/li&gt;
  &lt;li&gt;Putting too much on the summary page. It isn’t a summary if it has 25 snippets of various random facts. A/B test it. Move things to the top that are being clicked on a lot, move the other stuff down or even better, remove it completely. If I want to find my bill, I can find it under the billing menu, I don’t need a random fact on the summary page like “Last bill was £25”. What are the chances that this is what I was looking for and I can log out now? Just because you have data, doesn’t mean it needs exposing in all its glory. This is also partnered with having far too many top-level menus. You probably shouldn’t have more than 5 top-level menus. At that point, if you click into billing, &lt;em&gt;then&lt;/em&gt; you might have Payments, Bills, Payment Method etc. where you would expect to find them. Not having these at the top level doesn’t violate the rule of “fewer clicks is better” because it helps with cognitive load to not have to read through 30 menu options to make sure you go to the right place.&lt;/li&gt;
  &lt;li&gt;Inventing your own esoteric security mechanism. Passwords and 2-factor possibly offering SMS (I know!), authenticator app and passkeys is widely used, well-known and acceptably secure. Why am I seeing businesses doing things like “2-factor by sending a code to an email address that is NOT your registered email address”. “Type in a code that we will ask for random characters from” - how long does that take to work out? What are you protecting against?&lt;/li&gt;
  &lt;li&gt;Basics like not supporting the [Enter] key on, say, the page where you ask for the 2-factor code to be entered. Poor use of hover which makes it unworkable on a mobile and unnecessarily picky on desktop. Move the mouse 2 pixels and bang, the menu has gone! Most people don’t give a toss about your “fresh design”. It’s fine to look nice but give it a rest, people want to find what they need, not watch pointless animation of elements that look “really cool”.&lt;/li&gt;
  &lt;li&gt;Asking, “how did we do?” every &lt;em&gt;single&lt;/em&gt; time someone interacts with your business. You know what? You should know how you did without asking. If someone called to change their address, you answered the phone quickly and changed their address, what exactly are you expecting the person to say? If they have called up for the 5th time to change their address because it hasn’t worked? Guess what? They think you suck. Amazon do it, “how was the delivery?”. My opinion is that I ordered something and it was delivered, what do you expect me to say? Again, you should already know. Of course, it’s fine to point people to a place where they can make a complaint and to be honest, you should be unashamed about this because if you are running your business properly, you won’t expect too many complaints and should deal with them.&lt;/li&gt;
  &lt;li&gt;Using dark practices to trick people into doing things you want them to do. Like when you offer them the new “AI Assistant” and the options are “Yes” and “Maybe later”. No, the other option is, “I really don’t want to use this thing so don’t ask me again”. Or when people try and get cool by changing buttons like “Good” and “Bad” into “Amazing” and “Not so great”. OK, Good and Bad might not be quite right but really? “Amazing” is not the same as “Good” if you expect people like me to press “amazing” when something was just good, I won’t and you miss a data point, just get over yourself.&lt;/li&gt;
&lt;/ol&gt;
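&lt;p&gt;To show how little code the “fix it on the back-end” advice in items 2 and 3 needs, here is a minimal Python sketch. The function names are hypothetical, and the pattern is only the one quoted in the error message above (two letters then 10 to 12 digits), not a real VAT validation rule.&lt;/p&gt;

```python
import re

def normalise_registration(raw):
    """Strip all whitespace and upper-case a vehicle registration typed by a user."""
    return re.sub(r"\s+", "", raw).upper()

def normalise_code(raw):
    """Remove stray whitespace and upper-case the input before validating it."""
    cleaned = re.sub(r"\s+", "", raw).upper()
    # Format quoted in the error message: two capital letters, then 10 to 12 digits.
    if re.fullmatch(r"[A-Z]{2}[0-9]{10,12}", cleaned) is None:
        # Say what was expected and what was received, not just "invalid".
        raise ValueError(
            "Expected two letters followed by 10 to 12 digits, got %r" % cleaned
        )
    return cleaned

print(normalise_registration(" ab12 cde "))
print(normalise_code(" gb1234567890 "))
```

&lt;p&gt;The user can type spaces or lower-case and still get a result, and when the input really is wrong the message describes the problem.&lt;/p&gt;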

&lt;h2 id=&quot;ux-the-missing-skill&quot;&gt;UX, the missing skill&lt;/h2&gt;
&lt;p&gt;I think the truth is that businesses often recruit UI positions before they employ UX and I would say that is the wrong way round. UI people, probably designers, obsess over things that your customers literally care nothing about. That slightly different shade of green, that lighter grey subtle text effect and the pinstripe button outlines with just a hint of shadow. As the saying goes, you are often missing the wood for the trees.&lt;/p&gt;

&lt;p&gt;Some software looks horrible but people buy it because: It works consistently; it doesn’t get changed every 2 months because of the latest design fad; the interface is snappy and responsive; the controls and menus are all where you expect them to be (not the Material Design “Add” button by itself in the bottom-right where no-one can see it).&lt;/p&gt;

&lt;p&gt;You should instead recruit a UX designer. They can use whatever Bootstrap type UI framework you want because: no-one except front-end devs cares. Bootstrap had scss from the beginning (well it was sass originally), so Bootstrap websites don’t need to look like it. However, the UX is crucial because that person is your customer. They try and ask random questions of your system like, how would I expect to find the latest bill? How easy is it to search for stuff? Do the search results return far too many or nowhere near enough results?&lt;/p&gt;

&lt;p&gt;The UX Designer will then challenge the development team to remove those cringing friction points like validation errors that the system could easily fix or lazy error messages that don’t describe what the user did wrong. If you did that, you will get happy customers who won’t get angry when your automated phone system spends 30 seconds of you life saying, “did you know, you can do loads of stuff on our web site…..”!&lt;/p&gt;</content><author><name>Luke Briner</name></author><summary type="html">My experience of most products and apps is very poor Have you ever gone to a web application, maybe to register or buy something and it is just really, really, slick? You come away surprised how easy it was. You press only as many keys as you expect and get the response. For example, lots of Cable Companies have a page “are you in my area” where you enter a postcode and press [enter] and bang: “Congratulations, we do provide services in your area, click here to check pricing” or whatever. You know that you need to enter a postcode and then bang.</summary></entry><entry><title type="html">Why you must test database queries</title><link href="https://blog.lukebriner.net/2025/06/04/why-you-must-test-database-queries.html" rel="alternate" type="text/html" title="Why you must test database queries" /><published>2025-06-04T13:03:00+00:00</published><updated>2025-06-04T13:03:00+00:00</updated><id>https://blog.lukebriner.net/2025/06/04/why-you-must-test-database-queries</id><content type="html" xml:base="https://blog.lukebriner.net/2025/06/04/why-you-must-test-database-queries.html">&lt;h2 id=&quot;how-many-developers-equate-it-works-with-it-is-correct&quot;&gt;How many Developers equate “it works” with “it is correct”?&lt;/h2&gt;
&lt;p&gt;Scenario: A Developer is asked to add a new page into a web application that loads a list of widgets and perhaps has some search functionality. They add the page, it looks OK, the code looks OK and the search appears to work. Many Developers stop there.&lt;/p&gt;

&lt;p&gt;The page goes into production and it &lt;del&gt;runs like a dog&lt;/del&gt; (strange metaphor because dogs run quite fast) runs like a dog with no legs. Someone then complains that the search doesn’t work in all scenarios. Ouch, what happened?&lt;/p&gt;

&lt;p&gt;The Developer saw that the page “worked” and that the search “worked” and decided the job was done. A good reviewer would have asked, “Did you test the DB query in the profiler?”. Even if the Developer tries to be defensive, “Oh, it’s pretty much the same as the other code that was there before”, my answer is usually, “I don’t care about the code that was there before, I care about the new code. Go and test it in the profiler”. Why so arsy? Because there are several ways in which you can be caught out if you don’t!&lt;/p&gt;

&lt;h2 id=&quot;things-that-go-wrong-in-database-queries&quot;&gt;Things that go wrong in database queries&lt;/h2&gt;
&lt;ol&gt;
  &lt;li&gt;The test database is much smaller so the queries seem to run at an acceptable speed when the Developer tested it but they didn’t know that the query was missing an index and performing a table scan. In the test DB with 1000 rows and not much contention, it is fast enough. In production with 1000s of connections and 100s of millions of rows in the same table….yeah. Really, really slow and possibly also causing deadlocks for other connections as a result.&lt;/li&gt;
  &lt;li&gt;Your query is “pretty much the same” as the old one but it is not the same. You added another column in the select statement which is not in the index but you didn’t notice. By profiling, you spot the problem and update the index or change the way you query the data.&lt;/li&gt;
  &lt;li&gt;Your query is projected in a strange way. Your where clause on a nullable column, for example, that is basically &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Deleted != true&lt;/code&gt; gets projected as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Deleted &amp;lt;&amp;gt; cast(1 as bit)&lt;/code&gt;, which means it misses the filtered index for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Deleted = false&lt;/code&gt; and again causes a table scan. If you notice that, you change your code and find that using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Deleted == false&lt;/code&gt; projects correctly.&lt;/li&gt;
  &lt;li&gt;Your search term has not been passed correctly so it only finds fields beginning with something rather than any string within a field. You didn’t notice because your testing was lazy and didn’t try a few scenarios.&lt;/li&gt;
  &lt;li&gt;There were too many columns returned in the projected &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Select&lt;/code&gt; statement meaning that you will pull far more data than you needed, even if it is quick, it will hurt the network bandwidth. You realise that a certain operation in your Linq-SQL query didn’t work as you anticipated and you therefore have to change it to something else or use raw SQL instead.&lt;/li&gt;
  &lt;li&gt;The profiler shows you that the query is performing a key lookup instead of being fulfilled by a single index. This will generally be much faster than a table scan but still quite slow for large result sets. This has happened because the extra column you added is not in the non-clustered index so it decides a key lookup is best. Unfortunately, this can be even worse in production because it will base the query plan on the first result set, which might be small = key lookup, but then users whose list is much larger get a disproportionately slower experience.&lt;/li&gt;
  &lt;li&gt;Probably others that I can’t think of right now&lt;/li&gt;
&lt;/ol&gt;
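&lt;p&gt;Item 3 is easy to reproduce in miniature. The sketch below uses Python’s built-in sqlite3 module and a partial index as a stand-in for a SQL Server filtered index. The Widgets table, the OwnerId column and the index name are invented for the example, and SQLite’s planner follows different rules from SQL Server’s, so this only illustrates the general idea that the predicate has to match what the index was built with.&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Widgets (Id INTEGER PRIMARY KEY, OwnerId INTEGER NOT NULL, "
    "Name TEXT NOT NULL, Deleted INTEGER)"
)
conn.executemany(
    "INSERT INTO Widgets VALUES (?, ?, ?, ?)",
    [(i, i % 50, "widget-%d" % i, 1 if i % 10 == 0 else 0) for i in range(1000)],
)
# Filtered (partial) index that only covers rows that are not deleted.
conn.execute(
    "CREATE INDEX IX_Widgets_Live ON Widgets (OwnerId, Name) WHERE Deleted = 0"
)

def uses_filtered_index(where_clause):
    """True if the query plan for this WHERE clause mentions the filtered index."""
    sql = "EXPLAIN QUERY PLAN SELECT Id, Name FROM Widgets WHERE " + where_clause
    details = [row[3] for row in conn.execute(sql)]
    return any("IX_Widgets_Live" in detail for detail in details)

# Same intent, different predicate: only the first one matches the index definition.
matching = uses_filtered_index("OwnerId = 7 AND Deleted = 0")
mismatched = uses_filtered_index("OwnerId = 7 AND Deleted != 1")
print(matching, mismatched)
```

&lt;p&gt;Both queries look “pretty much the same” in application code, which is exactly why you only find the difference by looking at the plan.&lt;/p&gt;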

&lt;h2 id=&quot;how-do-you-check-it&quot;&gt;How do you check it?&lt;/h2&gt;
&lt;p&gt;A really critical skill of the Developer is knowing their debugging tools and being able to run them up quickly. For example, I wonder how many .Net Developers could quickly run up SQL Server Profiler and see what they need to see and how many either have never used it or would spend 30 minutes getting it working. Not acceptable. Testing a query should take a minute or so.&lt;/p&gt;

&lt;p&gt;Run the profiler up, capture the query being sent from the app, run it in SQL Server Management Studio with the Show Execution Plan button pressed. Easy. Understanding the query plan, again, should be easy for any Developer worth their salt. Knowing &lt;em&gt;why&lt;/em&gt; something isn’t working is sometimes harder. There are cases where the query planner takes a wrong decision but 90% of the time, you should easily see why an index was missed during the query.&lt;/p&gt;</content><author><name>Luke Briner</name></author><summary type="html">How many Developers equate “it works” with “it is correct”? Scenario: A Developer is asked to add a new page into a web application that loads a list of widgets and perhaps has some search functionality. They add the page, it looks OK, the code looks OK and the search appears to work. Many Developers stop there.</summary></entry><entry><title type="html">You need to understand diminishing returns</title><link href="https://blog.lukebriner.net/2025/06/04/understand-diminishing-returns.html" rel="alternate" type="text/html" title="You need to understand diminishing returns" /><published>2025-06-04T12:39:00+00:00</published><updated>2025-06-04T12:39:00+00:00</updated><id>https://blog.lukebriner.net/2025/06/04/understand-diminishing-returns</id><content type="html" xml:base="https://blog.lukebriner.net/2025/06/04/understand-diminishing-returns.html">&lt;h2 id=&quot;rules-for-developers&quot;&gt;Rules for Developers&lt;/h2&gt;
&lt;p&gt;I don’t really like those articles about “the top 10 things you need to know as a Developer” because like a lot of things in life, it sounds sensible but really it should be “the top 10 things I needed to know with my background, training and experience that are probably very specific to me”.&lt;/p&gt;

&lt;p&gt;Things change and lessons that were once hard-learned are not really a thing any longer. Back in the day, people would lose their voices telling you not to write your own session or auth system for a framework. Why? Because it was hard to get right and…a lot of early frameworks didn’t have them built-in. Guess what? They do now so very few people need to learn the lesson about using the built-in features of a framework.&lt;/p&gt;

&lt;p&gt;Anyway, one thing I have learned that I think a lot of Engineers are prone to get wrong is understanding about “diminishing returns”. Hopefully you understand the phrase but imagine you buy a cheap second-hand car and you want to make it run better. You would start with the cheapest easiest things right? Maybe clean the fuel system out, replace the filters and spark plugs etc. If you want even more performance, you would move on to slower more expensive things like rebuilding the brake calipers to make sure they are not binding. Replacing wheel bearings and piston rings maybe. If you were still not satisfied, you start getting to the very expensive options like replacing the gearbox, replacing the engine, re-tuning it, replacing the exhaust, the inlet manifold etc. These things don’t necessarily make your cheap car a lot better and start to cost so much money, you would be better off buying a faster car and not spending any money on that.&lt;/p&gt;

&lt;h2 id=&quot;diminishing-returns-in-caching&quot;&gt;Diminishing returns in caching&lt;/h2&gt;
&lt;p&gt;When Developers learn about caching in web apps, they usually get excited because that database query that used to take 100ms now comes from cache and only takes 1ms - at least it should. However, you realise that you need to decide how long to cache something for. Cache it for too short a time and you risk excessive database calls to refresh it, leave it in there too long and it gets stale and might cause your customers some confusion - how should you balance it?&lt;/p&gt;
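&lt;p&gt;For reference, the kind of cache being discussed can be sketched in a few lines of Python. The TtlCache class and its loader argument are hypothetical names for this post, it is deliberately not thread-safe, and a real web app would more likely use whatever cache its framework provides.&lt;/p&gt;

```python
import time

class TtlCache:
    """Tiny in-process cache that reloads a value once it is older than ttl_seconds."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader          # function that does the expensive call
        self._ttl = ttl_seconds
        self._clock = clock            # injectable so tests can fake time
        self._value = None
        self._loaded_at = None
        self.loads = 0                 # how many times the loader actually ran

    def get(self):
        now = self._clock()
        expired = self._loaded_at is None or now - self._loaded_at >= self._ttl
        if expired:
            self._value = self._loader()
            self._loaded_at = now
            self.loads += 1
        return self._value

# Usage with a fake clock: three reads, the last one after the TTL has passed.
fake_now = [0.0]
cache = TtlCache(
    lambda: {"Basic": ["surveys"]}, ttl_seconds=300, clock=lambda: fake_now[0]
)
cache.get()          # first read always calls the loader
fake_now[0] = 299.0
cache.get()          # still fresh, served from memory
fake_now[0] = 300.0
cache.get()          # expired, so the loader runs again
print(cache.loads)
```

&lt;p&gt;The only real decision in there is the value of ttl_seconds, which is what the rest of this post is about.&lt;/p&gt;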

&lt;p&gt;If you already understand that you need to balance it, you probably don’t need to continue reading but there are lots of Developers who are on holy missions of perfection. Why cache it for 5 minutes when I could cache it for &lt;em&gt;24 hours!&lt;/em&gt; Just imagine all the database calls I would save.&lt;/p&gt;

&lt;p&gt;These people are a problem to have in your team and will cause far more issues than they solve. Why? As we already said, there are diminishing returns and sometimes the savings are not even there in the first place!&lt;/p&gt;

&lt;p&gt;It is critical that Developers measure things and we often don’t. Cache = better than database calls right? Well, not necessarily. It depends how you deserialize the cache. Json? I thought so! Have you actually timed that? You might actually find that a database call materialised into an object is actually faster than using cache but that is not the main point so I will leave you to ponder that. Most databases now run on solid-state disks and are exceptionally fast. Our main application at SmartSurvey has typical database times of 1ms and that is with a synchronous replica! Beat that Json deserialisation!&lt;/p&gt;
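&lt;p&gt;Measuring this takes a couple of minutes. The sketch below times Python’s json.loads on a made-up payload using the standard timeit module. The payload shape (20 plans with 200 features each) is purely an assumption for the example, and the number you get will depend on your language, serializer and hardware, so compare it against your own measured database time rather than mine.&lt;/p&gt;

```python
import json
import timeit

# Hypothetical cached payload: a plan-to-features map serialised as JSON.
payload = json.dumps(
    {"plan-%d" % p: ["feature-%d" % f for f in range(200)] for p in range(20)}
)

def average_deserialise_ms(text, runs=200):
    """Average wall-clock milliseconds for one json.loads call over several runs."""
    total_seconds = timeit.timeit(lambda: json.loads(text), number=runs)
    return total_seconds / runs * 1000

elapsed_ms = average_deserialise_ms(payload)
print("json.loads average: %.4f ms for %d characters" % (elapsed_ms, len(payload)))
```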

&lt;p&gt;Let’s imagine that every call to your web application currently calls the database to load the features that are available on the user’s plan. On a Basic account you can’t do most things but on Enterprise you can. Great, we load a map of plans to features and when the user tries to access something, we do a check that the feature in question is on the plan that the user has. Let’s imagine that we have something like 10,000 calls per day. 10,000 lots of 1ms database calls is only 10 seconds of time in the entire day! Don’t cache until you know there is a problem. But OK, imagine that we have 100,000,000 calls per day, now we have nearly 28 hours of database calls per day!? In other words, we would already have to be running multiple web servers, and that is significant enough that it would definitely benefit from caching.&lt;/p&gt;
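&lt;p&gt;The arithmetic is worth writing down, because it is the whole argument. This small sketch assumes the 1ms average call time mentioned above; the function name is just for illustration.&lt;/p&gt;

```python
def database_seconds_per_day(calls_per_day, ms_per_call=1.0):
    """Total time spent in database calls per day, in seconds."""
    return calls_per_day * ms_per_call / 1000

small = database_seconds_per_day(10_000)          # 10.0 seconds per day
large = database_seconds_per_day(100_000_000)     # 100000.0 seconds per day
large_hours = large / 3600                        # roughly 27.8 hours per day
print(small, large, round(large_hours, 1))
```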

&lt;h2 id=&quot;how-long-to-cache-for&quot;&gt;How long to cache for?&lt;/h2&gt;
&lt;p&gt;The perfectionist Developer thinks, well, we hardly change the features on the plans very often, so why not go nuts and cache them for 8 hours. If we do update the features, we can either wait 8 hours for the cache to refresh or we could simply re-deploy the application or restart the web servers and it will reload cache. The perfectionist is happy because we have now gone from 100M calls per day to 3 per day per web server. Wow, feels amazing. But we have created a significant burden for people planning to change features. They either have to change them 8 hours in advance (in which case they might appear too early) or do something else to reset the cache but…3 per day right!&lt;/p&gt;

&lt;p&gt;Here’s the thing. What if we only cached them for 5 minutes instead of 8 hours? That is only 288 calls per day. Maybe a couple of orders of magnitude higher than 3 but compared to 100M, still an enormous saving and remember, at 1ms per call average, only 288 milliseconds per day per web server and without the 8 hour lag between updating features and seeing them appear in the application.&lt;/p&gt;
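&lt;p&gt;Here is the same comparison as a quick calculation, assuming one reload per expiry per web server and the 1ms call time from earlier. The summarise helper and its field names are invented for this example.&lt;/p&gt;

```python
SECONDS_PER_DAY = 24 * 60 * 60

def refreshes_per_day(ttl_seconds):
    """Upper bound on cache reloads per day per web server for a given TTL."""
    return SECONDS_PER_DAY // ttl_seconds

def summarise(ttl_seconds, uncached_calls=100_000_000, ms_per_call=1.0):
    """Compare the database cost of a TTL with how stale the data can get."""
    reloads = refreshes_per_day(ttl_seconds)
    return {
        "reloads_per_day": reloads,
        "database_ms_per_day": reloads * ms_per_call,
        "share_of_uncached_calls": reloads / uncached_calls,
        "worst_case_staleness_minutes": ttl_seconds / 60,
    }

eight_hours = summarise(8 * 60 * 60)
five_minutes = summarise(5 * 60)
print(eight_hours)
print(five_minutes)
```

&lt;p&gt;Going from 5 minutes to 8 hours saves 285 one-millisecond calls per day per server and costs you up to 475 extra minutes of stale data. That is what diminishing returns looks like.&lt;/p&gt;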

&lt;h2 id=&quot;understand-your-developers&quot;&gt;Understand your Developers&lt;/h2&gt;
&lt;p&gt;You need to identify this as early as possible. The perfectionist Developer invariably takes too long to develop their software, is over protective of it because of the time they have invested in it and doesn’t usually have the massive beneficial effect that they think they do by spending all of that time.&lt;/p&gt;

&lt;p&gt;The way to manage it is to ask for data. If someone wants to add caching (in this example), ask them to measure the problem - if they can’t do that in 5 minutes then don’t let them do it. Ideally they should already have the data before even mentioning it, otherwise how do they know it’s a problem? You can then discuss if there is a problem and how far to go to make an improvement before they start work.&lt;/p&gt;

&lt;p&gt;Having goals/OKRs/targets can also be really important to ask that person how the change will move any of those targets. If a page takes 1 second to load and the database isn’t struggling to keep up, will the user even notice the 20 or 30 milliseconds you improved? No, they will not!&lt;/p&gt;</content><author><name>Luke Briner</name></author><summary type="html">Rules for Developers I don’t really like those articles about “the top 10 things you need to know as a Developer” because like a lot of things in life, it sounds sensible but really it should be “the top 10 things I needed to know with my background, training and experience that are probably very specific to me”.</summary></entry></feed>