What is missing from safety-critical software design?
What tech runs on our safety-critical systems?
I remember being surprised, but then not surprised, to find out that some of our most modern airliners and fighters use very old technology. I can't find the article (sadly) but someone was saying that flight computers were running on Intel 8086 processors! Yes, that's right. Brand-new, up-to-date planes use technology from 1978.
Why was I surprised? Well, the F-35 project is using AR and AI to help the pilot, so it isn't as if we are afraid of new technology, yet for some reason these flight computers are ancient.
Why was I then not surprised? Well, it seems instinctive that we stick with what is tried and tested, right? Any change is a risk, and these old computers have hundreds of thousands of flying hours between them, so most, if not all, bugs should have been found and fixed by now.
The Problem with "if it ain't broke, don't fix it"
There are a few problems with this mentality. The first, major problem is that it assumes testing is the only way to ensure that something works. The implication is that if something has that many hours of real-world testing, it must be OK. Of course, this isn't always true. Some edge cases can still cause problems even after tens of thousands of incident-free flights. In other words, testing alone is not a great way of ensuring correctness - in fact, you could argue the opposite. If you need to do fuzz testing, it implies that you are lacking a formal way of reasoning about all of the expected inputs and outputs of a system. This is the equivalent of releasing a new car and thinking that, because it seems to drive OK on a test circuit, we can just let the public find the edge cases!
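To make that concrete, here is a minimal sketch of my own (not from the article): when the input domain of a routine is small and well defined, you can check every single input rather than hoping that flight hours or fuzzing stumble across the bad ones. The sensor-to-flap-angle function and its limits below are invented purely for illustration.

```rust
// Hypothetical example: map a raw 8-bit sensor reading to a flap angle.
// The function and its thresholds are invented for this sketch.
fn flap_angle_deg(raw: u8) -> f32 {
    // Saturate rather than wrap, so an out-of-range reading cannot
    // produce a nonsensical angle.
    let clamped = raw.min(200);
    f32::from(clamped) * 0.25
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn every_input_gives_a_sane_angle() {
        // The whole input space is only 256 values, so we can test all of
        // them exhaustively instead of sampling it at random.
        for raw in u8::MIN..=u8::MAX {
            let angle = flap_angle_deg(raw);
            assert!(
                (0.0..=50.0).contains(&angle),
                "raw input {raw} produced implausible angle {angle}"
            );
        }
    }
}
```

Obviously real flight software isn't built from 256-value lookups, but the principle - making the input space explicit and checking it completely, or proving properties over it - is exactly what the "it has flown fine so far" argument quietly skips.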
There is another problem, though: it shows how tightly the software is tied to the hardware stack. Ideally, there should be at least a small amount of room to alter the hardware and expect the software to behave the same. Of course, much of this software is assembly, and it always will be unless there is a push to change the testing culture of these organisations and accept that we need to move to higher-level languages, where it's reasonable, so that we can much more easily write, understand, debug and update our systems.
One of the problems noted in an article I read is that these computers are being asked to do more than ever before in terms of new features, whether for safety or for pilot assistance, and there is simply either no more addressable RAM or no more suitable assembly no-ops for squeezing a bit more functionality in. What happens when someone like Boeing is asked to upgrade the MCAS software? It's not so much an unwillingness to do so as the legitimate question, "where is all this new code supposed to go?"
What to do?
I don't know enough about any of this to give the answers, but as a software engineer I wondered whether we still don't have a formal enough programming language that we could rely on to produce safety-critical code without worrying about edge cases, out-of-range data or race conditions. Code that describes what these systems do so tersely that we can review it, compile it and not expect any errors to arise in use, except where we have got the definitions wrong. I suspect what is more likely is that someone will decide to upgrade the 8086 to an 80286 first, since this would be the smallest jump. Maybe by the year 2500 we might be on a Pentium, who knows?
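To show the flavour of what I am hoping for, here is a minimal sketch in Rust, chosen purely because its type system already carries some of this burden; I am not claiming anyone builds flight computers this way. The AoaDegrees type and its limits are made up for the example. The point is that out-of-range data cannot even be constructed, so downstream code never has to worry about it.

```rust
/// Hypothetical angle-of-attack value, guaranteed by construction to lie
/// within a plausible physical range. The limits are invented for this sketch.
#[derive(Debug, Clone, Copy)]
struct AoaDegrees(f32);

impl AoaDegrees {
    /// The only way to obtain a value: non-finite or out-of-range inputs
    /// are rejected here, so every function that receives an AoaDegrees
    /// can rely on the invariant instead of re-checking it.
    fn new(deg: f32) -> Result<Self, &'static str> {
        if deg.is_finite() && (-20.0..=40.0).contains(&deg) {
            Ok(AoaDegrees(deg))
        } else {
            Err("angle of attack outside physical range")
        }
    }

    fn value(self) -> f32 {
        self.0
    }
}

fn main() {
    // A bad sensor reading is caught at the boundary, not deep inside
    // some trim-control routine that assumed the value was sane.
    match AoaDegrees::new(55.0) {
        Ok(aoa) => println!("commanding trim for {} degrees", aoa.value()),
        Err(e) => println!("reading rejected: {e}"),
    }
}
```

Languages aimed squarely at this space go much further - contracts, proofs that no runtime error can occur at all - but even this much makes "what should happen when the input is nonsense" an explicit, reviewable decision rather than an accident of whatever the assembly happens to do.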
I would love to hear from anyone who knows of suitable safety-critical programming languages that would remove the fear of hardware changes from these providers and allow them the freedom to be very clear and explicit about what a system should do depending on external conditions.