Why massive legacy applications are impossible to replace and why they are not!
I was reading today about the IRS continually trying, and continually failing, to replace a 60-year-old tax computer. You can read about some of it here, but the same song is played in many forms in many places, particularly the heavily regulated domains of banking and commerce, but also in any other institution that bought into computers at a time that looks entirely different from now!
The scenario is straightforward. You built a massive system back when you had to order hardware before you started writing software because it was all built to order. You had to count every byte, which led to some severe optimisations which, although the stuff of folklore, do little to aid understanding.
The system works well for a decade and, although newer, better architectures come along, you have invested so much that you are not in a position to just swap out your dinosaur and replace it. The new system would probably still require massive bespoke design work and would take 5 years just to plan.
Just like people, the system hits middle age, perhaps 20 years later, and not only is it starting to show health problems but the IT world has changed significantly. Mainframes are so "last decade" and now it is all about the personal computer. This can be ignored. PCs are personal, right? They aren't designed for massive systems like tax, so we can stay content that, by throwing a bit more hardware at the current system, it still mostly works.
Of course, you know the end of the story. The system is now a Senior Citizen with a free bus-pass and is really showing its age. Hardware repairs, if they are even possible, are extortionately expensive, and the people who understood the system as originally designed are mostly long gone. We now have the internet as the world's unifying force and there's no way that the legacy system can deal with that natively. At best we would need a web application to front it, but how?
We understand how we got here. At what point do you decide that the 10s or 100s of millions you are spending each year, just to keep the software system alive, is throwing good money after bad? Even if you did, how do you even start a new project to replace it? At best, you would need the promise of significant extra money to bankroll the new system until the old system could be switched off. You have the terror of the multi-million pound IT project, which inevitably makes lawyers and company directors rich, and even if you agree this is necessary, you have all of the compounding effects:
Regulators: Are they friendly? Are they motivated to make you succeed or simply hit you if your new system isn't exactly correct?
Industry lobbyists: This is likely to disrupt existing supply chains. If the current supplier is a large multi-national, are they going to lobby to try and scupper the upgrade to avoid losing millions of dollars a year? Are they going to go to Court?
The public: If your system is public facing, don't underestimate the problem with a Media outlet telling the public that this will ruin everything and they should object. More noise, more PR required, perhaps even more changes to the design to make people happy.
Manuals: Do you even have them? If you do, are they something like 20,000 pages long? Who is going to read them to understand what is functionally important and what is an implementation detail?
Port or rewrite: How do I balance the need to modernise with the relative ease of copying what is already known to work? There are many times where a bug covers another bug and the result happens to work. When one of them is removed, the original code doesn't actually work, so it can't simply be copied.
And the last, and definitely always the worst, enemy of large projects is the number of people involved. I honestly (and perhaps naively) believe that 10 really able developers and a couple of managers could build even a massive system in much less time and for much less money, because the communication lines are shorter and the hold-ups are removed. This is what we call Agile in the programming community, but it cannot exist in most corporates because they are based on Command-and-Control, even when the commanders are adding no value, just delay.
So on the face of it, the problem is impossible or at least ridiculously expensive. Can you imagine the IRS spending billions (yes, billions) on modernising and still having to live with the original core code, written in assembly in 1960, because it is too hard to port and they don't know what it does?
So if it is impossible, how can it be made possible?
A different approach is needed right at the top level of management. We can deal with the 90% problem very quickly. In various forms, this says that 90% of the problem is relatively easy and would be quick to solve. In terms of the IRS, the 90% (I am making these numbers up) might be the Americans who only have 1 or 2 jobs and no other income, who don't offset any purchases against their tax and who therefore have a very simple tax calculation.
What do you do with this 90% problem? You create an architecture that can handle the volume of people i.e. some kind of elastic system or cloud, and then you create a relatively simple web application that sits alongside the current system.
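One way to picture the split is a small routing rule that decides which system handles a given return. The sketch below is purely illustrative: the `TaxReturn` fields, the `is_simple` rule and the `route` function are all hypothetical names invented here, and the rule just mirrors the made-up 90% described above rather than anything the IRS actually does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxReturn:
    """Hypothetical, heavily simplified view of one person's filing."""
    taxpayer_id: str
    job_count: int
    has_other_income: bool
    claims_deductions: bool


def is_simple(ret: TaxReturn) -> bool:
    """The made-up '90%' rule: 1 or 2 jobs, no other income, nothing offset."""
    return (
        1 <= ret.job_count <= 2
        and not ret.has_other_income
        and not ret.claims_deductions
    )


def route(ret: TaxReturn) -> str:
    """Send simple returns to the new system and everything else to the old one."""
    return "new" if is_simple(ret) else "legacy"
```

The useful property is that the rule lives in one place, so widening the new system's share later means changing `is_simple` and nothing else.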
You now have a duplication issue, but you also have a number of options. The duplication itself isn't necessarily a problem, since they probably already run lots of systems side by side. The real issue is reporting, or any tool that currently hooks into the single tax system (to compute, say, how much the federal government is owed) and which cannot easily get its data from two places, or for which doing so would at least be annoying.
Now it would be tempting to simply push the new data into the old system so that all of the existing reporting tools keep working, but this will not drive you forwards because it reinforces the need for the old system rather than removing it. Instead, we create a proxy that merges data from the old and the new systems to produce the data needed for reporting. This new proxy is annoying because it feels awkward, like adding a 3rd component when we already have 2, but the point is that it should be! The awkwardness buys you two things. Firstly, it significantly reduces the need to keep working on the old system, because the new 90% data doesn't even go into the old system any more. Secondly, it helps drive you forwards to the time when you fix your apps to access each system independently, so that you won't need the proxy any more (that might only happen once the old system is completely killed).
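As a rough sketch of what such a proxy might look like, assume each system can be wrapped in a function that yields (taxpayer id, amount owed in cents) pairs. The `ReportingProxy` class, its method names and the rule that a taxpayer lives in exactly one system are all assumptions made for illustration, not a description of any real reporting interface.

```python
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical source interface: a callable yielding (taxpayer_id, cents_owed).
Source = Callable[[], Iterable[Tuple[str, int]]]


class ReportingProxy:
    """Merges read-only data from the legacy and new systems for reporting."""

    def __init__(self, legacy: Source, new: Source) -> None:
        self._legacy = legacy
        self._new = new

    def amounts_owed(self) -> Dict[str, int]:
        merged = dict(self._legacy())
        for taxpayer_id, cents in self._new():
            # Assumption: each taxpayer is handled by exactly one system,
            # so seeing the same id twice signals a routing mistake.
            if taxpayer_id in merged:
                raise ValueError(f"{taxpayer_id} appears in both systems")
            merged[taxpayer_id] = cents
        return merged

    def total_owed(self) -> int:
        return sum(self.amounts_owed().values())
```

Note that the proxy only reads from both sides. Nothing from the new system is ever written back into the old one, which is the whole point of putting up with the extra component.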
From the public point of view, who wouldn't want a super simple tax return system that works really quickly and which could provide some very rich reporting, just by using an up-to-date architecture?
Now when you developed the new system, you already had to deal with certain things like a partial year of data, different tax codes and different states, so you have already built part of the foundation to start moving in more functionality. Perhaps you do some work to find out what most of the remaining people need and add features by popularity. Perhaps the next feature is around rental income. Well, that's relatively easy because you designed your pages to be dynamic, right? You add another option, perhaps the user chooses what they want when they enter the system, and now even more data goes into the new system.
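Adding features by popularity is easier if each income type plugs into the new system in the same way. Here is a minimal sketch of that idea using a handler registry. Everything in it is hypothetical: the `register` decorator, the form field names, and especially the rental calculation, which is a placeholder and not real tax law.

```python
from typing import Callable, Dict, Iterable, Mapping

# A handler turns one section of the user's form into taxable income in cents.
Handler = Callable[[Mapping[str, int]], int]
HANDLERS: Dict[str, Handler] = {}


def register(income_type: str) -> Callable[[Handler], Handler]:
    """Decorator that makes a new income type available to the new system."""
    def decorator(fn: Handler) -> Handler:
        HANDLERS[income_type] = fn
        return fn
    return decorator


@register("salary")
def salary(form: Mapping[str, int]) -> int:
    return form["gross_pay"]


@register("rental")  # the "next most popular" feature, added later
def rental(form: Mapping[str, int]) -> int:
    # Placeholder rule for illustration only: rent minus expenses, floored at 0.
    return max(form["rent_received"] - form["expenses"], 0)


def supported(income_types: Iterable[str]) -> bool:
    """True if every income type the user picked can be handled here."""
    return set(income_types) <= set(HANDLERS)


def taxable_income(forms: Mapping[str, Mapping[str, int]]) -> int:
    unsupported = set(forms) - set(HANDLERS)
    if unsupported:
        raise LookupError(f"still needs the legacy system: {sorted(unsupported)}")
    return sum(HANDLERS[kind](form) for kind, form in forms.items())
```

With this shape, each new feature is one more registered function, and `supported` tells you up front whether a user can stay in the new system or still has to go to the old one.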
This will, of course, mean keeping the legacy system around for reporting for however many years that is required. But it could be a separate project to copy the existing data into a read-only system and then wind down the legacy system, so that the historical data stays available without the large overheads and risks of keeping the old system running.
Another hard part is convincing the powers that be that you cannot and should not attempt to recreate exactly what the old system did. You need to know how the tax calculations work, you need to reimplement them, and you need a mechanism to deal with any occasion when someone queries the figures and claims that the old system gave a different value.
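That last mechanism could start out as something very small: compare the two figures and keep a record whenever they differ, so that a person can decide which one is right. The `Discrepancy` record, the `compare` function and the tolerance parameter below are hypothetical names for illustration, and amounts are assumed to be held in cents.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Discrepancy:
    """A case where the two systems disagree and a human needs to look."""
    taxpayer_id: str
    legacy_cents: int
    new_cents: int

    @property
    def difference_cents(self) -> int:
        return self.new_cents - self.legacy_cents


def compare(
    taxpayer_id: str,
    legacy_cents: int,
    new_cents: int,
    tolerance_cents: int = 0,
) -> Optional[Discrepancy]:
    """Return a Discrepancy if the figures differ by more than the tolerance."""
    if abs(new_cents - legacy_cents) <= tolerance_cents:
        return None
    return Discrepancy(taxpayer_id, legacy_cents, new_cents)
```

A difference here does not mean the new system is wrong. Given the bug-covers-bug problem described earlier, it may well be the old figure that needs explaining.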
Oh, and this only works when you make a deliberate effort to keep people away from the project instead of inviting everyone in. Agile teaches the separation of stakeholders from the development team, and the same applies to design and architecture. I shouldn't have to justify every decision to 20 people from the Board. Most decisions should be headlines: "My estimate is that it will cost $X to run each month" should get a "yes" or a "no", not "let us discuss this figure and waste my time".
The FBI did something with Agile; maybe the IRS should bring in the same team.