Legacy systems: problem or resource?

Tuesday morning, 9 AM. After a routine patching session, a long-standing ZFS storage system running Solaris 11 suddenly stops talking to its Windows 10 clients. The culprit is the usual, maddening SMB dialect dance: Windows pushes for SMB 3 on security grounds, while Solaris's native service struggles through the negotiation. Two days of banging my head against the wall – hard – and then the discovery: OpenCSW. A community that maintains updated packages for Solaris where the vendor long since threw in the towel. Updated libraries, sorted dependencies, problem solved. There are volunteers out there patching critical systems better than the official vendor ever did. Worth knowing.
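For the record, the shape of the fix looked roughly like this. Treat it as a sketch from memory, not a recipe: the package name is a placeholder, and the exact `max_protocol` value depends on your Solaris 11 release.

```
# Bootstrap OpenCSW's package manager, then refresh the catalogue
pkgadd -d http://get.opencsw.org/now
/opt/csw/bin/pkgutil -U
/opt/csw/bin/pkgutil -i -y some_library   # placeholder package name

# Check and raise the SMB dialect the native Solaris service will negotiate
sharectl get -p max_protocol smb
sharectl set -p max_protocol=3.0 smb
svcadm restart smb/server
```

The point isn't the exact commands – it's that a volunteer package repository plus one service property turned a two-day outage into a five-minute change.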

Same film, next scene.

Friday afternoon – because critical migrations always happen out of hours. I'm migrating a system from Red Hat 7 to Red Hat 9. Why? To support the new version of Charon-SSP, the Stromasys emulator that lets SPARC software run on x86. All of this to keep alive a virtual machine running Solaris 9, an operating system from 2002 that went end-of-life in 2014. It's a layered structure, each level propping up the one below – one of those classic houses of cards where you can't quite work out how it stays standing.

Welcome to the world of legacy systems. A world where “modernising” often means finding increasingly creative ways to change nothing at all, and where communities and old-school sysadmins are the ones guarding infrastructure that corporations abandoned long ago. Try asking Oracle for Solaris support: they'll laugh in your face.

The numbers

In January 2025, the UK government published a report that should have rattled a few chairs at Westminster. Twenty-eight percent of central government IT systems are classified as legacy – up from 26% in 2023. Estimated productivity losses? Forty-five billion pounds. In 2024, the NHS recorded 123 critical IT system crashes. One hundred and twenty-three.

But wait, because the numbers get even more interesting when you look at the banking sector. COBOL – a programming language dating back to 1959 – still processes 95% of global ATM transactions, 43% of the world's banking systems, and around 3 trillion dollars of commerce every day. Every day. It's estimated there are still 220 billion lines of COBOL code in production.

And Windows XP? The one Microsoft stopped supporting in 2014? Today, 1-2% of internet-connected devices still run it. Sounds small until you realise we're talking about millions of machines. And not your grandad's PC: we're talking about MRI scanners in hospitals, industrial control systems, bank ATMs. Critical devices that can't be updated because the software controlling them only runs on XP, and re-certifying the entire system would cost more than building a new one.

Remember WannaCry in 2017? The ransomware that paralysed 75,000 computers in 99 countries? The NHS was devastated. And do you know how many Windows XP machines the NHS had in 2019 – two years after the attack, five years after end-of-support? 2,300.

At this point in the story one might say “right, the problem is clear: legacy systems are dangerous and need replacing.” And that would be the easy narrative – the one that consultants selling “digital transformation” love, and vendors wanting to sell licences love. What if I told you that a Solaris 11 system, properly isolated in a VLAN, is significantly more stable and secure than a shiny new Ubuntu 24.04 LTS?

Reality, as always, is more complicated.

Problems upon problems

Here's the fundamental issue: we use the word “legacy” as if it meant one thing, when it actually covers at least three completely different situations.

Type 1: Unavoidable legacy

Solaris 9 on SPARC hardware controlling industrial machinery. Windows XP on MRI scanners. Systems where hardware and software are inseparable, where an upgrade would require replacing equipment worth millions, where re-certification for medical or industrial use would take years and fortunes. These systems are legacy out of necessity, not negligence. There's no fault here. There's only the reality of a technological ecosystem where certain devices have 20-30 year lifespans and the software controlling them can't be changed without changing everything else.

Type 2: Avoidable legacy

CentOS 7, for instance. End of support: 30 June 2024. Available alternatives: AlmaLinux, Rocky Linux, migration to RHEL. Cost of migration? Economically: it depends. In time, resources, learning: enormous. How many CentOS 7 systems are still in production today? Too many. Why? Because nobody wants to pay RHEL licences, because “we'll do it next quarter,” because “there are other important things to deal with,” because “if it ain't broke, don't fix it.” This is legacy by choice – or rather, by inertia. It's an organisational decision, not a technical one.

Type 3: Non-legacy perceived as legacy

Take COBOL on modern IBM mainframes. Today's mainframes aren't the ones from the 1970s – they're immensely powerful machines, with dedicated processors, hardware security, 99.999% uptime. The COBOL running on them is the same as ever, but the underlying infrastructure is current. Is the code legacy, or the platform? And if the platform is modern, can we still call it legacy?

The distinction is fundamental because it determines the strategy. A Type 1 system needs to be isolated and protected. A Type 2 system needs to be migrated. A Type 3 system needs to be left alone. Try explaining that to a CTO who just finished reading a Gartner report on “legacy modernisation.”

From a thread on TheLayoff:

“FWIW, there's a very good chance that your electronic footprint on any given day has passed through a piece of SPARC equipment running Solaris, and that will continue to happen for a good portion of your lifetime.”

Would you believe me if I told you I've seen original BSD systems with eleven years of uptime?

The real problem isn't the machines

Here we get to the heart of the matter, and it may surprise you: the real problem with legacy systems isn't technological. It's human.

Let's talk about the “COBOL Cowboys” – retired programmers called back on consulting contracts when something breaks. They're the last generation that knows how those systems actually work. When they leave, they take decades of undocumented knowledge with them. According to Deloitte, companies have seen a 23% decline in mainframe workforce over the last five years, with 63% of those positions left unfilled. It's not that there's no money to hire – it's that there's nobody to hire. Young developers don't want to learn COBOL. It's “unsexy.” It's “archaic.” It's “boomer stuff.”

From ComputerWeekly:

“The retirement of the generation of experts who possess in-depth knowledge of Cobol systems is leading to a severe knowledge shortage. They have knowledge not only of the Cobol programming language, but also of the specific systems they have worked on and built over the years” – Tijs van der Storm, CWI/University of Groningen

And so we find ourselves in a paradoxical situation: systems processing trillions of dollars a day, managed by people who might die of old age before anyone learns to replace them. Knowledge transfer never happened. Documentation – where it exists – is outdated, incomplete, written in a language nobody understands anymore. And every year that passes, the gap widens.

This is the real legacy problem. Not the systems. The people.

When modernisation fails (spoiler: often)

There's a story that people in the UK know well, but that strangely never comes up when “digital transformation” is being discussed. It's called the National Programme for IT, or NPfIT.

Launched in 2002, it was the largest public sector IT project in British history. The goal? Modernise the entire NHS IT infrastructure. Initial budget: 6 billion pounds. Planned completion: 2010.

In 2011, after nine years of delays, exploding costs, vendors abandoning the project, and a system that simply didn't work, the UK government announced the dismantling of NPfIT. Final estimated cost: over 10 billion pounds. For a system that was never completed.

What went wrong? Practically everything. Top-down decisions made by politicians who didn't understand technology. Rigid contracts with vendors who didn't understand the NHS. Resistance from medical staff who hadn't been consulted. Continuously shifting requirements. Impossible integrations with existing systems.

From TechMonitor:

“A lack of digital and procurement capability within government has led to wasted expenditure and lack of progress on major digital transformation programmes.”

The lesson? “Modernising” is not automatically better than “maintaining.” Sometimes, the legacy system that works is preferable to the modern system that never will. But this lesson, apparently, we haven't learned. Because the dominant narrative remains the same: legacy = bad, modern = good. And consultants keep selling the shiny new thing.

Strategies that actually work

TL;DR: There is no single solution. There's a matrix of options ranging from virtualisation to isolation, from refactoring to API wrapping. The choice depends on the type of legacy, the budget, and the acceptable level of risk.

The Gartner 7Rs (yes, they have a name for everything):

  1. Retire – Switch it off. Only works if nobody's actually using it.
  2. Retain – Keep it as is. Sometimes the best choice.
  3. Relocate – Move it to new infrastructure without changes.
  4. Rehost – “Lift and shift” to cloud. Changes the hardware, not the software.
  5. Replatform – Minimal changes to run on a modern platform.
  6. Refactor – Rewrite parts of the code while maintaining functionality.
  7. Rearchitect – Completely redesign. The riskiest and most expensive.

Virtualisation and emulation

For systems on proprietary architectures (SPARC, VAX, Alpha, PA-RISC), solutions like Stromasys Charon emulate the original hardware on x86-64 platforms. The operating system and software don't change – only the iron underneath does. For legacy x86 systems (Windows XP, Server 2003, old Linux), standard virtualisation (Proxmox, VMware, KVM) allows you to “freeze” the environment and keep it running indefinitely. I've seen Proxmox setups running Windows 3.11. I'm not joking.
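On Proxmox, “freezing” an old guest mostly means giving it the emulated hardware it expects and pinning it there. A sketch – VM ID, bridge, storage, and sizes are invented for illustration:

```
# Old guests want period-correct emulated hardware, not virtio:
# an IDE disk and a Realtek NIC that Windows XP recognises out of the box.
qm create 203 --name winxp-legacy --ostype wxp --memory 512 \
  --net0 rtl8139,bridge=vmbr1 --ide0 local-lvm:8

# Pin the emulated machine type so host upgrades don't change
# the virtual hardware under the guest's feet.
qm set 203 --machine pc-i440fx-5.1
```

The pinned machine type is the part people forget: it's what lets the hypervisor underneath keep evolving while the guest sees the same “hardware” for a decade.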

Network isolation

If a system can't be patched, it can at least be isolated. Dedicated VLANs, restrictive firewalls, air-gap where possible. It doesn't fix the problem, but it limits the impact in case of compromise.
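As a concrete sketch, an nftables ruleset on the gateway in front of the legacy VLAN might look like this – interface names, addresses, and the SMB port are invented for illustration:

```
table inet legacy_guard {
    chain forward {
        type filter hook forward priority filter; policy drop;

        # Replies to permitted flows may pass
        ct state established,related accept

        # Only the app server may reach the legacy host, and only on SMB
        iifname "vlan100" ip saddr 10.0.100.5 ip daddr 10.0.200.10 tcp dport 445 accept

        # The legacy host initiates nothing: everything else hits the drop policy
    }
}
```

Default-drop with a handful of explicit pinholes is the whole idea: if the legacy box is compromised, it has nowhere to go.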

API wrapping

Put a modern REST layer in front of a legacy system. The mainframe keeps doing what it knows how to do; the outside world talks to the API. This is the strategy many banks use to expose COBOL functionality to mobile applications.
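A minimal sketch of the pattern in Python, using only the standard library. Everything here is invented for illustration: the fixed-width record layout, the function names, and the stubbed transport standing in for whatever actually talks to the mainframe (MQ, a 3270 bridge, screen scraping).

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical fixed-width record returned by the legacy backend:
# cols 0-9 account id, 10-29 holder name, 30-41 balance in pence.
def parse_legacy_record(record: str) -> dict:
    """Translate one fixed-width legacy record into a JSON-friendly dict."""
    return {
        "account_id": record[0:10].strip(),
        "holder": record[10:30].strip(),
        "balance_pence": int(record[30:42]),
    }

def fetch_from_legacy(account_id: str) -> str:
    # Stand-in for the real transport to the legacy system.
    # Here we fabricate a record so the sketch is self-contained.
    return f"{account_id:<10}{'A HOLDER':<20}{123456:>12}"

class WrapperHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # GET /accounts/<id>  ->  JSON view of the legacy record
        account_id = self.path.rsplit("/", 1)[-1]
        body = json.dumps(parse_legacy_record(fetch_from_legacy(account_id))).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To serve:
#   HTTPServer(("127.0.0.1", 8080), WrapperHandler).serve_forever()
```

The legacy side is untouched; all the modern plumbing (JSON, HTTP, auth, rate limiting) lives in the wrapper, which you can rewrite as often as fashion demands.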

The public sector: a special case

Those who work in the public sector know that the dynamics differ from the private sector in ways that make the legacy problem even more complex.

Multi-year budgets. You can't decide in January to modernise a system and have the money by March. Funding cycles are long, rigid, subject to political priorities that change with every election.

Procurement. Buying software in the public sector is a bureaucratic nightmare. Tenders, compliance requirements, impact assessments, GDPR, accessibility. A purchase that takes a week in the private sector takes months here.

Compliance. Systems handling health, education, or tax data are subject to stringent regulatory requirements. You can't simply “migrate to the cloud” – you have to demonstrate that the cloud complies with an endless list of standards.

Service continuity (which in my view is the core problem). If a private company's system goes down for a day, they lose money. If a system managing national exams, or medical prescriptions, or pension payments goes down, the consequences fall on real people with no alternatives. The risk of downtime during a migration is often simply unacceptable.

And then there's the political dimension. Every government wants to announce its own “digital revolution.” Nobody wants to inherit the previous government's problems. And so projects get started, abandoned, restarted, re-abandoned, in an endless cycle of waste.

NPfIT wasn't an exception. It was the rule.

The uncomfortable question

At this point, the question nobody wants to ask is this: what if some legacy systems were simply… better? Not better in an absolute sense, but better for their specific purpose?

Let me tell you something. I worked for years in environments dealing with large-scale Oracle infrastructure – the company that sells “cloud transformation” and “modern infrastructure” to half the world. And among other things, you know what got managed day to day? Old ZFS storage. Stuff that, on paper, should have been “modernised” years ago. Those machines had been running since before Docker existed, before Kubernetes, before “cloud native” became a term. And they worked. Quietly. Without drama. Nobody was in any hurry to replace them. Why would they be? In pursuit of what advantage, exactly?

The COBOL processing bank transactions has been optimised for sixty years. Every bug has been found and fixed. Every edge case has been handled. Every possible scenario has been tested in production billions of times. It's code that has achieved a kind of perfection through Darwinian evolution. Rewriting it in Python would mean starting from scratch. New bugs. New untested scenarios. Years of instability before reaching the same level of reliability.

And in the meantime? In the meantime, the legacy system keeps working. There's a reason banks aren't in a rush to abandon mainframes. It's not ignorance. It's not laziness. It's that they've done the maths and understood that the risk of the new outweighs the cost of the old. And the old administrators have retired. But this is an uncomfortable truth. It doesn't sell well in PowerPoint presentations. It doesn't generate consulting contracts. It doesn't make tech headlines.

And so we keep talking about “modernisation” as if it were automatically a good thing. As if “new” meant “better.” As if technology had a moral direction.

So what?

Legacy doesn't mean old – it means abandoned. The problem is never technical – it's always organisational. And “modernising” is not automatically better than “maintaining.”

If there's one lesson, it's this: be suspicious of anyone with simple answers to complex problems.

Every time I hear some manager say “we need to automate everything with AI,” I think about the software pachyderms holding up half of critical infrastructure. I think about the time it would take to train a model on COBOL written in 1987 with no documentation. I think about how long it would take to migrate a Java 1.7 system running on Solaris 9. I think about the hours spent reverse-engineering platforms still running Lotus Notes. I think about the costs. I think about the risks. And then I think that those same managers don't have the budget to hire juniors willing – and why should they be, when the IT world is moving in a completely different direction – to learn systems the rest of the industry wrote off at least thirty years ago. And I laugh. Bitterly, but I laugh. Then I take a few drops of CBD to calm myself down.

Before talking about artificial intelligence – and those who know me know I'm not against AI at all – perhaps we should make sure that human intelligence doesn't retire, taking years of undocumented knowledge with it. But that, evidently, is a less sexy priority to put on the slides.

Sources and further reading

UK government reports
– NAO: “The sustainability of government IT” (January 2025) https://www.nao.org.uk/reports/local-government-financial-sustainability-2025/
– NHS Digital: infrastructure assessment reports https://www.bma.org.uk/advice-and-support/nhs-delivery-and-workforce/the-future/building-the-future-healthcare-infras

COBOL and mainframes
– Reuters: “Banks scramble to fix old systems” (Commonwealth Bank Australia cost analysis) https://www.reuters.com/article/technology/banks-scramble-to-fix-old-systems-as-it-cowboys-ride-into-sunset-idUSKBN17C0CN/
– IBM: “COBOL Modernization” https://www.ibm.com/think/topics/cobol-modernization

Legacy virtualisation
– Stromasys: “What are legacy systems” https://www.stromasys.com/resources/what-are-legacy-systems-challenges-benefits/
– Proxmox Forums: discussions on legacy system virtualisation https://forum.proxmox.com/tags/legacy/

Sector analysis
– Gartner: 7Rs of Application Modernization https://www.techtarget.com/searchCloudComputing/tip/Use-the-7-Rs-to-develop-an-app-modernization-strategy
– Deloitte: mainframe workforce decline study https://www.deloitte.com/us/en/insights/topics/technology-management/tech-trends/2023/future-mainframe-technology-latest-trends.html
– WSJ: “How AI Can Rev Up Mainframe Modernization” https://deloitte.wsj.com/cio/how-ai-can-rev-up-mainframe-modernization-2e3c1c4a

Case studies: failures
– Computer Weekly: “What went wrong with the National Programme for IT” https://www.computerweekly.com/opinion/Six-reasons-why-the-NHS-National-Programme-for-IT-failed
– NAO: post-implementation review of NPfIT https://www.nao.org.uk/reports/review-of-the-final-benefits-statement-for-programmes-previously-managed-under-the-national-programme-for-it-in-the-nhs/

Security
– WannaCry incident reports https://any.run/malware-trends/wannacry/
– NHS Windows XP audit findings (2019) https://www.verdict.co.uk/windows-xp-nhs/

Discuss...

#LegacySystems #Sysadmin #COBOL #Solaris #Linux #PublicSector #DigitalTransformation #Mainframe #OpenSource #Infrastructure