Friday 12 January 2024

I worked on the Post Office Horizon project but I couldn't spill the beans, even if I had beans to spill

So first of all, yes, I worked on the Horizon project. I didn't work for the Post Office, I worked with a partner company and was involved in extensive testing of our systems with the Horizon system. I can say this as it's on my CV and is no secret. I worked as a freelance IT contractor and had responsibility for much of what we called integration and performance testing between our system and the Post Office systems. For clarity I didn't work on any of the systems that went wrong. That is really all I can say. 

I can't say any more, as I signed a non-disclosure agreement when I commenced work on the project and was reminded of this by the legal bods when they realised I was a prolific blogger. Having said that, there are many aspects of what has happened that anyone who has ever worked on a large-scale IT project could work out for themselves.

Firstly, let’s talk about the issue of client confidentiality. Any financial institution handling money will require staff to sign an agreement to prevent client and customer confidentiality being compromised. This is a perfectly reasonable thing for an organisation to do. There are many extremely commercially sensitive elements of the job and it would be almost impossible for companies to implement projects if, every time someone had a beef, they contacted the press. Every IT system I worked on, and this included systems for Nat West, Lloyds, LINK and BACS, had problems. It is highly unusual for these to impact people and generally, when they do, this is recognised and fixed, and those people who are out of pocket are properly reimbursed. Billions of electronic transactions happen every day in the UK. Every time you tap your card, a transaction occurs and it is almost unknown for that to fail. These systems are incredibly reliable. But sometimes, they do go wrong. Usually, the problems are caused by the sheer volume and growth in the systems. What should happen when a customer reports a problem is that an IT support team investigates it. It should be fairly easy to identify that something has gone on. Fixing it is a different matter, but once a problem occurs, it should be logged and investigated, and if the same thing happens again, then the priority of fixing it increases. Whenever a customer reports a problem, the support team checks whether the symptoms correlate with a known bug that has already been logged.
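To make the logging process concrete, here is a minimal sketch in Python of the kind of incident log I am describing. It is purely illustrative and not taken from any real system: the class name, the escalation rule (low, then medium, then high) and the bug identifier in the usage note are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class IncidentLog:
    """Toy incident log: every report is recorded and repeats raise priority."""

    # Hypothetical lookup of already-diagnosed problems: symptom -> bug id.
    known_bugs: dict = field(default_factory=dict)
    # How many times each symptom has been reported so far.
    reports: Counter = field(default_factory=Counter)

    def report(self, symptom: str) -> dict:
        key = symptom.strip().lower()
        self.reports[key] += 1
        count = self.reports[key]
        # Assumed escalation rule: the first report is "low", a repeat is
        # "medium", and a third or later report is "high".
        priority = "low" if count == 1 else "medium" if count == 2 else "high"
        return {
            "symptom": key,
            "times_seen": count,
            "priority": priority,
            # None means nobody has diagnosed this symptom yet.
            "known_bug": self.known_bugs.get(key),
        }
```

For example, with `log = IncidentLog(known_bugs={"cash shortfall at end of day": "BUG-1"})`, calling `log.report("cash shortfall at end of day")` three times returns priorities of low, medium and high in turn, each time pointing at the made-up bug id `BUG-1`.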

There has been a lot of talk of Horizon and what went wrong. I am surprised that no one has offered an explanation of what the issue actually was. Let's start by looking at how the situation occurred.

The system was brought in by the Post Office to be a single entity to manage the business requirements of running the post office business. When you go into a post office and send a parcel, buy some sweets, take money out of your account over the counter etc, the system you interact with, that takes the transaction and gives you a receipt, is Horizon. It replaced a hotchpotch of older systems, some manual, such as giro cheques for benefits. The company that supplied it was Fujitsu. The company had a decades-long relationship with the Post Office. It had previously been ICL, the UK computer company. Fujitsu bought the ICL business in 1990. As a British company, ICL was the IT supplier of choice for the UK government. Fujitsu inherited this preferred supplier status. The UK organisation that the Post Office dealt with was pretty much the same one that had been ICL previously. I doubt too many people would have an issue with a UK government company dealing with a UK company. Of course, there is the issue of senior executives giving money to the Tory party. This does fail the sniff test, but it must be said that the contract was signed under Labour and ICL/Fujitsu was a preferred supplier under both parties. In fact, I am more than happy for the Government to give business to UK based companies.

As to why the Horizon programme came into being, this would have happened regardless of the government, unless a Tory government had broken up the Post Office, which I doubt anyone would have wanted. Maintaining a system that has been cobbled together and evolved over the years is expensive and difficult. The new system offered the Post Office, which had huge financial challenges, a massive cost saving. Executives were under pressure from the government to get the system in. Horizon was a key plank of the savings. I must add that I wasn't involved at a senior level, so none of this is privileged knowledge. It is something that anyone with a knowledge of such things and an interest could conclude. The Post Office has been under massive political pressure for decades to save cash. Executives were terrified that if they failed to save money, the organisation would be broken up. The management of the IT teams implementing Horizon knew that the future of the Post Office was very much reliant on a successful implementation of Horizon and the savings it would deliver. A failure to deliver may have been the end of the Post Office as we know it. This pressure is the real root cause of the whole mess.

Fujitsu developed the system and it was delivered for testing (this predated my time on the project). The Post Office, as the client, had a responsibility to test the software properly. There would have been several phases. First comes system testing, where the components are tested with data that mimics real life situations. The Post Office IT business analysts would (or should) have identified every scenario that can occur and written scripts to exercise the code. Once all of this was done, there would be integration testing. This is where all the components are joined up and end to end testing happens, so the parts that handle the till functions are linked to the accounting systems. Again, full test plans of all scenarios are executed. The next phase is performance testing, where the systems are tested under loadings that simulate real life operations. Once the software has been signed off as working, there is user acceptance testing, where people who work in the business use the system to ensure it meets their requirements and is usable by normal people and not just IT geeks, and operational testing, where the IT teams make sure it is all manageable.

 All of these phases have to be completed successfully before the system can be put live. 

In each phase, defects with the software will be identified. Some of these are deemed cosmetic and the system can be put in with these. A cosmetic error is one where customers are not impacted, such as where a screen item should be in bold and isn't etc. Then there are defects which may impact a customer, but may still be considered to be OK to proceed. It may be something along the lines of information on a till receipt not being displayed as it should be, but still being correct. Then there are what are known as showstoppers. These are the problems where a customer is impacted (for the avoidance of doubt, we will consider Postmasters as customers of the Horizon system). This typically means that there will be a financial impact. All of the above feeds into a process known as acceptance testing. Once the criteria to put a product live have been met, the senior management sign off the test results and the system can be put in.
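The sign-off rule described above can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions, not the acceptance criteria of any real project: the three severity names and the `ready_for_sign_off` function are invented for the example, and real criteria usually also cap the number of lesser defects.

```python
from enum import Enum


class Severity(Enum):
    COSMETIC = 1      # e.g. a screen item that should be bold and isn't
    MINOR = 2         # e.g. receipt information laid out wrongly but correct
    SHOWSTOPPER = 3   # the customer is impacted, typically financially


def ready_for_sign_off(open_defects):
    """Return (ok, blockers) for a list of (description, Severity) pairs.

    Assumed rule: any open showstopper blocks go-live; cosmetic and minor
    defects may be carried into live as known issues.
    """
    blockers = [
        description
        for description, severity in open_defects
        if severity is Severity.SHOWSTOPPER
    ]
    return (not blockers, blockers)
```

A list containing only cosmetic and minor defects passes the gate, while adding a single showstopper makes the function return `False` along with the description of the defect that is blocking sign-off.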

What surprised me about the coverage is that there has been no discussion at all as to whether the Horizon system passed these tests, or whether serious defects were identified and the system signed off with defects, on the basis that the problems were known and could be managed. This does happen occasionally, especially if the new system reduces similar errors in an older system. I would have thought this would be the key. Of course, IT business analysts cannot identify every scenario. Sometimes equipment doesn't behave in the field as it is supposed to. Sometimes the people operating it do not properly understand how to operate it, as these systems are complex and require a degree of training. There are very few systems that have no errors or defects at all, but most are minor and happen only when weird and wacky things happen.

Typically, a new system has a warranty period from the suppliers (Fujitsu). Defects are fixed for free during this period. After that, the system goes into a period of paid support. What we now know is that Horizon had serious defects that caused tens of thousands of pounds to 'disappear'. Up to this point, I can understand what happened. Sometimes, when things go live, scenarios that were not predicted happen.

What I can't understand is what happened next. The system clearly had a massive defect. This is not unique, it happens quite regularly when new software is implemented on all manner of systems. There have been many documented in the press. What doesn't normally happen is that no effort at all is made to identify it. I have worked on all manner of financial systems. The first time a Postmaster was 'investigated', a full audit of the post office should have been done. This should mean that every single transaction is checked by an auditor. Quite rightly, the Postmaster should sit down with the auditor and all of these should be verified. The sad truth is that people do nick money, but it is almost always easy to spot. When the Post Office started to see hundreds of postmasters being flagged up as committing fraud, there most certainly should have been a forensic investigation. The idea that anyone in IT would believe a new, highly complex system to be infallible is ridiculous. What puzzles me is that, in all the coverage I've seen, no one has offered an explanation of how the cash disappeared from the system.

Were multiple debits for the same transaction appearing on the system? Were huge single transactions being spuriously generated by the system? It is all bizarre. Post Offices have cameras, surely these were checked? The thing is, that whatever happened, no one should end up in court for fraud, without a proper audit of what has happened in their Post Office where the alleged fraud occurred. It cannot be the case that a system said "This person nicked £75,000" and that was the sole evidence. There are till rolls that can be tallied etc. Was it the same problem for all of the victims, or was it a myriad of problems?
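To show what I mean by tallying, here is a minimal Python sketch of a reconciliation between a till roll and the system's ledger. It says nothing about what actually went wrong in Horizon, which I don't know. The `reconcile` function and the data layout (pairs of transaction id and amount in pence) are assumptions made purely for the example.

```python
from collections import Counter


def reconcile(till_roll, ledger):
    """Compare till-roll entries with ledger entries.

    Both arguments are lists of (transaction_id, amount_in_pence) tuples.
    Counter subtraction keeps only positive counts, so a transaction that
    appears twice in the ledger but once on the till roll is reported once
    as an extra ledger entry.
    """
    till = Counter(till_roll)
    book = Counter(ledger)
    return {
        "extra_in_ledger": sorted((book - till).elements()),
        "missing_from_ledger": sorted((till - book).elements()),
    }
```

If the till roll shows transaction `T1` for £50 once and the ledger shows it twice, the result lists one extra `("T1", 5000)` entry in the ledger. That is exactly the sort of specific, checkable discrepancy an auditor could put in front of a Postmaster, rather than a bare shortfall figure.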

I wasn't going to write this blog, but I watched the testimony of the investigator yesterday. I was horrified. It seems to me that rather than going through and forensically producing a list of dodgy transactions and asking for an explanation, a highly aggressive form of interview was conducted, where there was pressure put on innocent people to explain something that they clearly couldn't.

I would have expected that as soon as a pattern of unexplained fraud came to light, the IT team would have taken a look at what was going on and identified the problem. Another thing I don't understand is that when people have conducted large scale fraud, there are telltale signs. People generally don't conduct fraud and simply stash the cash under their bed. They spend it, be it on a new car, a new house, their mistress, the bookies or drugs. There is nearly always some sort of giveaway. Of course, investigators are told that the system has the poor sod who has been misidentified bang to rights, but a competent investigator should keep a degree of an open mind and look for some sort of evidence that there has been a lifestyle change.

As the number of people affected started to spiral, the alarm bells should have rung. No one, to this day, has said a word as to what the problem was, when it was actually identified as a software problem and whether there was a forensic investigation of those wrongly charged, as to whether they were a victim.

So where did it go wrong? The politicians are all saying that the Post Office lied to them and they had no idea of the scale. I have a degree of sympathy with this. Although 700 postmasters were identified, it is unlikely, given that we have more than 600 MPs, that any MP had more than one or two constituents affected. The MPs would have initially thought that whatever happened was a one off, especially if the Post Office assured them that was the case. When the scale became apparent, the Government, as the chief shareholder, should have demanded an explanation. By the time this came to pass, it is likely that the Post Office team that implemented Horizon had been redeployed, people had left etc. I do wonder what happened to the test results, defect logs etc, from the acceptance testing phases. I wonder what happened to the logs from the Post Offices where fraud cases were brought. I wonder whether people in IT support knew, but had no effective whistle blowing process. I wonder if senior managers, under pressure from Government to save taxpayers cash, lied so that the PO could achieve the savings that Horizon promised.

I have seen a lot of criticism of Fujitsu. I believe this is unfair. They should not have had access to the live data. If the Post Office knew of problems, they should have managed them. Fujitsu would not have access to the data that caused the problem, as financial institutions do not share real customer data with suppliers. If the Post Office knew there were bugs, they should have managed the problem and not sent people to prison. It was their problem alone. You may not like companies that donate money to the Tories, but if the Post Office had managed their IT systems properly, this simply wouldn't have happened. I have no links to Fujitsu, but my understanding of the IT development process, to my mind, exonerates them.

To my mind, there are three causes for this.

1) The Post Office IT management were put under unreasonable political pressure to save money, resulting in a breakdown in the accepted process of managing IT change.

2) The Civil Servants charged with oversight of the Post Office were too incompetent to spot the red flags that should have been waved when hundreds of Postmasters started to get prosecuted.

3) Parliament is not fit for purpose in the 21st Century to deal with victims of corporate IT failure. MPs lack expertise and understanding of the complexity of systems and there is no proper support that they can call on. If Parliament had an expert team that MPs could call on to advise, then once dozens of MPs started reporting issues in their constituencies, this would have been identified far more quickly.

Bear in mind that this is not the first major failure in a Government IT project. The vast majority are late and go over budget. I don't think this is always the fault of the IT firms. In my opinion, it is due to the government not allocating proper budgets in the first place. IT companies deliberately underbid, knowing that once they get the job, they can hike prices up. In short, this results in shoddy developments. To some extent, the same is true for other infrastructure projects such as HS2. Underbidding and then ramping up costs and timelines mean things end up costing more than if done properly in the first place. 

In my opinion the Horizon fiasco is only a symptom of the utter failure of the UK to manage large infrastructure projects. Until Parliament gets to grips with this, we will see this happen time and again, costing money, ruining lives and causing scandals that we are outraged for a week or two about, and then we forget until the next scandal. 


5 comments:

Ham said...

ElReg have certainly identified specific issues over the years, the simplest one appears to have been when a cash input screen froze and the user hit the "enter" button, the cash would be added each time, without any visibility, creating a cash shortfall. There are other specific faults, too, apparently often connected with comms outage.

Like you, I struggle to see how this could have been so thoroughly ignored, and how Fujitsu representatives could have stood up in court and backed this head-in-the-sand approach.

Anonymous said...

Thanks for the insight into how these projects are meant to be managed.
Sadly the drama clearly showed a Fujitsu employee altering figures on a live account, in order to demonstrate to the union guy how the system worked.
So my question to you, Roger, if I may: is it normal practice for an IT company to mess with live accounts that others are responsible for balancing at the end of the business day?

Rog T said...

Is it 'normal practice'? That is actually an impossible question to answer. If the facility is there and you are supporting a system, then yes, a support team would be expected to know how to correct errors. Every system has a back door that lets support teams correct data when things go wrong. This is usually protected by high level passwords and requires a full change control authorisation and a related incident to fix. If this wasn’t the case, then the PO didn’t have appropriate controls in place. I don’t know of any organisation anywhere that gives suppliers access to live data without stringent controls in place. The potential is only there when the Post Office business grants access to the system.

Everyone I know who has worked on large systems at some point has 'altered' live data following serious issues. Staff need to be trained in how to do this, but it is something that is very much a last resort, and done with an avalanche of bureaucracy and 'four eyes checks'. I haven't seen the video, but if this was done routinely without checks then it is a failure of governance by the Post Office.

Rupert Lloyd Thomas said...

The judge at the Post Office IT Inquiry has stated that there will be no retribution for those providing evidence? Therefore the Non-Disclosure Agreement is void? RGDS RLT

Rog T said...

Rupert, the NDA was not with the Post Office, but with a supplier. At the time they realised I wrote a blog, I was given a very stern lecture by their lawyers that they would come after me if I ever said anything publicly about it. Having said that, all I could really add was a general observation that the culture was 'not right'. I think competent lawyers who know the IT development process should have nailed this years ago, just by checking test and incident logs.