The Post Office Horizon IT scandal, part 1 – errors and accuracy

For the last few years I’ve been following the controversy surrounding the Post Office’s accounting system, Horizon, which controls the accounts of some 11,500 Post Office branches around the UK. There was a series of alleged frauds by sub-postmasters, all of whom protested their innocence. Nevertheless, the Post Office prosecuted these cases aggressively, driving the supposed perpetrators into financial ruin and, in some cases, to suicide. The sub-postmasters affected banded together to take a civil action against the Post Office, claiming that no frauds had taken place and that the discrepancies arose from system errors.

I wasn’t surprised to see that the sub-postmasters won their case in December 2019, with the judge delivering scathing criticism of the Post Office, which paid £57.75 million to settle the case, and of Fujitsu, the IT supplier. Further, in March 2020 the Criminal Cases Review Commission decided to refer for appeal the convictions of 39 sub-postmasters, on the grounds that their prosecution involved an “abuse of process”. I will return to the prosecution tactics in my next post.

Having worked as an IT auditor, including on fraud investigations, and as a software tester, I found the case intriguing. It had many features that would have caused me great concern had I been working at the Post Office, and I’d like to discuss a few of them. The case covered a vast amount of detail. If you want to see the full 313 page judgment you can find it here [PDF, opens in new tab].

What caught my eye when I first heard about this case were the arguments about whether the problems were caused by fraud, system error, or user error. As an auditor who worked on the technical side of many fraud cases the idea that there could be any confusion between fraud and system error makes me very uncomfortable. The system design should incorporate whatever controls are necessary to ensure such confusion can’t arise.

When we audited live systems we established what must happen and what must not happen, what the system must do and what it must never do. We would ask how managers could know that the system would do the right things, and never do the wrong things. We then tested the system looking for evidence that these controls were present and effective. We would try to break the system, evading the controls we knew should be there, and trying to exploit missing or ineffective controls. If we succeeded we’d expect, at the least, the system to hold unambiguous evidence about what we had done.

As for user error, it’s inevitable that users will make mistakes and systems should be designed to allow for that. “User error” is an inadequate explanation for things going wrong. If the system doesn’t help users avoid error then that is a system failure. Mr Justice Fraser, the judge, took the same line. He expected the system “to prevent, detect, identify, report or reduce the risk” of user error. He concluded that controls had been put in place, but they had failed and that Fujitsu had “inexplicably” chosen to treat one particularly bad example of system error as being the fault of a user.

The explanation for the apparently inexplicable might lie in the legal arguments surrounding the claim by the Post Office and Fujitsu that Horizon was “robust”. The rival parties could not agree even on the definition of “robust” in this context, never mind whether the system was actually robust.

Nobody believed that “robust” meant error free. That would be absurd. No system is perfect, and it was revealed that Horizon had a large and persistent number of bugs, some of them serious. The sub-postmasters’ counsel and IT expert argued that “robust” must mean it was extremely unlikely the system could produce the sort of errors that had ruined so many lives. The Post Office confused matters by adopting different definitions at different times; when asked to clarify the point they provided an IT industry definition of robustness that sat uneasily with their earlier arguments.

The Post Office approach was essentially top down: Horizon was robust because it could handle any risks that threatened its ability to perform its overall business role. They then took a huge logical leap, claiming that because Horizon was robust by their definition it couldn’t be responsible for serious errors at the level of individual branch accounts.

Revealingly, the Post Office and Fujitsu named bugs using the branch where they had first occurred. Two of the most significant were the Dalmellington Bug, discovered at a branch in Ayrshire, and the Callendar Square Bug, also from a Scottish branch, in Falkirk. This naming habit linked bugs to users, not the system.

The Dalmellington Bug [PDF, opens in new tab – see para 163+] entailed a user repeatedly hitting a key when the system froze as she was trying to record the transfer of £8,000 in cash from her main branch to a sub-branch. Unknown to her, each time she struck the key she was confirming dispatch of a further £8,000 to the other office. The bug created a discrepancy of £24,000, for which she was held responsible.

Similarly, the Callendar Square Bug generated spurious, duplicate financial transactions for which the user was considered to be responsible, even though this was clearly a technical problem related to the database, the messaging software, the communications link, or some combination.

The Horizon system processed millions of transactions a day and did so with near 100% accuracy. The Post Office’s IT expert therefore tried to persuade the judge that the odds were 2 in a million that any particular error could be attributable to the system.

Unsurprisingly the judge rejected this argument. If only 0.0002% of transactions were to go wrong then a typical day’s processing of eight million transactions would lead to 16 errors. It would be innumerate to look at one of those outcomes and argue that there was a 2 in a million chance of it being a system error. That probability would make sense only if one of the eight million were chosen at random. The supposed probability is irrelevant if you have chosen a case for investigation because you know it has a problem.
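The judge’s objection is, in essence, the base-rate fallacy. A minimal sketch of the arithmetic, using the figures above (the expert’s claimed 2-in-a-million error rate and eight million transactions a day):

```python
# Figures from the discussion above: the Post Office expert's claimed
# per-transaction system error rate, and a typical day's volume.
ERROR_RATE = 2 / 1_000_000          # i.e. 0.0002%
TRANSACTIONS_PER_DAY = 8_000_000

# Expected number of erroneous transactions in a typical day's processing.
expected_errors = ERROR_RATE * TRANSACTIONS_PER_DAY
print(expected_errors)  # 16.0
```

The 2-in-a-million probability applies only to a transaction drawn at random. A case selected for investigation precisely because it is already known to show a discrepancy is not a random draw, so the figure says nothing about whether that particular discrepancy was a system error.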

It seemed strange that the Post Office persisted with its flawed perspective. I knew all too well from my own experience of IT audit and testing that different systems, in different contexts, demanded different approaches to accuracy. For financial analysis and modelling it was counter-productive to chase 100% accuracy. It would be too difficult and time consuming. The pursuit might introduce such complexity and fragility to the system that it would fail to produce anything worthwhile, certainly in the timescales required. 98% accuracy might be good enough to give valuable answers to management, quickly enough for them to exploit them. Even 95% could be good enough in some cases.

In other contexts, when dealing with financial transactions and customers’ insurance policies, you really do need a far higher level of accuracy. If you don’t reach 100% you need some way of spotting and handling the exceptions. These are not theoretical edge cases. They are people’s insurance policies or claims payments. Arguing that losing a tiny fraction of 1% is acceptable would have been appallingly irresponsible, and I can’t put enough stress on the point that as IT auditors we would have come down hard, very hard, on anyone who tried to take that line. There are some things the system should always do, and some it should never do. Systems should never lose people’s data. They should never inadvertently produce apparently fraudulent transactions that could destroy small businesses and leave the owners destitute. The amounts at stake in each individual Horizon case were trivial as far as the Post Office was concerned, immaterial in accountancy jargon. But for individual sub-postmasters they were big enough to change, and to ruin, lives.

The willingness of the Post Office and Fujitsu to absolve the system of blame and accuse users instead was such a constant theme that it produced a three letter acronym I’d never seen before: UEB, or user error bias. Naturally this arose on the claimants’ side. The Post Office never accepted its validity, but it permeated their whole approach; Horizon was robust, therefore any discrepancies must be the fault of users, whether dishonest or accidental, and they could proceed safely on that basis. I knew from my experience that this was a dreadful mindset with which to approach fraud investigations. I will turn to this in my next post in this series.

13 thoughts on “The Post Office Horizon IT scandal, part 1 – errors and accuracy”

  1. Mr Christie, your overview has supreme value. Your forensic approach to the Fujitsu/Post Office scandal throws light upon what is perhaps becoming one of the most critical questions of our time: what level of testing and verification should be demanded of organisations before a system is rolled out/signed off into live environments upon an unsuspecting public? Example – I visited Heathrow Airport in January 2023 dropping off a colleague. The telephony ‘dial up’ system (offered on ‘drop off zone’ signposts) advised calling a number & inputting credit card and vehicle registration details using voice recognition. The vehicle reg. I attempted to submit has a ‘double character’ at the end: ‘V … V’. The system refused to register/accept the last ‘V’, even after multiple attempts. I had no choice but to omit the final reg letter. Even though the system accepted my debit card payment, I received a penalty ticket for ‘non-payment’! An untested system flaw, in a live public environment in Jan 2023. If organisations are not held to account when anonymous individuals lob partially proven systems upon the public, we are all most surely on the road to hell. My inconvenience amounts to nothing when compared to the victims of the Post Office outrage, but overarching issues with testing standards remain outstanding and continue to pose a significant threat.

    • What a strange bug! Validating car registration numbers should be a routine and straightforward task. Failing to do that properly suggests shoddy development as well as testing practices.
      And thank you for the compliment. I appreciate it.
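      To show just how routine this should be, here is a minimal sketch in Python, assuming the standard post-2001 UK plate format (two letters, two digits, three letters); the plates below are invented examples, not the commenter’s:

```python
import re

# Standard post-2001 UK registration format: two letters, two digits,
# three letters, e.g. "AB12 CDE". Older formats would need extra patterns.
UK_PLATE = re.compile(r"^[A-Z]{2}\d{2}\s?[A-Z]{3}$")

def is_valid_plate(plate: str) -> bool:
    """Accept a plate in the post-2001 format, with or without the space."""
    return bool(UK_PLATE.match(plate.strip().upper()))

# A doubled final letter is perfectly legal and must not be rejected.
print(is_valid_plate("AB12 CVV"))  # True
print(is_valid_plate("AB12 CV"))   # False - the truncated form the broken
                                   # system effectively forced
```

A system that silently drops a repeated character is failing at a simpler task than even this sketch performs.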

  2. I worked for the London Stock Exchange for ten years as a testing analyst. We, as a ‘testing’ team, were tasked with building test cases to check that the intended code enhancements or additions were as watertight as possible, prior to ‘going live’. The LSE settled all the share transactions in the UK (and Australia) overnight in a batch system in PL/I on an IBM mainframe and, as may be imagined, accuracy was a bit important!

    The testing was, to say the least, thorough! There were several levels of testing and the actual software tests were designed on paper in direct response to the business analyst’s specifications, written (on paper), reviewed, corrected if necessary and only then, when the tests were signed off by other testers/analysts, did the coding start.
    It went without saying that the coder was a different person from the testing analyst!

    Once the coding amendments were made, the coder would test their work at the level of the changed module, using a separate test system, documenting their checks, the results being checked against the specification, by a third party.

    This same process went on through the module, programme, unit, sub-system, system and user levels, and only then, when all the designed tests were deemed satisfactory and complied with the original business analyst’s design, would the code ‘go live’ on a prescribed day, when preparations were made to ‘back out’ the code if anything was seen to fail. It very, very rarely did!

    Yes, it took time. Yes, it was probably expensive but, yes, there was always a software version trail and good documentation stored in a library.

    I understand that Horizon is/was a real-time, on-line system but clearly the company that built it didn’t test it! Unforgivable! . . . especially when such large sums of money were involved!

    I’m not blowing my own trumpet here but it makes me sick and angry that the intransigence of the commissioning organisation caused such pain and devastating life-changing events for so many innocent end users.

    There is (at least) one head that should roll for this debacle!

    • Well said. Building and testing serious financial systems requires rigour and serious, responsible people. The Post Office Inquiry has revealed that Fujitsu and the Post Office failed abysmally, and they failed by the standards of the time in the late 1990s. We knew better then. Criticism is not hindsight.

  3. With the Dalmellington bug as an example, the key was pressed 3 times, generating £24,000. Why was there no cross-check on the actual amount of cash that left the other account, which was £8,000? The other £16,000 never existed. How could a financial system be that far out?

      • Around 2008 I had to work on a team tasked with constructing an operational baseline for the systems in use by the PO as part of an enterprise IT strategy consultancy. Horizon didn’t operate in a vacuum, there were 3 separate SAP systems (yes 3) amongst other messaging and integration systems, all interconnected and handling transactional and master data. It was a bit of a mess tbh and badly needed rationalisation – hence the IT strategy work. I don’t know where transaction reconciliation happened but any analysis probably needs to cover the full system of systems as errors could occur in multiple places. From memory Horizon itself featured a programmable keyboard mapped to the dozens of products and services sold which was driven by a nightly masterdata upload containing any price and parameter changes. Again there’s scope for human error there on data prep but you would expect a wider impact. I was interested to read about the 2 known bugs you mention. Those should have been analysed and fixed as soon as reported given the impact. Have to wonder what was happening there on the Support team.

    • This was alluded to in the ITV programme, when one of the victims asked that question and was told by the external expert who had reviewed the system issues for the Post Office that the ‘money’ was probably sitting in a suspense account on the Post Office side and then written off as immaterial to profit at some point.

      As James has mentioned above this is a shocking approach if it is what has happened because the post office must have been aware that they were adjusting their cash accounts up all the time in their own favour and at the very material cost to their sub postmasters. The concept of materiality in the case of a related party cannot just consider the company’s own position but also that of the counterparties.

  4. I was an IT consultant (also a former accountant) for 40 years, mainly implementing large accounting systems, latterly with the world’s second largest software company.

    The description of the process of testing that Jon Warren describes is absolutely correct and common best practice. It would appear from what has been disclosed in the press that the Post Office, as the responsible user, did not test the system thoroughly enough and fix the bugs (certainly the critical bugs) before signing off the Acceptance Testing phase and going live. And once live, did not fully investigate errors being reported by their key users.

    I am not surprised that yet another IT system in the public sector has failed. I was once asked by a UK Government Department to put into production a system that in my opinion hadn’t been properly tested. I refused and spoke to my boss, and a conversation was had and a letter soon sent to the civil servants about the legal consequences if they went live without proper testing.

  5. What was the other side of the entry? If it wasn’t cash then there should be a bank entry? It should have been relatively easy to see from the 3 × £8,000 that it was an error?

    • Horizon was a point-of-sale system, not an accounting package, so I don’t think it worked like that. Its main role was to capture transactions; the issue seems to have been with reconciliation of recorded transactions against takings. Trying to match all of that against all of the individual customers’ bank accounts, to see if they had a total of matching payments to the Post Office, would not have been viable. It would also require access to their accounts.

      • I’ve amended the description of the Dalmellington Bug. The Daily Mail story I used when I wrote this piece in 2020 got the principles right, but not the detail. The subpostmistress was trying to record the transfer of cash to a sub-office, not acknowledge receipt from head office. The result was the same, a discrepancy for which she was held responsible.
        I can’t see how Post Office investigators would have thought this was evidence of any sensible way to commit fraud. That’s a common pattern. Fraud was not a reasonable explanation to any open-minded investigator. System problems were the obvious, and correct, explanation.

  6. When I used to write safety critical software for aircraft, the notional mean time between failures for software was a million operational hours. Sounds great, that’s 114 years of continuous operation but with 11788 post offices open 8 hours/day you would expect an error every 10 days on average, or 500 over the 15 years of Horizon service. I probably would describe the Horizon system as ‘robust’ but robust does not mean error-free and the Post Office seem to have had no concept of this fact and no process in place to identify, track and fix the inevitable faults in its system.
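    • The arithmetic above checks out. A quick sketch, using the commenter’s own figures (a notional MTBF of one million operational hours, 11,788 branches open eight hours a day, 15 years of service):

```python
# Figures are the commenter's assumptions, not established Horizon data.
MTBF_HOURS = 1_000_000     # notional mean time between software failures
BRANCHES = 11_788
HOURS_PER_DAY = 8

# Fleet-wide operational hours accumulated per calendar day.
operational_hours_per_day = BRANCHES * HOURS_PER_DAY   # 94,304

# Expected gap between failures, and total failures over 15 years.
days_between_failures = MTBF_HOURS / operational_hours_per_day
failures_over_service = 15 * 365 * operational_hours_per_day / MTBF_HOURS

print(round(days_between_failures, 1))  # 10.6 days
print(round(failures_over_service))     # 516, roughly the 500 quoted
```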
