Subscribe Bookmark
charles_pirrell1

Joined:

Jun 13, 2014

Why spreadsheets can lead to error -- Part 1

In 2010, esteemed Harvard professors Carmen Reinhart and Kenneth Rogoff published the paper “Growth in a Time of Debt.” It soon became powerful supporting evidence for those who argued against growing government spending, even in times of needed economic stimulation. The paper looked at the debt levels of many different countries over different time spans and concluded that when their debt level reached 90 percent of their GDP, economic growth slowed dramatically. So influential was this report that Rep. Paul Ryan cited it in his 2013 budget.

Then, in April 2013, three economists from the University of Massachusetts at Amherst discovered a coding error in the Microsoft Excel spreadsheet that Reinhart and Rogoff used as a basis for their conclusions. The Harvard professors were quick to admit their error, yet they stated that it would not materially affect the conclusions drawn from the analysis.

While you can certainly argue over whether their conclusions should stand or fall, it is still surprising that the error went undetected – and that policy was based on the analysis. What was not surprising to those who know and even love Excel was that errors can and do happen – often.

Since April 2013, other famous errors involving Excel have been the subject of a number of articles. A recent CNN Money magazine article documented several such errors. The biggest errors resulted in:

  • JPMorgan Chase’s $6.2 billion trading loss in 2012. The Excel model used to monitor risk involved several manual steps, and one cell added when it should have averaged.
    • MF Global: About a year before going bust, MF Global never made necessary upgrades to their Excel spreadsheets that monitored risk.
      • Barclays purchase of Lehman Brothers: An Excel spreadsheet had 200 hidden cells that should have been deleted and removed from the asset list that Barclays had intended to buy. Barclays bought 179 toxic deals as a result.
        • Utah: Utah's office of education miscalculated the number of students who would enroll in the state's public schools. "A faulty reference in a spreadsheet" resulted in a $25 million budget shortfall.
        • This is the first in a blog post series that will discuss how using Excel can go from being elegantly simple to increasingly more difficult as data is refreshed or added and as changes are made.

          This series will also suggest that there are better tools that are designed to provide more governance of one’s analysis without giving up ease of use and flexibility.

          Let's begin with a little history.

          Spreadsheets – they changed our lives forever

          The history of spreadsheets goes back to batch processors of the 1960s, but we’ll start a little later with the one that first appeared on an Apple®computer in 1979: VisiCalc.

          VisiCalc

          I remember so clearly when a friend showed me his new Apple  computer and inserted a 5.25-inch floppy disk that, moments later, displayed VisiCalc on his green-and-black CRT display. (If you have to look up CRT, you are a mere babe in the computing world.)

          I was amazed at how fast (a snail’s pace in today’s world) he could change a value and recalculate his results. It was like magic!

          According to Wikipedia, VisiCalc went on to become the first killer app, an application that was so amazing that my friend’s company would buy computers just for its use alone.

          VisiCalc had a major role in the success of the Apple II computer, and the program later became available for use on other computers popular at the time.

          Lotus 1-2-3

          In 1983, I was a young systems analyst working for a consumer products company. That’s when Lotus 1-2-3 was released for the IBM PC. I remember seeing it and recommending to my boss that this product would bring us a huge competitive advantage. I said we should purchase three machines with Lotus 1-2-3 loaded on them for our accounting department. My boss was skeptical at first, but I persuaded him. It was one of the best career moves I’ve ever made. Lotus was a big hit and soon beat VisiCalc as the leading spreadsheet product.

          Microsoft Excel

          Microsoft developed Excel on the Macintosh® platform for several years starting in 1985. The company rewrote the program for Windows 2.0, releasing it in 1987, according to Wikipedia. The release of the Windows version vastly increased Excel's market share. Lotus was never able to regain the market share it lost to Excel, and Microsoft has continued to lead the spreadsheet market.

          Spreadsheet users have never looked back. Today Excel is so pervasive and still so easy to use that I have yet to meet anyone who works for any organization in any discipline in the modern world who doesn’t know about it. But, oh, how its use has changed over the years.

          In my next post (next Wednesday), we’ll look at the strengths and weaknesses of Excel. In the meantime, do you have any Excel horror stories you can share?

          12 Comments
          Community Member

          Joseph wrote:

          Is now an opportune time to mention free and Free options, particularly LibreOffice and its cousin Apache OpenOffice, Gnumeric, etc?

          Just because it's expensive doesn't mean that it's necessary to get the job done (or, on topic here, mess up royally ;)

          Community Member

          Emil M Friedman wrote:

          Back in my "between jobs" period a client wanted an analysis showing that various inputs that affected appearance had no substantial effect on certain other properties. He supplied an Excel data file which I promptly imported into JMP (ver 5?) and analyzed.

          Something which I've no longer remember caused me to ask him to check one particular entry. Upon doing so he discovered that someone in his organization had sorted a portion of the Excel file, thus desynchronizing the relationships between the inputs and outputs. Since all of the data were correct but were just in the wrong rows and since the inputs truly did not have a noticeable effect on the output, residual plots and outlier checks had not revealed the gross corruption of the data.

          In this case it wasn't a horror story to me because I was charging by the hour and everyone agreed that the necessary rework was clearly not my fault.

          Important aside: JMP doesn't let one make that sort of mistake because it forces you to sort the whole file, not just an accidently chosen portion of it.

          Community Member

          Emil M Friedman wrote:

          PS, I remember using Visicalc. I was already using mainframe SAS for a lot of things, including things it wasn't intended for, but Visicalc was very convenient for interactively looking at "what if" scenarios.

          Community Member

          Neil wrote:

          It's interesting, but of course the issue on the examples you've quoted from the CNN piece are good old fashioned user errors, not errors that Excel could be considered responsible for.

          If the companies decided to take action, they couldn't sue Microsoft because Excel wasn't doing what it was supposed to. I imagine what you're going to highlight later is that better controls can be put in place by different types of software to check for errors, but some of these errors - like 200 hidden rows, are still going to happen regardless.

          It's possible to minimise the risk of human error with controls for common mistakes, but hardly possible to completely eradicate them.

          Of course, Excel is so flexible, and as you say, has gone far beyond the basic spreadsheet application that it was always meant to be. You can now build controls and tests into it just as you would have built into other applications, either through basic formula checking, nested IF statements or through VBA. That kind of flexibility is missing from a lot of bespoke data software, which prides itself on its closed and controlled nature.

          That nature is both a positive and a negative, and that's why Excel use as a spreadsheet, planner, application, reporting tool, data analysis tool has proliferated beyond other software.

          Anyway - horror stories. Well, there are many other user errors I can think of. When everyone uses it, and when every computer in the office has it, there are plenty of opportunities to misunderstand a function.

          Real scenarios I've encountered:

          1. A user had a list of individuals and their monthly sales commission volumes to be paid. Just to pretty it up, they sorted the names alphabetically before they sent it. Sadly that filter didn't highlight the amounts column - and the company's sales team all got each others commission for the month. Whoops.

          2. One of the errors that occurs when working with data extracts. A list of details was being worked on from an extract made by IT from the company's CRM system. Of course the extract was in Comma Separated Values format. Opened by default in Excel, the user worked on all of the accounts, highlighting all of the accounts which required a certain action to be taken, painstakingly looking them up one-by-one in the IT system to identify the error. They then save and send the document, still in CSV, so that this action can be taken.

          Usual system warning blindness saw the user ignore the "This will remove all formatting" warning, and the next day an email saying "All the problem accounts are highlighted red" was received by the department, with an attached CSV file devoid of course of any formatting. D'oh!

          Community Member

          Mike Clayton wrote:

          GREAT! Thanks for the examples and history. I look forward to Part II.

          I remember the impact of Visicalc, and I still have my Apple Mac SE which has Excel 1.0 on board, as Gates loved (and cloned) the first Mac ideas. His version 1 of Excel runs on 49K. Current version is slightly larger. I also remember when his team hid Flight Simulator in one version of Excel. Also the floating point errors, and statistical add-in errors.

          There are many Excel Templates that purport to give support for Six Sigma, SPC, et al and have to be thoroughly tested before use. They usually cost about $250 to essentially try to improve on Excel's graphical and stats capabilities. But each graph usually requires special data structuring which is unique for each graph generally.

          At some of my clients, a the JMP users support 100 or more management and QA and supply chain folks that use Excel, in the more critical data analysis and reporting efforts. Inevitably, some excel results are sent back as "evidence" of JMP errors, which turns out to be human error on Excel side. Some old excel macros actually run major enterprise operations with bad code in place that is "ignored as known to be wrong" after time, but author has left the company.

          Community Member

          Adam Smith wrote:

          As part of my masterâ s thesis I created a simulation model in Excel. After I had a spent a lot of time creating the initial model, I went to validate my initial output against the expectations. It turned out I was way off the mark! Then I spent most of the day checking every formula, and cell reference, but I was not able to find anything wrong. In frustration I went back to the validation spreadsheet, and discovered that when I had copied and pasted my results to compare to my expectations, I had not copied the values, but instead I had copied formulas. Thus, the numbers were meaningless! When I went back and compared the right values from my model to my expectation, they were spot on. There was nothing wrong with my model, only with my validation step!

          Community Member

          john schmidt wrote:

          I don't really have a horror story. More of a guiding principle that i call ABM.

          Anything But Microsoft.

          Charles Pirrello wrote:

          Hi Emil,

          Thanks for the comment. Your example illustrates the fact that Excel cells are independent entities that are not subject to the kind of strict data governance that JMP provides. I'll get into this in more detail in a subsequent post.

          Charles Pirrello wrote:

          Thanks for you comment Adam. This does bring up an important point. I'm not saying that Excel itself is prone to errors. Only that it is so very flexible that users can unwittingly perform tasks incorrectly. In your example, we can't blame Excel. It did exactly what you asked it to.

          Charles Pirrello wrote:

          Neil,

          Well said. I never implied that Excel itself is prone to errors. Itâ s just that it is so flexible and its use has grown beyond the capacity of its original intent. And you are right, if more users learned the discipline of what it take to create proper data files, many of the errors would cease. However, this brings up an interesting dilemma that I'm sure Microsoft has found itself forced into. Excel became wildly successful because of its flexibility and ease of use. However that ease of use sacrificed structure and control â as I believe was intended. Thereâ s no way Microsoft would have wanted to start imposing the type of structure necessary to force its users to be more disciplined. If they had, I believe the current Excel footprint would be a mere shadow of itself. So it continues to grow in popularity and user base, and continues to be pushed to its limits.

          Charles Pirrello wrote:

          Hi John. Not sure I completely agree. I've used Excel my entire career and am still a fan. I am actually in awe of what it is able to do. But my main concern is when it is misused, especially in instances where it is the basis for important decisions. So I have a great deal of respect for the product, but know when it is out of its element.

          Community Member

          John Tate wrote:

          It's useful to think of a spreadsheet as essentially a programming tool. As such, it has some advantages:

          It's very easy to use, at least for relatively simple problems, and even for some more complex ones it is still not too hard.

          It's static, meaning that these cells over here are your data, those cells over there are your formulas, and they stay put. You don't have to think about dynamic behavior like loops, procedure calls, stack operations, and the like.

          In a certain sense, everything is visible. If you select a cell, you see either its value or its formula, simple as that.

          There are more, but you get the idea.

          They also have disadvantages, by means of which it's quite easy to negate the advantages. For example:

          You can hide information in several ways. The Barclay's case is one example; I have also seen advice to lock certain cells in a spreadsheet intended to be accessible to those using assistive technology, and hide the contents by having the background and font color match.

          Certain operations are dynamic at the moment you click the mouse, such as the sort, leading to failures like the de-arranged sales commissions.

          In short, it's just as easy to make a programming error in a spreadsheet as in any other programming system. So if the results matter, its power, flexibility, and ease of use impose the same need for discipline as in any other programming system.