Strengths and weaknesses of spreadsheets -- Part 2
Our World Statistics Day conversations have been a great reminder of how much statistics can inform our lives. Do you have an example of how statistics has made a difference in your life? Share your story with the Community!
The power of the spreadsheet has always been its interactive ease of use. It allows users to design a collection of input and output cells that can contain numeric data, text or formulas. And these cells can then be strung together into a cohesive quilt that looks, and has the potential to act, like a data table.
Alas, the similarity to a data table stops there. A spreadsheet is to a data table as a collection of stacked shoeboxes is to a shoe closet. There is no structure. Boxes or cells can be randomly placed, without apparent ramifications. This is both a strength and weakness of Excel. The strength is that it provides maximum flexibility in how users can organize their data. This is also its weakness. Users can build a spreadsheet that may drive important decisions without structuring their data properly.
Still, you will never get me to give up Excel. I love the product. But I also acknowledge and accept its weaknesses. These days, I use it more as a glorified calculator than a data analysis tool. Today, Excel is used beyond its original intent, and it has become bloated with too many functions and too few controls. So what can go wrong exactly and how, you ask? Let me count the ways.
Excel exploits the fact that most of us are not disciplined in the art and science of building a database. It allows us to take shortcuts that seasoned, competent database professionals would never take. And we develop these bad habits because most of us learn how to use Excel for the sole purpose of developing spreadsheets for personal use – spreadsheets that no one other than ourselves will have to use or understand. And this level of comfort and intimacy with our spreadsheets never wanes, even when we use them to generate conclusions and drive decisions outside our personal domain.
Think back to the first time you used Excel. It was so forgiving. It adapted to the way you wanted to organize your data, how you wanted to calculate and display results. Its beauty was its simplicity, backed by a powerful calculation engine. It was an analysis tool and report tool all in one. Unfortunately, as spreadsheets expand and evolve, the shortcuts we take during their creation amplify their risks: formulas with hard-coded references, inserted cells that fail to be included in predefined formulas – and it goes on and on.
Data tables vs. spreadsheets
So let’s take a moment to compare a spreadsheet to a data table. A data table is a mechanism for storing data, text and formulas in a row/column format. The columns of a data table contain its lowest level of attributes. Individual cells in a data table are governed by the attributes of the column to which they belong. They do not have any attributes specific to themselves. That is why one cell in a column cannot be text while the cell above or below it can be numeric. The governance set by the column won’t allow that.
Spreadsheets, on the other hand are a collection of independent cells, each having their own attributes. So a cell containing numeric data can sit on top of a cell containing text or a formula. How cool, right? How flexible. Cool and flexible, yes. Appropriate for disciplined modeling and analysis? No, not so much.
Example: Fun with Sum
Let’s take a look at one way Excel users can unwittingly create errors in their spreadsheets.
Below is a spreadsheet with some calculated sum cells:
We’ll first focus on Scenario 1.
Cells B7 and B14 contain subtotals using the sum command. A grand total is calculated using a simple Cell 1 + Cell 2 command. Then the subtotal and grand total are repeated at the bottom of the sheet. Many times, users will store the totals at the bottom of the data details.
Next, we’ll insert cells into the spreadsheet. Note that it doesn’t matter whether you use the insert rows command or the insert cells command. None of the formulas have be manually changed.
Now we’ll add some values to the inserted cells.
In this first example, values were added to cells that are adjacent to the original cells used in the calculation. You’ll notice that cell B10 assumed that the new values should be included in the subtotal since they reside after the last original value and the subtotal. This may or may not be what the user intends. But let’s assume that is the behavior the user wants. Note that the subtotal in cell B24 did NOT change. Instead, an indicator is placed in the upper-left side of the cell.
When users click on the indicator, they see the following:
Users then need to choose from a list of possible corrections. And this would have to be done for every calculated cell that was affected.
OK, that’s Scenario 1. In Scenario 1b next week, I’ll show how Excel can easily be fooled into thinking that new data should NOT be added to a predefined sum cell. Can any of you guess what could make that happen?
And if that’s not enough, Scenario 2 (also next week) takes a slightly different, although much used, approach to creating formulas. This approach also may have unintended consequences. Extra points if you guess how this happens.