In 2002, the National Institute of Standards and Technology (NIST) released a report detailing the economic impact of inadequate software testing (NIST Report, 2002). The report concluded that:
“… the national annual costs of an inadequate infrastructure for software testing is estimated to range from $22.2 billion to $59.5 billion.”
The NIST report also pointed out that the state of software testing practice was woefully lacking.
Seven years later, Wong et al., in an IEEE Reliability Society report (Wong et al., 2009), reviewed 15 widely publicized software failures, several of which led to loss of life, and concluded that:
“…better testing procedures and practices need to be implemented, and this is especially true of software that is related to safety-critical systems.”
In a more recent article, Wong et al. examined a range of software failures across several industries (Wong et al., 2017) and made a rather sobering point:
“…the disappointing truth is that software is far from defect-free and large sums of money are spent each year to fix or maintain (defective) software.”
The article analyzed software failures over a 30-year period, from 1985 to 2015; identified the nature of the software bug(s) that precipitated each failure, along with other contributing factors, such as user error; and classified the failures into two categories: those that were fatal (resulted in loss of life) and those that were not fatal but resulted in substantial economic loss. The fatal failures included the 1985 Therac-25 incident, in which at least six patients were given overdoses of radiation, as well as civil aviation accidents between 1995 and 1997 in which more than 650 people lost their lives. They also included the 2014 Ebola incident, in which an electronic health records software failure led to the misdiagnosis of a patient infected with Ebola, who subsequently died. The nonfatal failures included a nuclear power plant software failure, which resulted in a $2 million loss, as well as the Ariane 5 software failure, which resulted in a $7 billion loss. In their conclusion, the authors offered the following observation:
“… the ubiquity of embedded software in public infrastructure, transportation systems, consumer products, and more leaves [us] incredulous that someone who has worked on complex software systems could believe that it is possible to build such systems without faults…”
Sadly, despite these sobering accounts of failures in the software systems we increasingly rely upon, a key point is missing from these discussions: most software failures go undetected and unreported. In some instances, the failure is subtle – or even intermittent – and so is easily missed. For example, a statistical software application may present incorrectly computed statistics. Such incorrect results are failures, but they may not be immediately apparent, especially to a nonexpert. Nevertheless, they may lead to incorrect, or suboptimal, decisions and could therefore be very consequential.
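To make this concrete, consider a minimal, hypothetical sketch (the function, data, and test below are invented for illustration): a routine meant to compute a sample standard deviation that divides by n instead of n − 1. Its output looks plausible, so the defect is easy to miss by inspection, yet a simple unit test against a hand-computed value exposes it.

```python
# Hypothetical illustration of a subtle, easily missed failure: a sample
# standard deviation routine that divides by n instead of n - 1. The result
# looks plausible, so a nonexpert may never notice the bias.
import math

def sample_std_dev(values):
    """Intended to return the sample standard deviation of `values`."""
    n = len(values)
    mean = sum(values) / n
    # Bug: should divide by (n - 1) for the sample (Bessel-corrected) estimate.
    variance = sum((x - mean) ** 2 for x in values) / n
    return math.sqrt(variance)

def test_sample_std_dev():
    # For [2, 4, 4, 4, 5, 5, 7, 9], the sample standard deviation is
    # sqrt(32 / 7) ≈ 2.138, but the buggy routine returns 2.0.
    result = sample_std_dev([2, 4, 4, 4, 5, 5, 7, 9])
    assert abs(result - math.sqrt(32 / 7)) < 1e-9, f"got {result:.3f}"

if __name__ == "__main__":
    test_sample_std_dev()  # Fails, exposing the silently biased estimate.
```

The point of the sketch is not the particular bug but how ordinary it looks: without a test that encodes the expected value, the application simply reports a number that is quietly wrong.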
It is evident that writing software – regardless of the sophistication of the integrated development environments used, the sophistication of the programming methods, or the skill of the programmers – is an imperfect craft and will result in bugs. This reality means that it is up to the rigor of the software testing effort to discover these bugs before the software is released. As NIST and Wong et al. indicate, failure to do so can lead to significant consequences. Barry Boehm, an iconic figure in the software engineering community, pointed out in his seminal paper on software engineering economics (Boehm, 1984) that the cost to find and fix a bug after software deployment is usually at least 100 times greater than the cost of finding and fixing it prior to deployment.
Given this backdrop, is it the case then that software testing methods are simply inadequate? I would argue that it is not. In fact, software testing research has been a robust area of inquiry for over 50 years. In 2014, Orso and Rothermel wrote about the substantial progress in software testing methods over the 15 years prior to their paper (Orso and Rothermel, 2014). The 15 years prior to 2000 were also a period of dramatic development in software testing methods: Glenford Myers (Myers, 1978) and Boris Beizer (Beizer, 1984), two iconic figures in the software testing community, did their pioneering work in the '70s and '80s. Sadly, the primary issue is one of adoption, not adequacy of testing methods. The software development community has been slow to adopt rigorous software testing methods over the years, despite the available empirical evidence of their utility (Kuhn et al., 2015; Morgan, 2018). This situation has been exacerbated by two realities of software development: testing efforts are notoriously underfunded and under-resourced, and the velocity and complexity of software development efforts have grown dramatically in recent years.
Software systems are now more pervasive than ever and play an increasingly important role in our lives. The critical importance of robust software testing methods is evident, and it is an aspect of software development that all practitioners need to be more aware of. To address this need, we have undertaken this series of blog posts. This post is the second in that series, and there are more to come.
Works cited
Beizer, Boris. Software system testing and quality assurance. Van Nostrand Reinhold Co., 1984.
Boehm, Barry W. “Software engineering economics.” IEEE Transactions on Software Engineering (1984): 4-21.
Kuhn, D. R., R. Bryce, F. Duan, L. S. Ghandehari, Y. Lei, and R. N. Kacker. “Combinatorial testing: Theory and practice.” Advances in Computers 99 (2015): 1-66.
Myers, Glenford J. “A controlled experiment in program testing and code walkthroughs/inspections.” Communications of the ACM 21.9 (1978): 760-768.
Morgan, J. “Combinatorial testing: An approach to systems and software testing based on covering arrays.” Analytic Methods in Systems and Software Testing (2018): 131-158.
NIST Report. “The economic impacts of inadequate infrastructure for software testing.” National Institute of Standards and Technology Planning Report 02-3, May 2002.
Orso, Alessandro, and Gregg Rothermel. “Software testing: A research travelogue (2000–2014).” Proceedings of the Future of Software Engineering, ACM, 2014, pp. 117-132.
Wong, W. Eric, Vidroha Debroy, and Andrew Restrepo. “The role of software in recent catastrophic accidents.” IEEE Reliability Society 2009 Annual Technology Report 59.3 (2009).
Wong, W. Eric, Xuelin Li, and Philip A. Laplante. “Be more familiar with our enemies and pave the way forward: A review of the roles bugs played in software failures.” Journal of Systems and Software 133 (2017): 68-94.