Why software testing is an increasingly important aspect of modern software development

In 2002, the National Institute of Standards and Technology (NIST) released a report detailing the economic impact of inadequate software testing (NIST Report, 2002). The report concluded that:

“… the national annual costs of an inadequate infrastructure for software testing is estimated to range from $22.2 billion to $59.5 billion.”

The NIST report also pointed out that the state of software testing practice was woefully lacking.

Seven years later, Wong et al., in an IEEE Reliability Society report (Wong et al., 2009), reviewed 15 widely publicized software failures, several of which led to loss of life, and concluded that:

“…better testing procedures and practices need to be implemented, and this is especially true of software that is related to safety-critical systems.”

In a more recent article, Wong et al. examined a range of software failures across several industries (Wong et al., 2017) and made a rather sobering point:

“…the disappointing truth is that software is far from defect-free and large sums of money are spent each year to fix or maintain (defective) software.”

 

The article analyzed software failures over a 30-year period, from 1985 to 2015; identified the nature of the software bug(s) that precipitated the failures, along with other contributing factors, such as user error; and classified the failures into two categories: those that were fatal (resulted in loss of life) and those that were not fatal but resulted in substantial economic loss. The fatal failures included the 1985 Therac-25 incident, in which at least six patients were given overdoses of radiation, as well as civil aviation accidents between 1995 and 1997, in which more than 650 people lost their lives. They also included the 2014 Ebola incident, in which an electronic health records software failure led to the misdiagnosis of a patient infected with Ebola, who subsequently died. The nonfatal failures included a nuclear power plant software failure, which resulted in a $2 million loss, as well as the Ariane 5 software failure, which resulted in a $7 billion loss. In their conclusion, the authors offered the following observation:

 

“… the ubiquity of embedded software in public infrastructure, transportation systems, consumer products, and more leaves [us] incredulous that someone who has worked on complex software systems could believe that it is possible to build such systems without faults…”

 

Sadly, despite these sobering accounts of the types of software systems we are increasingly reliant upon, a key point is missing from these discussions: most software failures go undetected and unreported. In some instances, the failure is subtle – or even intermittent – and so is easily missed. For example, a statistical software application may present incorrectly computed statistics. Such results are failures but may not be immediately apparent, especially to a nonexpert. Nevertheless, they may lead to incorrect, or suboptimal, decisions and could therefore be very consequential.

 

It is evident that the craft of writing software – regardless of the sophistication of the integrated development environments used, the sophistication of the programming methods, or the skill of the programmers – is imperfect and will result in bugs. This reality means that it is up to the rigor of the software testing effort to discover these bugs before the software is released. As NIST and Wong et al. indicate, failure to do so can lead to significant consequences. Barry Boehm, an iconic figure in the software engineering community, pointed out in his seminal paper on software engineering economics (Boehm, 1984) that the cost to find and fix a bug after software deployment is usually at least 100 times greater than the cost of identifying and fixing it prior to deployment.

 

Given this backdrop, is it the case then that software testing methods are simply inadequate? I would argue that this is not the case. In fact, software testing research has been a robust area of inquiry for over 50 years. In 2014, Orso and Rothermel wrote about the substantial progress in developing software testing methods in the 15 years prior to their paper (Orso and Rothermel, 2014). The 15-year period prior to 2000 was also a period of dramatic development in software testing methods. Glenford Myers (Myers, 1978) and Boris Beizer (Beizer, 1984), two iconic figures in the software testing community, did their pioneering work in the ’70s and ’80s. Sadly, the primary issue is one of adoption, not adequacy, of testing methods. The software development community has been slow to adopt rigorous software testing methods over the years, despite the available empirical evidence of their utility (Kuhn et al., 2015; Morgan, 2018). This situation has been exacerbated by two software development realities: software testing efforts are notoriously underfunded and under-resourced, and the velocity and complexity of software development efforts have grown dramatically in recent years.

 

Software systems are now more pervasive than ever and play an increasingly important role in our lives. The critical importance of robust software testing methods is evident, and it is an aspect of software development that all practitioners need to be more aware of. To address this need, we have undertaken this series of blog posts. This post is the second of that series, and there are more to come.

 

Works cited

Beizer, Boris. Software System Testing and Quality Assurance. Van Nostrand Reinhold Co., 1984.

Boehm, Barry W. “Software engineering economics.” IEEE Transactions on Software Engineering (1984): 4-21.

Kuhn, D. R., Bryce, R., Duan, F., Ghandehari, L. S., Lei, Y., & Kacker, R. N. (2015). “Combinatorial testing: Theory and practice.” Advances in Computers, 99, 1-66.

Morgan, J. (2018). “Combinatorial testing: An approach to systems and software testing based on covering arrays.” Analytic Methods in Systems and Software Testing, 131-158.

Myers, Glenford J. “A controlled experiment in program testing and code walkthroughs/inspections.” Communications of the ACM 21.9 (1978): 760-768.

NIST Report. “The economic impacts of inadequate infrastructure for software testing.” National Institute of Standards and Technology Planning Report 02-3, May 2002.

Orso, A., and Rothermel, G. (2014). “Software testing: A research travelogue (2000–2014).” In Proceedings on the Future of Software Engineering, ACM, May 2014, pp. 117–132.

Wong, W. Eric, Vidroha Debroy, and Andrew Restrepo. “The role of software in recent catastrophic accidents.” IEEE Reliability Society 2009 Annual Technology Report 59.3 (2009).

Wong, W. Eric, Xuelin Li, and Philip A. Laplante. “Be more familiar with our enemies and pave the way forward: A review of the roles bugs played in software failures.” Journal of Systems and Software 133 (2017): 68-94.

Comments
hogi
Level XII

If software provides thousands and thousands of different functions, it gets difficult to guarantee that all of them work together - and that two functions harmonize with a third one.

This can be seen in Tiny Traps in JMP and JSL.
Sometimes, the combination of 2 or 3 cool features leads to an unexpected result.

It's hard to test for all these tiny interactions - but the swarm intelligence of the user has a chance ; )


Many thanks to the JMP Support team for their efforts to organize and channel the user feedback!

Thanks to the developers for their eagerness to give us software that works like a charm!

gail_massari
Community Manager

Thank you, @JosephMorgan. And thanks for making sure that JMP spends so many resources on testing. We also make it easy to report bugs online, by phone, or by email.

JosephMorgan
Staff

Combinatorial testing (see below for references from the post) is an approach to testing that focuses on faults that are due to interactions. The underlying mathematical object is a covering array, which makes it a highly efficient approach even when there are thousands of inputs! This will be the topic of a future blog post. A small sketch illustrating the idea follows the references below.

 

Kuhn, D. R., Bryce, R., Duan, F., Ghandehari, L. S., Lei, Y., & Kacker, R. N. (2015). “Combinatorial testing: Theory and practice.” Advances in Computers, 99, 1-66.

Morgan, J. (2018). “Combinatorial testing: An approach to systems and software testing based on covering arrays.” Analytic Methods in Systems and Software Testing, 131-158.
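
To make the idea concrete, here is a minimal sketch in Python (an illustration only, not JMP's implementation; the factor names "browser", "os", and "locale" are hypothetical): for three two-level factors, exhaustive testing requires eight runs, yet the four runs below already cover every pairwise (strength-2) combination of settings.

from itertools import combinations, product

# Three hypothetical two-level factors.
factors = ["browser", "os", "locale"]

# A strength-2 covering array: every pair of factors sees all four
# combinations of settings in just 4 runs (vs. 2**3 = 8 exhaustive runs).
covering_array = [
    (0, 0, 0),
    (0, 1, 1),
    (1, 0, 1),
    (1, 1, 0),
]

# Verify the pairwise-coverage property.
for i, j in combinations(range(len(factors)), 2):
    seen = {(run[i], run[j]) for run in covering_array}
    assert seen == set(product([0, 1], repeat=2)), (factors[i], factors[j])

print(f"{len(covering_array)} runs cover all pairwise interactions; "
      f"exhaustive testing would need {2 ** len(factors)}.")

For a fixed strength and number of levels, the size of a covering array grows only roughly logarithmically with the number of factors, which is why the savings become dramatic when there are hundreds or thousands of inputs.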

Vins
Level III

Great Blog @JosephMorgan, looking forward to the next one.

 

When using covering arrays and you are limited in the number of test sets you can run, can the results/analysis of your first test – say, a strength-2 array (with some strength 3 covered) – allow you to plan a second covering array, with some factors from the first dropped, so you can capture strength 4 or above? Are there any methods in development that would allow an analysis and sequential testing workflow with covering arrays?

 

Thanks!

JosephMorgan
Staff

@Vins, there are methods in development to do this. So, the answer is "Yes and Yes". Just a few weeks ago, at MEMOCODE-2024 (https://memocode2024.github.io/program.html), our colleague presented a talk on this. The idea is to use Bayesian analysis to guide the selection of subsequent test cases (i.e., a sequential test case design method). The title of the paper is:

"MaLT: Machine-Learning-Guided Test Case Design and Fault Localization of Complex Software Systems."

The proceedings of the MEMOCODE-2024 conference will be out soon but you may be able to find the paper if you do a Google Scholar search on the title. I will likely say a little bit about this in my combinatorial testing blog.