Moving beyond one-factor-at-a-time (OFAT) testing

In previous blog posts, we talked about breaking down one aspect of the testing challenge: dealing with multiple inputs as a designed experiment, starting by focusing on one input at a time. A later blog post revisited the fundamental principles of factorial effects in designed experiments. Now, we’ll combine these ideas and discuss what we might want in designing an experiment to be used for testing.

 

Let us return to the Beta Distribution and consider what happens if we want to test varying its inputs, focusing on q, alpha, and beta.

 

[Screenshot: the Beta Distribution function with its inputs q, alpha, and beta]

Assume that we pick three values for each of those inputs (a low/medium/high). One could imagine trying all 27 (3x3x3) possible combinations. But would this work if we had 10 inputs? 20? 100? At some point, it becomes apparent that we need a different strategy. While we have a way to handle this in typical design of experiment problems, does this same approach work for software testing? Recalling that we want to find bugs and that they lurk in the corners and boundaries, let us reframe this to say that our goal is to find combinations of inputs that will induce a failure (i.e., we don’t observe our expected result).
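To see how quickly exhaustive testing gets out of hand, here is a minimal Python sketch (not part of the original example) that counts the runs a full grid would need, assuming three levels per input:

```python
from itertools import product

# Three hypothetical levels (low/medium/high) per input.
levels = ["low", "medium", "high"]

# Full factorial for the three Beta Distribution inputs: q, alpha, beta.
full_grid = list(product(levels, repeat=3))
print(len(full_grid))  # 27 combinations

# The same strategy quickly becomes infeasible as inputs are added.
for n_inputs in (3, 10, 20, 100):
    print(n_inputs, "inputs ->", len(levels) ** n_inputs, "runs")
```

With 10 inputs the count is already 59,049 runs, and with 20 it exceeds 3 billion, which is why exhaustive enumeration stops being an option.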

 

What about the boundaries?

You may have noticed I seem to be ignoring the boundaries. This is where the OFAT approach can have value. Ideally, we have already checked our boundaries for each of our inputs (and have fixed any issues) before coming to this point in testing. Nevertheless, as we will discover later, we can take boundaries into account as we vary our inputs.
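As a rough sketch of what those one-at-a-time boundary checks might look like, the example below varies each input across hypothetical boundary values while holding the others at nominal settings. The specific boundary values, the nominal settings, and the use of scipy's beta CDF as a stand-in for the system under test are assumptions for illustration only.

```python
from scipy.stats import beta as beta_dist

# Nominal (mid-range) settings for each input; these values are illustrative.
nominal = {"q": 0.5, "alpha": 2.0, "beta": 3.0}

# Hypothetical boundary values to probe one factor at a time.
boundaries = {
    "q": [0.0, 1.0],       # q behaves like a quantile between 0 and 1
    "alpha": [1e-6, 1e6],   # shape parameters must be positive
    "beta": [1e-6, 1e6],
}

# Vary one input at a time, holding the others at their nominal values.
for name, values in boundaries.items():
    for value in values:
        args = dict(nominal, **{name: value})
        result = beta_dist.cdf(args["q"], args["alpha"], args["beta"])
        print(f"{name}={value!r:>10} -> {result}")
```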

 

Moving beyond OFAT

In a factorial design, the fundamental principles of factorial effects are stated in terms of important/significant effects. However, for software testing, we reframe the fundamental principles in terms of “failure-inducing combinations”:

 

Combination hierarchy: i) Combinations involving fewer inputs are more likely to be failure-inducing than those involving more inputs. ii) Combinations of the same order are equally likely to be failure-inducing.

Combination sparsity: The number of failure-inducing combinations will be small.

Combination heredity: A combination is more likely to be failure-inducing if at least one of the parent factors involved in the combination is already known to be more likely involved in inducing failures.

 

Why do we even need to make a distinction?

One of the key differences in testing is that our response is simply pass/fail: do we deviate from our expected result or not? In addition, we typically assume a deterministic result. That is, if we use the same inputs, we get the same results (no random error). While a tester wants to find bugs, eventually we hope those bugs get fixed. We’re not looking to fit a model, find effects, or optimize a response. In fact, the closest analog to the idea of modeling is the task of tracking down a combination that induced a failure. This is known as the fault localization problem, and it has its own set of unique challenges.
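A minimal sketch of this pass/fail, deterministic view is shown below; the expected values and tolerance are illustrative assumptions, with scipy's beta CDF standing in for the function under test.

```python
import math
from scipy.stats import beta as beta_dist

def run_test(q, alpha, beta, expected, tol=1e-9):
    """Return True (pass) if the observed result matches the expected result."""
    observed = beta_dist.cdf(q, alpha, beta)
    return math.isclose(observed, expected, rel_tol=tol, abs_tol=tol)

# Same inputs, same result: the outcome is deterministic, so a failure
# can always be reproduced by rerunning the same combination of inputs.
print(run_test(0.5, 2.0, 2.0, expected=0.5))   # symmetric case -> pass
print(run_test(0.5, 2.0, 3.0, expected=0.5))   # wrong expectation -> fail
```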

 

So, do these principles even make sense for testing?

It turns out that there is empirical evidence that combination hierarchy holds (see Kuhn et al., 2004). On the other hand, combination sparsity may not always be a reasonable assumption. If a system has not undergone any kind of testing for interactions, or even testing of individual inputs, there may be many failures in a test suite (especially before bugs start getting fixed). This is also related to the combination heredity principle – if an individual input induces a failure, then any combination involving that input will induce a failure.
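The heredity point can be made concrete with a toy sketch. Below, a hypothetical buggy implementation fails whenever alpha is not positive; enumerating a full grid shows that every failing combination contains that single bad value, which is why single-input bugs are best found and fixed before interaction testing.

```python
from itertools import product

def buggy_system(q, alpha, beta):
    """Toy stand-in for the system under test: fails whenever alpha <= 0."""
    if alpha <= 0:
        raise ValueError("invalid alpha")
    return (q, alpha, beta)  # placeholder for the real computation

# Illustrative low/medium/high levels, with a deliberately bad alpha level.
q_levels = [0.1, 0.5, 0.9]
alpha_levels = [-1.0, 1.0, 5.0]
beta_levels = [0.5, 2.0, 10.0]

failures = []
for q, alpha, beta in product(q_levels, alpha_levels, beta_levels):
    try:
        buggy_system(q, alpha, beta)
    except ValueError:
        failures.append((q, alpha, beta))

# All 9 failing combinations share alpha = -1.0: the single-input bug
# dominates any apparent higher-order interactions.
print(len(failures), all(alpha == -1.0 for _, alpha, _ in failures))
```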

 

So, if these assumptions are reasonable, what makes for a good set of tests? Stay tuned for our next blog entries.

 

Kuhn, D.R., Wallace, D.R., and Gallo, A.M., 2004. Software fault interactions and implications for software testing. IEEE Transactions on Software Engineering, 30(6), pp. 418-421.
