Creating a covering array when you can't test some factors together
To test a particular platform in JMP Genomics, there are 14 different factors that can be varied, ranging from 14 levels to two levels. What makes this case different from the testing software preferences example I discussed previously is that for three of the factors (Interactive Hierarchical Clustering Options, Automated Hierarchical Clustering Options, Minimum Recombination Grouping Options), only one can be set for any given test run. This restriction arises because of the behavior of the following radio control:
In this example, we have a factor for Linkage Grouping Method that has three possible levels. Each grouping method has one of the three factors mentioned above associated with it, so these associated factors break into three separate cases to consider. If we wanted a strength 2 covering array, couldn’t we just create a strength 2 covering array for each case? This would ensure that each possible (allowable) pair occurs in our testing.
We could, but…
The two factors with the largest number of levels have 14 and 9 levels, respectively. If we didn’t have any restrictions, the smallest possible strength 2 covering array has 126 = 14*9 runs (and the covering array platform can find such a design). However, if we create 3 separate covering arrays, that means each will be 126 runs, for a total of 378 = 126*3 runs. A strength 2 covering array ensures each possible pair occurs at least once, but by breaking it up into three covering arrays, we end up having more coverage than we need. By combining the three covering arrays, each of the pairs not involved in the restricted factors is actually occurring at least three times.
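As a quick sanity check on that arithmetic, here is a minimal Python sketch. The function name is made up for illustration and has nothing to do with the JMP platform. It relies on one fact: every pair of levels from the two largest factors has to land in its own row, so the product of the two largest level counts is a lower bound on the run size.

```python
def strength2_lower_bound(level_counts):
    """Lower bound on runs for a strength 2 covering array.

    Each pair of levels from the two largest factors needs its own row,
    so the bound is the product of the two largest level counts.
    """
    largest, second = sorted(level_counts, reverse=True)[:2]
    return largest * second


# 14, 9 and 2 come from the post; the other factors are omitted here
# because smaller level counts cannot change the bound.
level_counts = [14, 9, 2]

per_array = strength2_lower_bound(level_counts)  # one unrestricted array
separate_total = 3 * per_array                   # one array per grouping method

print(per_array, separate_total)
```

The bound is only a floor. The post notes that the covering array platform does find a 126-run design in the unrestricted case, but that is not guaranteed for every set of factors.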
What would be nice is a design that has missing values for factors that cannot occur based on the other settings.
Really? Missing values?
When you think of it from a traditional DOE standpoint, creating a design with missing values sounds silly. But for covering arrays, where we’re looking at combinations of factors, it makes perfect sense: If a factor has a missing value in a row, it means it’s not relevant for that particular test. This also means that if we see a failure for that test, we know the missing factor is not involved in the cause. Fortunately, our Analysis tool recognizes missing values as well.
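To see why missing values are harmless for coverage, here is a minimal sketch of how pairs could be counted when some cells are missing. It is an illustration only, not how the JMP Analysis tool is implemented. The shortened factor names and the level names `a1` and `i1` are hypothetical, and `None` stands in for a missing value.

```python
from itertools import combinations


def covered_pairs(rows):
    """Return every pair of (factor, level) settings that share a row.

    A factor whose value is None is treated as not relevant to that
    test, so it never contributes to a pair.
    """
    seen = set()
    for row in rows:
        # Sort by factor name so each pair is stored in one fixed order.
        present = [(f, v) for f, v in sorted(row.items()) if v is not None]
        seen.update(combinations(present, 2))
    return seen


# Two hypothetical test runs; None marks a missing value.
rows = [
    {"Method": "Automated", "Automated Options": "a1", "Interactive Options": None},
    {"Method": "Interactive", "Automated Options": None, "Interactive Options": "i1"},
]

pairs = covered_pairs(rows)
for pair in sorted(pairs):
    print(pair)
```

No pair involving both Automated Options and Interactive Options shows up, which is exactly the behavior we want from a design with missing values.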
OK, so how can I create this design?
Consider one of the three grouping methods, Automated Hierarchical Clustering. When it is selected, the factor Automated Hierarchical Clustering Options can take on three different levels, while the factors Interactive Hierarchical Clustering Options and Minimum Recombination Grouping Options should be missing. We can use the handy disallowed combinations filter for this. When Linkage Grouping Method is Automated Hierarchical Clustering, disallow all values for Interactive Hierarchical Clustering Options. Then join with an OR and do the same thing for Automated Hierarchical Clustering and Minimum Recombination Grouping Options.
We then follow a similar procedure for the other two grouping methods, linking the clauses with OR statements in the Data Filter. So we should now be ready to create the design…
Not quite yet
We’ve missed one thing in our disallowed combinations that is very easy to overlook: the designer will still try to make pairs of those restricted factors show up together, in rows with Linkage Grouping Method missing, which doesn’t make any sense for our design. So, we have to disallow every combination of levels between those columns. For example, if we choose Interactive Hierarchical Clustering Options and Automated Hierarchical Clustering Options from the filter, we would get (with the earlier disallowed combinations cropped from the top):
After we’ve done those combinations and the previously mentioned disallowed combinations, all connected by using OR in the filter, we can create the design.
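The full set of OR-ed clauses can be written as a single rule. The sketch below is a plain Python predicate, not JSL and not what the Disallowed Combinations Filter generates. Only Automated Hierarchical Clustering is named as a method level in this post, so the other two level names are assumptions inferred from the option factor names, and the option levels used in the checks are placeholders.

```python
METHOD = "Linkage Grouping Method"

# Method level -> the one options factor that may be set with it.
# Only "Automated Hierarchical Clustering" is named in the post; the
# other two method level names are assumed.
OPTIONS_FOR = {
    "Interactive Hierarchical Clustering": "Interactive Hierarchical Clustering Options",
    "Automated Hierarchical Clustering": "Automated Hierarchical Clustering Options",
    "Minimum Recombination Grouping": "Minimum Recombination Grouping Options",
}


def is_disallowed(row):
    """Return True if a row (dict of factor -> level or None) breaks a rule."""
    method = row.get(METHOD)
    set_options = [f for f in OPTIONS_FOR.values() if row.get(f) is not None]

    # First kind of clause: a method is chosen, but an options factor
    # belonging to a different method has a value.
    if method is not None and any(f != OPTIONS_FOR.get(method) for f in set_options):
        return True

    # Second kind of clause (the easy one to overlook): two options
    # factors have values in the same row, even with the method missing.
    return len(set_options) > 1


ok_row = {
    METHOD: "Automated Hierarchical Clustering",
    "Automated Hierarchical Clustering Options": "level 1",  # placeholder level
}
wrong_options = {
    METHOD: "Automated Hierarchical Clustering",
    "Interactive Hierarchical Clustering Options": "level 1",
}
two_options_no_method = {
    "Interactive Hierarchical Clustering Options": "level 1",
    "Automated Hierarchical Clustering Options": "level 1",
}

print(is_disallowed(ok_row), is_disallowed(wrong_options), is_disallowed(two_options_no_method))
```

The third row is the case the first kind of clause cannot catch on its own, which is why the extra combinations between the option columns are needed.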
The Covering Array platform finds us a 140-run design (more on this in a bit, but this is, in fact, the smallest possible run size), with only one of Interactive Hierarchical Clustering Options, Automated Hierarchical Clustering Options, Minimum Recombination Grouping Options set for each row and the other two missing. I’ve put the data table on the File Exchange in the JMP User Community, where you can see the final result and take a peek at what the resulting disallowed combinations looked like.
While some extra work was needed to set up the disallowed combinations, the savings in the number of tests (378 vs. 140) were dramatic. Once we had determined what should be disallowed, the Disallowed Combinations Filter made the combinations easy to enter, even though there were a number of them.
The keen reader may have noticed I could collapse all the possible values from the three variables into one factor with 10 levels (this is how I knew that 140 = 14*10 runs was the lower bound). While it’s certainly possible to construct the design in such a way, it’s easy to lose the context as to what that variable is trying to describe.
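For the curious, the collapsing idea can be sketched in a few lines of Python. The post only states that Automated Hierarchical Clustering Options has three levels and that the three factors total 10 levels, so the 4 and 3 used for the other two factors, and all of the level names, are placeholders chosen to add up to 10.

```python
# Level names are placeholders. Only the count of 3 for the Automated
# factor and the total of 10 come from the post; the 4/3 split for the
# other two factors is an assumption.
OPTION_LEVELS = {
    "Interactive Hierarchical Clustering Options": ["I1", "I2", "I3", "I4"],
    "Automated Hierarchical Clustering Options": ["A1", "A2", "A3"],
    "Minimum Recombination Grouping Options": ["M1", "M2", "M3"],
}


def collapse(option_levels):
    """Flatten several mutually exclusive factors into one combined factor."""
    return [(factor, level)
            for factor, levels in option_levels.items()
            for level in levels]


def expand(combined_level, option_levels):
    """Turn one combined level back into columns, with None for the others."""
    chosen_factor, level = combined_level
    return {f: (level if f == chosen_factor else None) for f in option_levels}


combined = collapse(OPTION_LEVELS)
lower_bound = 14 * len(combined)  # 14-level factor paired with the combined factor

print(len(combined), lower_bound)
print(expand(combined[4], OPTION_LEVELS))
```

Expanding a combined level puts the context back: each row ends up with exactly one of the three option columns set and the other two missing, which matches the layout of the 140-run design.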
Do you run across similar types of problems when using covering arrays? If there’s interest, we can see about adding the capability to easily generate the specialized disallowed combination when some factors cannot be tested together. Thanks for reading!