SDF1
Super User

Simulate vs Bootstrap methods in JMP: which to use when and why

Hi All,

 

  I'm hoping someone from JMP might be able to explain the differences and similarities between the Simulate and Bootstrap approaches when evaluating model estimates (JMP Pro).

 

  Some links to previous Discovery Summit presentations from Gotwalt and others can be found here, here, and here. One clear difference between the methods is that Simulate requires you to swap out a column formula for another column formula, whereas bootstrapping does not. I understand the concept of bootstrapping when recalculating the parameter estimates, but it's unclear to me why the fractional random weighted column needs to be swapped out when simulating the same data. See, for example, the first link above and the corresponding presentation, which explains this approach using an autovalidation setup (in that case they were working with DOE results).
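The fractional random weighting idea referenced above can be sketched as follows: instead of resampling rows in or out with integer counts, every row stays in the fit with a random non-integer weight. This is only an illustration of the general concept (here with exponential draws and the weighted mean as the statistic), not JMP's internal implementation:

```python
import numpy as np

rng = np.random.default_rng(3)
data = rng.normal(10.0, 2.0, size=50)  # stand-in for an observed column
n = data.size

n_boot = 2000
weighted_means = np.empty(n_boot)
for b in range(n_boot):
    # fractional random weights: exponential draws rescaled to sum to n,
    # so each row keeps a non-integer weight rather than a 0/1/2... count
    w = rng.exponential(1.0, size=n)
    w *= n / w.sum()
    weighted_means[b] = np.average(data, weights=w)

# percentile confidence interval from the weighted replicates
lo, hi = np.percentile(weighted_means, [2.5, 97.5])
```

Because every weight is strictly positive, no row is ever dropped from a replicate, which avoids the degenerate fits that ordinary resampling can produce in small designed experiments.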

 

  What I'd like to better understand is the benefits and drawbacks of each approach, and when one approach is better than the other. I have a wide and tall data table with several process variables and product quality measurements. I want to narrow down the set of process variables that might be influencing the outcome, to give production some idea of where to investigate. To do this, I take the Null Factor idea from their presentations and include it as a truly random variable in the model. I then run a very basic GenReg model on the factors and the Y output, right-click the column of estimates, and run either the Bootstrap or Simulate option on the parameter estimates.

 

  I do not get the same results running both ways, and I'd like to understand why. Also, is one approach more appropriate over another since this isn't a DOE?

 

 Examples of the results are below on a Pareto Plot, where any factors that are introduced into the model fewer times than the Null Factor are simply removed. The first image shows the results of using the Simulate method, where the factor on the far right of the Pareto plot is the Null Factor. Similarly, the second image shows the results using the Bootstrap method, where again the Null Factor is at the far right.

[Image: Pareto plot of factor frequencies, Simulate method (SDF1_0-1694531628225.png)]

[Image: Pareto plot of factor frequencies, Bootstrap method (SDF1_1-1694531647427.png)]

  Why do I obtain two very different results for the two methods? Is one more "trustworthy" than the other? Why would I use one method over the other? Is one method more robust than the other? If you do get different results, why does JMP Pro have the two options?

 

  Any insights to this are much appreciated.

 

Thanks!

DS

 

peng_liu
Staff

Re: Simulate vs Bootstrap methods in JMP: which to use when and why

If you would like general background on Bootstrap and Simulate in JMP, links (2) and (3) are more relevant. Link (1) covers a more advanced topic and is specific to DOE.

William Meeker is one of the speakers in (2) and (3). You may want to check out Chapter 9 of his book "Statistical Methods for Reliability Data", 2nd ed. If you only have access to the first edition, it is also Chapter 9 that you should check out. Very detailed descriptions are documented there, and many of us learned the techniques from it. But I am sure there are other references. Meeker's materials cover both the general concepts of the techniques and their special applications to reliability analysis. If the reliability analysis part is irrelevant to you, you may safely skip it.

To summarize what is in the book, together with what Mark explained, here is my stab at it.

The steps of "Bootstrap", AKA "nonparametric bootstrap", are:

  1. sample with replacement from your data
  2. do something
  3. calculate the quantity of interest, collect it
  4. repeat.
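The nonparametric loop above can be written down in a few lines. A minimal sketch, using the sample mean as the quantity of interest and a percentile interval at the end (just the general recipe, not what JMP runs internally):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=10.0, scale=2.0, size=50)  # the observed data

n_boot = 2000
boot_means = np.empty(n_boot)
for b in range(n_boot):
    # step 1: sample with replacement from the observed data
    resample = rng.choice(data, size=data.size, replace=True)
    # steps 2-3: compute and collect the quantity of interest
    boot_means[b] = resample.mean()
# step 4: the loop is the repeat

# percentile confidence interval for the mean
lo, hi = np.percentile(boot_means, [2.5, 97.5])
```

Note that no model or distribution is assumed anywhere: the observed rows themselves are the only input.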

The steps of "Simulate", AKA "parametric bootstrap", are:

  1. start with a model with known parameters and simulate a sample from it
  2. do something
  3. calculate the quantity of interest, collect it
  4. repeat.
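The parametric version differs only in step 1: each replicate is drawn from an assumed model with known parameters rather than resampled from the observed rows. A minimal sketch, assuming a normal model with known mean and standard deviation:

```python
import numpy as np

rng = np.random.default_rng(1)
# step 1's "known truth": a normal model with chosen parameters
mu_true, sigma_true, n = 10.0, 2.0, 50

n_sim = 2000
sim_means = np.empty(n_sim)
for s in range(n_sim):
    # simulate a fresh sample from the known model
    sample = rng.normal(mu_true, sigma_true, size=n)
    # steps 2-3: compute and collect the quantity of interest
    sim_means[s] = sample.mean()

# sampling distribution of the estimator under the known truth
lo, hi = np.percentile(sim_means, [2.5, 97.5])
```

This is why Simulate asks you to swap in a formula column: the formula is the model that generates each simulated sample.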

As you have mentioned, the purpose is to get a confidence interval. Which to choose? That depends on your task. I cannot speak to the task you have described, but look at links (2) and (3): link (2) and the majority of (3) are applications of the nonparametric bootstrap (Bootstrap), because the task there is to better quantify confidence intervals of estimates in contexts where the large-sample approximation is not the most effective approach. Now look at where the parametric bootstrap (Simulate) is used in link (3): for power calculation. What is a power calculation? For that kind of calculation, one must know the truth, so that when a method makes a decision, one knows whether the decision on the hypothesis test is correct or not. Then one can sort each conclusion into the correct quadrant of the Type I / Type II error table. How can one be sure about the truth? Just start with the truth. That is when the parametric bootstrap is more appropriate.
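A toy power calculation along those lines, as a sketch: assume a one-sample z-test with known sigma, declare the truth (mu_true) up front, and score each simulated decision against it. The names and parameter values here are illustrative assumptions, not anything from JMP:

```python
import numpy as np

rng = np.random.default_rng(2)
mu0, mu_true, sigma, n = 0.0, 0.5, 1.0, 30  # null value, truth, known sd, sample size
z_crit = 1.96                               # two-sided 5% critical value

n_sim = 4000
rejections = 0
for _ in range(n_sim):
    # simulate data under the KNOWN truth (mu_true), not under the null
    x = rng.normal(mu_true, sigma, size=n)
    z = (x.mean() - mu0) / (sigma / np.sqrt(n))
    if abs(z) > z_crit:
        rejections += 1  # a correct rejection: the truth really differs from mu0

power = rejections / n_sim  # fraction of correct rejections
```

Because the truth is built in, every rejection can be labeled correct or incorrect, which is exactly what a power (or Type I error) estimate requires and what a nonparametric bootstrap cannot give you.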

Bootstrap methods are versatile techniques, and I am sure what I described is only the tip of the iceberg. But what Mark said summarizes the difference: they have different starting points.

 

For your specific task, I have one question: when you ran Simulate, how did you come up with the formula column? Is it the truth that you know?