How to simulate process data with some degree of autocorrelation?

gwenhallberg · Jun 10, 2023 4:54 PM

I work in a manufacturing environment and often see data that is approximately normally distributed but not necessarily randomly distributed. In process capability calculations, the Overall Sigma (standard deviation) is often larger than the Within Sigma (control chart sigma). This can happen for a variety of reasons, such as raw material lot switches and periodic instrument calibrations. I am trying to evaluate some proposed control strategies, but I can't figure out how to simulate my process accurately. If I just use Random Normal(), I don't get any of the autocorrelation I usually see in the actual process data. Any ideas on how to simulate data where the stability index is not 1? This is not a traditional time series problem with a predictable period in the data pattern. Thank you all in advance for your assistance.
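For individual measurements, the two sigmas in the question can be sketched like this: overall sigma is the ordinary sample standard deviation, within sigma is the average moving range divided by d2 = 1.128, and the stability index is their ratio. This is a minimal Python sketch, assuming JMP's within sigma uses its default average-moving-range estimate; the function names are hypothetical. It illustrates why Random Normal() data gives a stability index near 1, while a series in which each value carries over part of the previous one gives a larger index.

```python
import random
import statistics

D2 = 1.128  # d2 bias-correction constant for moving ranges of span 2


def overall_sigma(x):
    """Ordinary sample standard deviation of the whole series."""
    return statistics.stdev(x)


def within_sigma(x):
    """Short-term sigma: average moving range (span 2) divided by d2."""
    moving_ranges = [abs(b - a) for a, b in zip(x, x[1:])]
    return statistics.fmean(moving_ranges) / D2


def stability_index(x):
    """Overall sigma divided by within sigma; about 1 for independent data."""
    return overall_sigma(x) / within_sigma(x)


rng = random.Random(1)
independent = [rng.gauss(0.0, 1.0) for _ in range(5000)]  # like Random Normal()

autocorrelated = [0.0]
for _ in range(4999):
    # each value carries over 80% of the previous value's deviation
    autocorrelated.append(0.8 * autocorrelated[-1] + rng.gauss(0.0, 1.0))

print(f"independent:    {stability_index(independent):.2f}")
print(f"autocorrelated: {stability_index(autocorrelated):.2f}")
```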

peng_liu · Sep 30, 2022 05:16 PM

@Mark_Bailey pointed me to this post, and I looked at your data. It looks like your data can be adequately modeled by a so-called AR(1) model, short for autoregressive model of order one. It is a type of time series model that describes a process with autocorrelation.
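For reference, here is the standard mean-centred AR(1) model and the stability index it implies. This derivation is an addition, not part of the original reply; it assumes individual measurements with within sigma estimated from the average moving range (d2 = 1.128), and it gives approximate, expected-value results rather than exact sample values.

```latex
% Mean-centred AR(1) model, |\phi| < 1
y_t - \mu = \phi\,(y_{t-1} - \mu) + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma^2)

% Overall (stationary) variance of the process
\sigma_y^2 = \operatorname{Var}(y_t) = \frac{\sigma^2}{1 - \phi^2}

% Successive differences: Var(y_t - y_{t-1}) = 2\sigma_y^2 - 2\phi\sigma_y^2
y_t - y_{t-1} \sim N\!\left(0,\; 2\sigma_y^2 (1 - \phi)\right)

% Moving-range estimate of within sigma, using E|Z| = s\sqrt{2/\pi} for Z ~ N(0, s^2) and d_2 = 2/\sqrt{\pi} \approx 1.128
\sigma_{\text{within}} \approx \frac{E\,\lvert y_t - y_{t-1} \rvert}{d_2}
  = \frac{\sqrt{2\sigma_y^2 (1 - \phi)}\,\sqrt{2/\pi}}{2/\sqrt{\pi}}
  = \sigma_y \sqrt{1 - \phi}

% Stability index = overall sigma / within sigma
\mathrm{SI} \approx \frac{\sigma_y}{\sigma_y \sqrt{1 - \phi}} = \frac{1}{\sqrt{1 - \phi}}
```

For example, φ = 0.5 gives SI ≈ 1.41 and φ = 0.8 gives SI ≈ 2.24, so a positively autocorrelated AR(1) process naturally produces an overall sigma larger than its within sigma.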

To simulate an AR(1) model with a JMP formula column, follow these steps:

First, fit the AR(1) model. Here is the report:

I included the script in your original data table and attached the updated table.
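If you want to sanity-check the fit outside JMP, an AR(1) model can be approximated by regressing each value on the previous one (conditional least squares). This Python sketch is only a stand-in: I believe JMP's ARIMA fit uses maximum likelihood, so its estimates will differ somewhat, especially for short series. The names fit_ar1, mu, phi, and sigma are hypothetical and correspond to the process mean, the AR1 coefficient, and the residual standard deviation; the demo series is synthetic, not your data.

```python
import random
import statistics


def fit_ar1(y):
    """Conditional least-squares fit of y[t] = c + phi * y[t-1] + e[t].

    Returns (mu, phi, sigma): mu = c / (1 - phi) is the process mean,
    phi is the lag-1 coefficient, sigma is the residual standard deviation.
    """
    prev, curr = y[:-1], y[1:]
    mean_prev, mean_curr = statistics.fmean(prev), statistics.fmean(curr)
    sxy = sum((a - mean_prev) * (b - mean_curr) for a, b in zip(prev, curr))
    sxx = sum((a - mean_prev) ** 2 for a in prev)
    phi = sxy / sxx
    c = mean_curr - phi * mean_prev
    residuals = [b - (c + phi * a) for a, b in zip(prev, curr)]
    # divide by (number of residuals - 2) because c and phi were estimated
    sigma = (sum(r * r for r in residuals) / (len(residuals) - 2)) ** 0.5
    return c / (1 - phi), phi, sigma


# Synthetic stand-in series with known parameters (not real process data)
rng = random.Random(7)
y = [10.0]
for _ in range(19999):
    y.append(10.0 + 0.6 * (y[-1] - 10.0) + rng.gauss(0.0, 0.5))

mu, phi, sigma = fit_ar1(y)
print(f"mean {mu:.3f}, AR1 {phi:.3f}, sigma {sigma:.3f}")
```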

Next, create a formula column like the following. It needs three numbers from the report: the two parameter estimates and the standard deviation from the Model Summary. Note how each one is used in the formula.
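The exact formula is in the attached table. As a hedged illustration only, assuming it takes the mean-centred form Intercept + AR1 × (previous Y - Intercept) + Random Normal(0, σ), one evaluation for a row after the first looks like this in Python. If your report gives a constant term rather than a mean, the update would instead be constant + AR1 × previous Y + noise. next_value and its parameter names are hypothetical.

```python
import random


def next_value(prev_y, intercept, ar1, sigma, rng=random):
    """One evaluation of the recursive formula for any row after the first.

    Hypothetical mean-centred form (an assumption, check against your report):
        Y[t] = intercept + ar1 * (Y[t-1] - intercept) + Normal(0, sigma)
    """
    shock = rng.gauss(0.0, sigma)  # plays the role of Random Normal(0, sigma)
    return intercept + ar1 * (prev_y - intercept) + shock


# With the noise switched off, the value moves halfway back toward the mean:
print(next_value(12.0, intercept=10.0, ar1=0.5, sigma=0.0))  # prints 11.0
```

The three inputs map onto the three numbers from the report: intercept and ar1 are the two parameter estimates, and sigma is the standard deviation.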

A few words about this formula. JMP evaluates a column formula row by row. So the formula says: if this is the first row, draw a starting value consistent with the sample; here, Random Normal(0, 1) looks like it fits the bill. For each subsequent row, the formula calculates the new value recursively from the value in the previous row. That is what the subscript on Y does.
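The row-by-row evaluation described above can be mimicked with a simple loop: row 1 gets an independent Normal(0, 1) draw, and every later row is computed from the row before it. This is a minimal Python sketch, not the attached JMP formula; simulate_ar1 and its parameters are hypothetical, it assumes the mean-centred update Intercept + AR1 × (previous Y - Intercept) + Normal(0, σ), and the values in the usage line are placeholders to be replaced with your own estimates.

```python
import random


def simulate_ar1(n_rows, intercept, ar1, sigma, first_mean=0.0, first_sd=1.0, seed=None):
    """Mimic a recursive formula column, evaluated one row at a time.

    Row 1 is an independent Normal(first_mean, first_sd) draw; every later row
    applies the assumed mean-centred update to the previous row's value.
    """
    rng = random.Random(seed)
    y = []
    for row in range(1, n_rows + 1):  # 1-based, like Row() in JMP
        if row == 1:
            value = rng.gauss(first_mean, first_sd)
        else:
            value = intercept + ar1 * (y[-1] - intercept) + rng.gauss(0.0, sigma)
        y.append(value)
    return y


# Placeholder parameters; substitute the estimates from your own report
simulated = simulate_ar1(500, intercept=0.0, ar1=0.6, sigma=0.8, seed=42)
print(len(simulated), [round(v, 3) for v in simulated[:5]])
```

If the intercept is far from 0, the first few simulated rows will drift toward it. Drawing row 1 from the stationary distribution, Normal(intercept, σ/√(1 - AR1²)), avoids that start-up transient.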

I have attached the finished product as well. Your original data table has three scripts, and the simulated data table has three matching scripts, so you can compare their results to see how close the simulated data is to your original data.

One more thing: if your data is more complicated and AR(1) no longer suffices, you may need more complicated time series models. At that point, simulating with a formula column is no longer easy. Fortunately, the Time Series platform has built-in simulation capabilities. Run the Time Series script in either attached table, yours or the simulated one. In the report, under Model Comparison, click the red triangle next to the model and select Generate Simulation. This opens a dialog for simulating the time series. The dialog should be mostly self-explanatory, so I won't elaborate.
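To see why the formula approach gets harder, here is a sketch of a general AR(p) recursion: each row now depends on the previous p rows, and a burn-in period is discarded so the start-up values do not bias the result. This Python illustration assumes a stationary AR(p) model with known coefficients; it is not how JMP's Generate Simulation works internally, and models with moving-average or differencing terms need more than this. simulate_ar, its parameters, and the AR(2) coefficients in the usage line are hypothetical.

```python
import random
from collections import deque


def simulate_ar(n, mean, phis, sigma, burn_in=500, seed=None):
    """Simulate y[t] - mean = sum_k phis[k] * (y[t-1-k] - mean) + Normal(0, sigma).

    All lags start at the mean; the first burn_in rows are discarded so the
    start-up transient has died out. Assumes the coefficients are stationary.
    """
    rng = random.Random(seed)
    recent = deque([0.0] * len(phis), maxlen=len(phis))  # deviations, newest last
    out = []
    for t in range(burn_in + n):
        # phis[0] multiplies the previous row, phis[1] the row before that, ...
        dev = sum(phi * recent[-1 - k] for k, phi in enumerate(phis))
        dev += rng.gauss(0.0, sigma)
        recent.append(dev)
        if t >= burn_in:
            out.append(mean + dev)
    return out


# Placeholder AR(2) coefficients, chosen only for illustration
series = simulate_ar(1000, mean=5.0, phis=[0.5, 0.2], sigma=1.0, seed=11)
print(len(series), round(sum(series) / len(series), 2))
```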
