
How to simulate process data with some degree of autocorrelation?

gwenhallberg — Sat, 10 Jun 2023 23:54:49 GMT

I work in a manufacturing environment and often see data that is relatively normally distributed, but is not necessarily randomly distributed. The Overall Sigma (standard deviation) is often larger than the Within Sigma (control chart sigma) with regard to process capability calculations. This can occur for a variety of reasons like raw material lot switches and periodic instrument calibrations. I am trying to evaluate some proposed control strategies, but I can't figure out how to accurately simulate my process. If I just use Random Normal(), I don't get any of the autocorrelation I would usually see with the actual process data. Any ideas how to simulate data where the stability index is not 1? This is not a traditional time series problem where there is a predictable period to the data pattern. Thank you all in advance for your assistance.

Re: How to simulate process data with some degree of autocorrelation?

Mark_Bailey — Thu, 29 Sep 2022 18:23:34 GMT

There is a function for this purpose: Random Multivariate Normal().

Re: How to simulate process data with some degree of autocorrelation?

ih — Mon, 03 Oct 2022 11:57:40 GMT

Edit: I also misread the question and gave a suggestion for multi-correlation instead of autocorrelation. Whoops!

I haven't tried @Mark_Bailey's suggestion; I think it might be an easier way to do the same thing.

You might try using principal components: create random values for the latent variables, then calculate your simulated variables from those latent variables and add univariate error.

Using this method you can even make data that is similar to your own process by finding your own loadings, coefficients, variances, and errors, and then simulating data just like it.

Here is an example:

New Table( "Example Correlated Data",
	Add Rows( 100 ),
	New Column( "Prin 1", Numeric, "Continuous", Format( "Best", 12 ),
		Formula( Random Normal( 0, 1 ) )
	),
	New Column( "Prin 2", Numeric, "Continuous", Format( "Best", 12 ),
		Formula( Random Normal Mixture( [-1, 2], [0.3, 0.6], [0.25, 0.75] ) )
	),
	New Column( "X1", Numeric, "Continuous", Format( "Best", 12 ),
		Formula( :Prin 1 * 20 + :Prin 2 * 1 + 5 + Random Normal( 0, 3 ) )
	),
	New Column( "X2", Numeric, "Continuous", Format( "Best", 12 ),
		Formula( :Prin 1 * 2 + :Prin 2 * 1 + 3 + Random Normal( 0, 2 ) )
	),
	New Column( "X3", Numeric, "Continuous", Format( "Best", 12 ),
		Formula( :Prin 1 * -1 + :Prin 2 * 0.3 + 200 + Random Normal( 0, 0.2 ) )
	)
)

Re: How to simulate process data with some degree of autocorrelation?

gwenhallberg — Fri, 30 Sep 2022 14:56:04 GMT

Thanks very much, @ih, for the suggestion! I'll give both this and Mark's idea a try.

Re: How to simulate process data with some degree of autocorrelation?

gwenhallberg — Fri, 30 Sep 2022 15:28:31 GMT

Thanks, @Mark_Bailey for the suggestion. This seems like it should be exactly what I need, but I'm afraid I'm having trouble figuring out how to properly implement it. To start with, I would just like to generate one simulated column in a data table. I know the mean, overall process standard deviation, within sigma (control chart sigma - which in this situation is based on the moving range), and the "Autocorrelation" value (from the summary statistics in the distribution platform). I can also get a Correlation value for the process parameter against the Lag() of itself (by one row) from Fit Y by X - and I could repeat the correlation analysis for different row lags. Any chance you could help me understand how I should structure the formula given the information I have (or let me know if I am just missing the boat entirely)?

Re: How to simulate process data with some degree of autocorrelation?

ih — Fri, 30 Sep 2022 16:41:34 GMT

After some investigation, some notes:

There is some noise added to the values; it seems to have a standard deviation near 1.
You need to define a mean vector and a symmetric covariance matrix with one row and one column for each element of the mean vector.

Some examples:

Names Default To Here( 1 );

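// Make a data table from a mean vector (mv), a covariance matrix (cm), and a number of rows (nr).
// cm must be symmetric, with one row and one column per element of mv.
Make Correlated Table = Function({mv, cm, nr, name = "Data Table"}, {dt, randmvnMat},
	// Draw nr rows from the multivariate normal distribution
	randmvnMat = Random Multivariate Normal( mv, cm, nr );
	dt = New Table( name );
	dt << Set Matrix(randmvnMat);
	// Plot every pair of columns to check the correlation structure
	dt << Scatterplot Matrix( Y( dt << Get Column References ), Matrix Format( "Lower Triangular" ), Density Ellipses( 1 ) );
	dt
);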

dt1 = Make Correlated Table(
	mv = [0 1 1 0],
	cv = [
		1 0 0 0, 
		0 1 0 0,
		0 0 1 0,
		0 0 0 1
	],
	nr = 100,
	name = "No Correlation"
);

dt2 = Make Correlated Table(
	mv = [0 10 20],
	cv = [
		1 1 0, 
		1 1 0, 
		0 0 1
	],
	nr = 20,
	name = "1-2 Correlated"
);

dt3 = Make Correlated Table(
	mv = [0 10 20],
	cv = [
		1 1 1, 
		1 1 1, 
		1 1 1
	],
	nr = 20,
	name = "1-2-3 Correlated"
);

dt4 = Make Correlated Table(
	mv = [0 10 20],
	cv = [
		 1.0  0.8 -0.5, 
		 0.8  1.0  0.0, 
		-0.5  0.0  1.0
	],
	nr = 20,
	name = "Some Correlation"
);

Re: How to simulate process data with some degree of autocorrelation?

Mark_Bailey — Fri, 30 Sep 2022 18:58:55 GMT

I apologize. I was wrong. You do not have multiple correlation (i.e., more than one variable). You have a time series. Let me look into it more.

Re: How to simulate process data with some degree of autocorrelation?

gwenhallberg — Fri, 30 Sep 2022 19:20:03 GMT

Thanks, @ih - Your nifty script made it so I could understand how the Random Multivariate Normal() feature was intended to work. This will be great to simulate multiple correlated factors - and is not something I was aware of before.

Re: How to simulate process data with some degree of autocorrelation?

gwenhallberg — Fri, 30 Sep 2022 19:29:29 GMT

Thanks, Mark. If it helps at all, here is an example of the type of data I am trying to simulate.

Re: How to simulate process data with some degree of autocorrelation?

peng_liu — Fri, 30 Sep 2022 21:16:23 GMT

@Mark_Bailey pointed me to this post, and I looked at your data. It looks like your data can be adequately modeled by a so-called AR(1) model, short for Autoregressive of Order One. It is a type of time series model that describes a process with autocorrelation.
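For a stationary AR(1) process, the coefficient equals the lag-1 autocorrelation, and the overall variance equals the noise variance divided by (1 - coefficient squared). A minimal sketch of that arithmetic follows, assuming the process really is AR(1); the numbers and variable names are placeholders, not values from the attached data.

```jsl
Names Default To Here( 1 );

// Placeholder inputs: replace with your own summary statistics
overallSigma = 2.0; // overall process standard deviation
rho1 = 0.6;         // correlation between the column and its lag by one row

// For a stationary AR(1) process the coefficient is the lag-1 autocorrelation
phi = rho1;

// Standard deviation of the random noise added at each row
sigmaE = overallSigma * Sqrt( 1 - phi ^ 2 );

Show( phi, sigmaE );
```

Fitting the model in the Time Series platform, as described in the steps below, estimates these quantities directly and is the safer route when the data are available.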

An AR(1) model can be simulated with a JMP formula column in the following steps:

First, fit the AR(1) model. Here is the report:

I included the script in your original data table and attached the updated table.
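A fit like this can also be launched from a script. The sketch below assumes the measurement column is named "Y", which is a placeholder, and that the table of interest is the current data table.

```jsl
Names Default To Here( 1 );

dt = Current Data Table();

// ARIMA( 1, 0, 0 ) requests one autoregressive term, no differencing,
// and no moving average term, i.e. an AR(1) model.
// "Y" is a placeholder column name.
dt << Time Series( Y( :Y ), ARIMA( 1, 0, 0 ) );
```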

Next, create a formula column like the following. You need three numbers from the previous report: the two parameter estimates and the standard deviation in Model Summary. Note how they are used in the formula.

Now a little about this formula. First, consider how a JMP table evaluates a formula: row by row. So the formula says, if it is the first row, just simulate a number that is plausible for the sample; here a Random Normal(0, 1) fits the bill. Then, as the row number increases, the formula calculates each new value recursively from the value in the previous row. This is what the subscript on Y does.
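A minimal sketch of a formula column along these lines is shown here; it is not necessarily the exact formula in the attached table. It assumes the reported intercept is the process mean, and the numbers (mean 10, AR coefficient 0.6, noise standard deviation 0.8) are placeholders to be replaced with the values from your own report. The first row is drawn around the process mean rather than around zero.

```jsl
Names Default To Here( 1 );

dt = New Table( "Simulated AR(1)", Add Rows( 200 ) );

// Placeholder numbers, written as literals so the formula does not
// depend on script variables:
//   10  = intercept (process mean) estimate
//   0.6 = AR1 coefficient estimate
//   0.8 = standard deviation from Model Summary
// Row 1 has no previous value, so it is drawn from Normal( 10, 1 );
// 1 = 0.8 / Sqrt( 1 - 0.6 ^ 2 ) is the long-run standard deviation.
// Every later row is pulled toward the mean from the previous row's
// value, plus fresh random noise.
dt << New Column( "Y", Numeric, "Continuous",
	Formula(
		If( Row() == 1,
			Random Normal( 10, 1 ),
			10 + 0.6 * (:Y[Row() - 1] - 10) + Random Normal( 0, 0.8 )
		)
	)
);
```

Because the formula refers to its own column, each row depends on the one before it, which is what produces the autocorrelation.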

I attach the finished product as well. Your original data table has three scripts. The simulated data table has three scripts. And you can compare their results to see how close the simulated data is to your original data.

One more point: if your data is more complicated and AR(1) no longer suffices, you may need more complicated time series models, and simulating them by hand is no longer an easy task. Fortunately, the Time Series platform has built-in simulation capabilities. Run the Time Series script in either of the attached tables, yours or the simulated one. In the report, within Model Comparison, click the red triangle beside the model and select Generate Simulation, which opens a dialog for simulating time series. The dialog should be mostly self-explanatory, so I am not going to elaborate.

Re: How to simulate process data with some degree of autocorrelation?

ian_jmp — Mon, 03 Oct 2022 11:50:29 GMT

If you don't mind messing with JSL, then you can play the simulation game indefinitely. Please find attached a couple of old scripts that might give you some ideas. The first builds on the reply from @peng_liu and allows you to simulate output from an AR(2) process and see how the estimated parameters match up with the model parameters used. Stating the obvious perhaps, but not every process can be usefully modelled as autoregressive no matter how many parameters are used. I always found the book by Box and Luceno very useful in the industrial context. The second script simulates what they call a 'sticky innovation process'.
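The attached scripts are not reproduced here. As a rough illustration of the first idea only, the sketch below simulates an AR(2) series in a loop and then fits an AR(2) model so that the estimates can be compared with the settings used. All names and values are hypothetical, and the coefficients were chosen to give a stationary process.

```jsl
Names Default To Here( 1 );

// Hypothetical AR(2) settings
mu = 0;      // process mean
phi1 = 0.5;  // coefficient on the previous value
phi2 = 0.3;  // coefficient on the value two rows back
sigma = 1;   // standard deviation of the random noise
n = 500;     // number of rows to keep
burn = 100;  // early rows to discard so the arbitrary start has little effect

// Start every value at the mean, then fill in recursively from row 3 on
y = J( n + burn, 1, mu );
For( t = 3, t <= n + burn, t++,
	y[t] = mu + phi1 * (y[t - 1] - mu) + phi2 * (y[t - 2] - mu)
		+ Random Normal( 0, sigma )
);

// Keep only the rows after the burn-in period
dt = New Table( "Simulated AR(2)" );
dt << New Column( "Y", Numeric, "Continuous",
	Set Values( y[(burn + 1) :: (n + burn)] )
);

// Fit an AR(2) model to compare the estimates with phi1 and phi2
dt << Time Series( Y( :Y ), ARIMA( 2, 0, 0 ) );
```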