Warning message for Gaussian Process: Likelihood estimation algorithm did not converge

hanyu119 · Jun 8, 2023 5:54 PM

Dear JMP experts,

I recently conducted a space-filling DOE using a Latin Hypercube design with 700 runs. I then used the Gaussian Process platform to fit the data. The 'Actual by Predicted Plot' looks very nice, and most of the data lie almost perfectly along the 45-degree line. However, I got a warning message that says:

Warning: Likelihood estimation algorithm did not converge.

Please find attached the Gaussian process Model report.

I've looked around and can't find any documentation/discussion regarding this warning message.

Can you please kindly give me some guidance?

-What causes this message?

-What does this message mean?

-Can I still trust the result? If not, what should I do to correct the model?

Thank you.

Best regards,

Yili

peng_liu · Sep 12, 2022 8:27 PM

How about removing airgap2? After that, how much does -2Loglikelihood change, and how much do the predictions change?

The model is fitted by maximum likelihood, an optimization technique that tries to find the maximum of a surface. Ideally, the estimates are where the surface reaches its top. When you see this warning message, it typically means that the algorithm is not sure it has reached the top. That does not necessarily mean it has not, though. Here, I suspect it has reached the top but cannot confirm it because of airgap2, which contributes little to the model.

hanyu119 · Sep 13, 2022 07:43 AM

Many thanks for the prompt reply. Can you please elaborate a little bit more about the model fitting methods?

There are 700 treatments in my DOE. Does it mean that, for each treatment, the algorithm tries to find the maximum on a surface?

What is this surface that you refer to?

Thank you.

peng_liu · Sep 13, 2022 6:26 AM

Each treatment in your case is an observation. All 700 treatments together form one data set. The GP model builds on the definition of the probability of observing the data given the parameters. This probability is a function of the unknown parameters, and this function is called the likelihood function. It can be visualized as a surface over the coordinate system defined by the parameters. So there is just one surface. The maximum likelihood method tries to identify the highest point on this surface.

Here is a link to material that I think is both accessible and rigorous: Maximum Likelihood Estimation. You can skip Example 1-1.

The definition of the GP model can be found here: Statistical Details for the Gaussian Process Platform. Loosely speaking, a GP model assumes the data come from a multivariate normal distribution. With that information, a likelihood function is defined.
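To make the "one surface" idea concrete, here is a minimal Python sketch. It is not JMP's implementation. It assumes a Gaussian correlation function with one theta per factor, a constant mean and a process variance that are profiled out, and a small jitter on the diagonal for numerical stability. The data, the grid, and all names (neg2_loglik, surface, and so on) are made up for illustration.

```python
import numpy as np

def neg2_loglik(theta, X, y, jitter=1e-8):
    """-2 log-likelihood of a GP with Gaussian correlation (illustrative sketch).

    The constant mean and the process variance are replaced by their
    maximum likelihood estimates, so only theta remains as a free parameter.
    """
    theta = np.asarray(theta, dtype=float)
    n = len(y)
    diff = X[:, None, :] - X[None, :, :]              # pairwise differences, shape (n, n, p)
    R = np.exp(-np.sum(theta * diff ** 2, axis=2))    # correlation matrix
    R[np.diag_indices(n)] += jitter                   # assumption: tiny jitter for stability
    L = np.linalg.cholesky(R)

    def solve(b):
        # Solve R x = b using the Cholesky factor.
        return np.linalg.solve(L.T, np.linalg.solve(L, b))

    ones = np.ones(n)
    mu = (ones @ solve(y)) / (ones @ solve(ones))     # profiled mean
    resid = y - mu
    sigma2 = (resid @ solve(resid)) / n               # profiled variance
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return n * np.log(sigma2) + logdet + n * (1.0 + np.log(2.0 * np.pi))

# Made-up data: 30 runs, 2 factors, response driven by the first factor only.
rng = np.random.default_rng(0)
X = rng.uniform(size=(30, 2))
y = np.sin(6.0 * X[:, 0])

# All 30 observations enter every evaluation, so there is ONE surface over (theta1, theta2).
grid = np.logspace(-1, 2, 8)
surface = np.array([[neg2_loglik([t1, t2], X, y) for t2 in grid] for t1 in grid])

# Maximizing the likelihood is the same as minimizing -2 log-likelihood.
best = np.unravel_index(np.argmin(surface), surface.shape)
print("lowest -2LogLikelihood on the grid at theta =", grid[best[0]], grid[best[1]])
```

A real fit uses a numerical optimizer rather than a grid. The grid is only there to show that the whole data set produces a single number for each parameter setting.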

hanyu119 · Sep 13, 2022 08:07 PM

Thanks Peng for your very informative answer!

If you don't mind, can you please also explain the difference between the prediction formula and the jackknife predicted values?

Why does the actual by predicted plot show the actual value against the jackknife predicted value instead of the value predicted by the prediction formula?

In addition, without the nugget, the value predicted by the prediction formula is exactly the same as the actual simulation data. How does this work?

What does the prediction variance profile (the one in the 'evaluate design' tab in a factorial design DOE) look like for a GP prediction? Is it random?

Is there any way to validate the GP model? I normally will run a few hundred extra points and do a t test and an unequal variance test of the residual between the original data set (predicted value - simulation value) and the validation dataset (predicted value - simulation value). Is this approach still valid for the GP model?

Thank you.

peng_liu · Sep 13, 2022 11:42 PM

A prediction (without nugget fitting, using all data) should equal the corresponding observation. This is the nature of the GP model. If you plot the actual values against this type of predicted value, you get a straight line, which does not give you much insight into how the model performs on new data. A jackknife prediction (see Jackknife resampling) predicts each observation after leaving that observation out. A jackknife GP model fits all the remaining observations perfectly, but not necessarily the one left out. This approach gives some insight into how the model performs on new data. That is why the two are different.
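The difference can be seen in a small Python sketch. It is an illustration under assumptions, not JMP's code: a one-factor GP with Gaussian correlation, no nugget, a constant mean, and a theta that is fixed by hand rather than re-estimated for each left-out row. The data and the names (gp_predict, fitted, jackknife) are hypothetical.

```python
import numpy as np

def corr(A, B, theta):
    """Gaussian correlation between the rows of A and the rows of B."""
    d2 = (A[:, None, :] - B[None, :, :]) ** 2
    return np.exp(-np.sum(theta * d2, axis=2))

def gp_predict(X_train, y_train, X_new, theta):
    """GP prediction without a nugget; the constant mean is estimated from the data."""
    R = corr(X_train, X_train, theta)
    ones = np.ones(len(y_train))
    mu = (ones @ np.linalg.solve(R, y_train)) / (ones @ np.linalg.solve(R, ones))
    r = corr(X_new, X_train, theta)
    return mu + r @ np.linalg.solve(R, y_train - mu)

# Made-up data: 8 runs of a single factor.
X = np.linspace(0.0, 1.0, 8).reshape(-1, 1)
y = np.sin(2.0 * np.pi * X[:, 0])
theta = np.array([20.0])   # assumption: fixed by hand, not estimated

# Prediction formula evaluated at the training points: reproduces the data.
fitted = gp_predict(X, y, X, theta)

# Jackknife: predict each row from a model that never saw that row.
jackknife = np.empty_like(y)
for i in range(len(y)):
    keep = np.arange(len(y)) != i
    jackknife[i] = gp_predict(X[keep], y[keep], X[i:i + 1], theta)[0]

print("max |fitted - actual|   :", np.max(np.abs(fitted - y)))
print("max |jackknife - actual|:", np.max(np.abs(jackknife - y)))
```

The first number is zero up to rounding, which is why a plot of actual against the prediction formula is a straight line. The second is not, which is what makes the jackknife plot informative.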

Without the nugget, we are back to the vanilla GP model. I am not sure what you mean by "how does this work". For the nugget, you may want to look into this technique: Ridge Regression. As I mentioned, the model assumes a multivariate normal distribution, and the task is to figure out the covariance structure, which can sometimes be challenging without tricks like the ridge technique.

I am not familiar with the applications in the DOE platforms. If you can elaborate, e.g. give the steps to get the specific report that you are interested in, maybe I can help further. You may want to see whether other community members can answer. If you don't see any answers for a while, please consider opening a separate thread on that specific question.

I am not sure about validating a GP model. But we can certainly evaluate its performance on new data. One way to do that is to see whether your new observations fall within their prediction intervals. To do that, save the Prediction and Variance formulas, then append your new data to the bottom of the table. The formulas will evaluate to produce predictions and variances. Given an observation, its prediction, and its prediction variance, the rest is a z-test exercise. And I am not sure whether equal variance is relevant in this context.

hanyu119 · Sep 14, 2022 07:59 PM

Many thanks again for the very informative reply.

I still have some questions in regard to model validation.

I've attached the JMP file in the reply with the original dataset (first 700 points) and validation dataset (last 300 points).

I normally calculate the difference between the predicted value and the actual simulation result for both datasets and do a t test to see whether the differences in the two datasets have the same mean. For this particular dataset, the t test tells me that the mean differences for the two datasets are not statistically different.

However, when I tried your method and tested the predicted variance (from the prediction variance formula) across the two datasets, it tells me the mean variances are statistically different.

Which approach is correct to validate the prediction ability of the model?

Thank you.

peng_liu · Sep 14, 2022 11:49 PM

I did not spell out the steps of my approach. Here I use your data to illustrate.

First create a new column

This is the predicted standard error, that is, the square root of the saved prediction variance.

Now create another column, which I call "6sigma". This is an indicator column that shows whether the absolute value of the difference between actual and predicted is within 3 standard errors, that is, inside a band 6 sigma wide. Yes means 1, no means 0. For training rows, assign missing. Make sure the column type is categorical.

Now un-exclude the validation rows and run Distribution on the new column.

I see roughly 95% Yes. This tells me that the model is doing a good job.
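For readers who want to follow the logic outside JMP, the indicator column can be sketched in Python. This is only an illustration of the idea, not the JSL formula used in the data table. It assumes the threshold is 3 predicted standard errors either side of the prediction, and the function name, the n_train argument, and the five rows of data are all hypothetical.

```python
import numpy as np

def within_3_sigma(actual, predicted, pred_variance, n_train):
    """Return 1.0 / 0.0 for validation rows and NaN (missing) for training rows."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    std_err = np.sqrt(np.asarray(pred_variance, dtype=float))   # predicted standard error
    flag = (np.abs(actual - predicted) < 3.0 * std_err).astype(float)
    flag[:n_train] = np.nan          # training rows come first and are set to missing
    return flag

# Made-up table: 2 training rows followed by 3 validation rows.
actual        = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
predicted     = np.array([1.0, 2.0, 3.1, 4.0, 6.0])
pred_variance = np.array([0.0, 0.0, 0.01, 0.04, 0.04])

flags = within_3_sigma(actual, predicted, pred_variance, n_train=2)
share_yes = np.nanmean(flags)        # share of validation rows flagged "Yes"
print(flags, share_yes)
```

The share of validation rows flagged 1 plays the role of the "Yes" percentage in the Distribution report.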

hanyu119 · Sep 15, 2022 09:17 PM

Many thanks for further elaborating your approach. That's great.

I don't quite understand how to interpret the if function in JMP, can you please kindly explain? Is it a "." after the -> in the first row of the equation? What does that mean?

How do I assign 'missing' to the training rows?

You say 95% tells you that the model is doing a good job. What are the criteria for such a judgement?

Can you please attach the file with model evaluation in the reply?

Thank you.

peng_liu · Sep 15, 2022 10:04 PM

You can find the syntax of the If function here: If. Or, if you are up to the challenge of learning the JMP Scripting Language, you can start here: Scripting Guide.

The If statement that I showed can be read as: if the row number is less than or equal to 700, assign a missing value, otherwise depending on whether the absolute value of difference is less than 3 standard errors, assign 1 or 0.

In other words, the formula checks whether a prediction error is within 3 standard errors on either side of zero, a band 6 standard errors wide, or call it 6 sigma.

My statement that the model is doing a good job was subjective. But I am going to explain what I meant. If the prediction error has a Normal distribution, the 6-sigma band brackets the middle of the distribution, which holds over 99% of the probability. We wish to see the prediction errors fall into this bucket with over 99% chance. Then we say that the model is doing its job.
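The "over 99%" figure follows from the Normal distribution and can be checked with a few lines of Python. This assumes the band is 3 standard errors either side of zero; the function name is made up for illustration.

```python
import math

def normal_coverage(k):
    """Probability that a Normal variable falls within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2.0))

coverage_3 = normal_coverage(3.0)   # about 0.9973
coverage_2 = normal_coverage(2.0)   # about 0.9545, shown for comparison
print(round(coverage_3, 4), round(coverage_2, 4))
```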

Now for this particular data set, the validation set has about 300 observations, and we see about 95% fall into this bucket. Bad news? Not necessarily. My previous statement about the 99% chance assumed that many more validation observations could be tested, in which case we would wish to see 99% or more. For merely 300 observations, with 95%, it is a subjective judgement. If I saw 80%, 70%, or 60%, my doubts would grow accordingly.

I attach the file as requested.
