## Determining an x-y equation for prediction purposes

Community Trekker

Joined:

Feb 17, 2017

Dear all,

I need help determining an x-y relationship. I have 6 batches, which I analyzed on different days (22 times). I want to plot timepoint (x) against measurement (y) and then find the most suitable equation relating them: linear, cubic, etc. How can I determine the most suitable equation so that I can predict the value of y at timepoint x = 6?

6 REPLIES

Staff

Joined:

Jun 23, 2011

JMP provides many platforms for fitting different kinds of models. Each provides a variety of statistics or criteria to help select the best model from the data. (This process assumes that you have not previously determined the model from a theory.) Linear models are a great place to start.

• Select Analyze > Fit Model.
• Select your dependent y data column and click Y.
• Now you have a choice:
  • add terms until you decide that there is no bias, or
  • enter all possible terms and remove the ones you decide are not significant until bias appears.
• You can use hypothesis tests for null estimates for individual terms (t-tests in Parameter Estimates or F tests in Effect Tests).
• You can use information theoretic criteria such as AICc or BIC about the whole model.
• You can use either of these kinds of information
  • in a manual selection process, by clicking Run and working with the Fit Least Squares platform, or
  • in an automated selection process, by changing the personality from Standard Least Squares to Stepwise and then clicking Run.
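For readers who want to see the idea behind comparing candidate models by AICc outside JMP, here is a minimal Python sketch. It is not JMP's implementation: the data are made up for illustration, the helper names (`fit_polynomial`, `predict`, `aicc`) are hypothetical, and the AICc formula is the common least-squares form, which drops an additive constant that does not affect the ranking of models fitted to the same data.

```python
import math

def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit via the normal equations (pure Python)."""
    m = degree + 1
    # Normal equations A c = b with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    a = [[sum(xi ** (i + j) for xi in x) for j in range(m)] for i in range(m)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, m):
            factor = a[row][col] / a[col][col]
            for k in range(col, m):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution; coeffs[i] multiplies x**i.
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        s = sum(a[i][k] * coeffs[k] for k in range(i + 1, m))
        coeffs[i] = (b[i] - s) / a[i][i]
    return coeffs

def predict(coeffs, x0):
    return sum(c * x0 ** i for i, c in enumerate(coeffs))

def aicc(x, y, coeffs):
    """Least-squares AICc, up to a constant shared by all models on the same data."""
    n = len(x)
    k = len(coeffs) + 1  # polynomial coefficients plus the error variance
    sse = sum((yi - predict(coeffs, xi)) ** 2 for xi, yi in zip(x, y))
    return n * math.log(sse / n) + 2 * k + 2 * k * (k + 1) / (n - k - 1)

# Made-up illustration data (NOT the poster's measurements).
time = [0, 1, 2, 3, 6, 9, 12, 18, 24]
result = [100.2, 99.6, 99.5, 99.0, 98.4, 98.1, 97.6, 97.5, 97.3]

scores = {}
for degree in (1, 2, 3):
    scores[degree] = aicc(time, result, fit_polynomial(time, result, degree))
    print(f"degree {degree}: AICc = {scores[degree]:.2f}")

best = min(scores, key=scores.get)  # smaller AICc is better
print("lowest AICc: degree", best)
print("prediction at time 6:", predict(fit_polynomial(time, result, best), 6))
```

The penalty terms grow with the number of parameters, so a higher-degree polynomial has to reduce the error sum of squares substantially before it is preferred.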

You can then use a variety of ways to predict the response but the Prediction Profiler is probably the best.

If the linear model is inadequate, then there are other models to choose from.

Please see Help > Books > Fitting Linear Models.

We also provide training courses that cover this subject, which is not easily taught through a simple discussion here.

Learn it once, use it forever!

Community Trekker

Joined:

Feb 17, 2017

Hi,

Could you please help me find the most suitable equation for the Time/Result relationship in the attached JMP file?

Staff

Joined:

Jun 23, 2011

(Please note that I included a picture of the initial Prediction Profiler in the first post by mistake. I replaced it with the updated Prediction Profiler that reflects the change in the Time to 6.)

Assuming that Batch is a random effect and ignoring it for the purpose of this model, I used Fit Curve to explore plausible candidate models. I included a few exponential decay models in addition to the polynomial models that you explored. Here is the ranking by AICc:

The best choice is the three-parameter exponential model, and the quadratic is a close second. In fact, the first five models are within 4 AICc units of the best model, so they are all supported by these data. But we prefer parsimony over complexity. The first two choices seem reasonable given a plot of the result over time for all batches. The complexity of the other candidate models seems unwarranted.

The choice now, it seems to me, is whether you need extrapolation. The quadratic model is turning up while the exponential model is approaching an asymptote.
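The difference in extrapolation behavior can be illustrated with a small Python sketch. The parameter values below are hypothetical, chosen only to mimic a slowly decaying result; they are not the estimates from the attached data. The exponential is assumed to have the three-parameter form asymptote + scale * exp(rate * t).

```python
import math

def quadratic(t, b0, b1, b2):
    """Quadratic model: b0 + b1*t + b2*t^2."""
    return b0 + b1 * t + b2 * t * t

def exponential_3p(t, asymptote, scale, rate):
    """Three-parameter exponential: approaches `asymptote` when rate < 0."""
    return asymptote + scale * math.exp(rate * t)

# Hypothetical parameters for illustration only.
quad_params = (100.0, -0.3, 0.008)   # minimum at t = 0.3 / (2 * 0.008) = 18.75
exp_params = (97.0, 3.0, -0.15)      # decays from 100 toward 97

for t in (0, 6, 18.75, 30, 60):
    q = quadratic(t, *quad_params)
    e = exponential_3p(t, *exp_params)
    print(f"t = {t:>5}: quadratic = {q:8.3f}, exponential 3P = {e:8.3f}")
```

Within the early time range the two curves are similar, but past its minimum the quadratic rises again, while the exponential levels off at its asymptote. Which behavior is plausible depends on the process, not on the fit statistics.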

My preference is the exponential model in this case so I will use it to demonstrate the exploitation of the model. The same process works with any of these models.

1. Click the red triangle next to the selected model (Exponential 3P in this example).
2. Select Profiler.
3. Change Time(day) to the desired level of 6 by clicking on or dragging the Result trace, or by entering the value directly as the current value (red) on the scale.

The prediction is updated:

So your estimated Result at Time = 6 days is 98.46303, or 96.495 to 100.431 with 95% confidence.


Community Trekker

Joined:

Nov 9, 2016

I would also think that the "physics" of the process generating the result measurement over time should be a factor in choosing the most suitable model. For instance, if the probability of a decrement per unit time per unit of result is constant, an exponential function might apply.
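As a short worked derivation of that remark (the symbols $k$, $y_0$, and $a$ are notation introduced here, not taken from the thread): if the result $y$ loses a constant fraction $k$ of itself per unit time, then

$$
\frac{dy}{dt} = -k\,y \quad\Longrightarrow\quad y(t) = y_0\,e^{-kt},
$$

and if the decay is instead toward a nonzero level $a$,

$$
\frac{dy}{dt} = -k\,(y - a) \quad\Longrightarrow\quad y(t) = a + (y_0 - a)\,e^{-kt},
$$

which is an exponential with three parameters: an asymptote $a$, a scale $y_0 - a$, and a rate $-k$.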

Community Trekker

Joined:

Feb 17, 2017

When I select the most suitable curve (equation) for my results, I try all the fit options and then look at their RSquare and RMSE values. The equation with the higher R2 and the lower RMSE is the most suitable one for me. I therefore chose the Quintic curve, which is highlighted in yellow in the picture. But as I understand from your comments, you choose the most suitable equation from the AIC values. Am I right?

Is there a way to decide on the most suitable curve for my data set? Which model suits my results best?

Staff

Joined:

Jun 23, 2011

Yes, you should not use R square or RMSE for model selection. Yes, you should either use a criterion such as AICc or use an honest assessment method like cross-validation.

Cross-validation uses two or three exclusive partitions of the data (hold-out sets). The first set is used to train (fit) the model. The second set is used to validate the model selection. The optional third set is used to test the selected model. You can either designate the validation sets according to your own partition scheme or use K-fold cross-validation. You can take K-fold to the extreme and use leave-one-out validation.
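To make the leave-one-out idea concrete, here is a minimal Python sketch, not JMP's implementation. It compares two deliberately simple candidates, a mean-only model and a straight line, on made-up data. The function names (`fit_mean`, `fit_line`, `loo_rmse`) are hypothetical.

```python
def fit_mean(x, y):
    """Mean-only model: predicts the training mean everywhere."""
    mean_y = sum(y) / len(y)
    return lambda x0: mean_y

def fit_line(x, y):
    """Straight-line least-squares fit using the closed-form solution."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x0: intercept + slope * x0

def loo_rmse(x, y, fit):
    """Leave-one-out: refit without point i, then score the prediction of point i."""
    squared_errors = []
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        model = fit(x_train, y_train)
        squared_errors.append((y[i] - model(x[i])) ** 2)
    return (sum(squared_errors) / len(squared_errors)) ** 0.5

# Made-up illustration data (NOT the poster's measurements).
time = [0, 1, 2, 3, 6, 9, 12, 18, 24]
result = [100.2, 99.6, 99.5, 99.0, 98.4, 98.1, 97.6, 97.5, 97.3]

rmse_mean = loo_rmse(time, result, fit_mean)
rmse_line = loo_rmse(time, result, fit_line)
print(f"leave-one-out RMSE, mean only: {rmse_mean:.3f}")
print(f"leave-one-out RMSE, line:      {rmse_line:.3f}")
```

Because every prediction is made for a point the model did not see during fitting, extra flexibility is rewarded only when it improves predictions of new observations. That is what makes this an honest assessment, unlike R square computed on the training data.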

I think that AICc is more than sufficient in your simple case of a single predictor (time).

So, yes, there are many ways to decide which curve is most suitable for your data set. I also mentioned, and another respondent then reiterated, that a theoretical model is usually the best when it is available. The choice of the criterion for model selection and the selection of the best curve is up to you.

There is a reason that we have many criteria. There is a reason that JMP orders the candidate models by AICc.
