marta-simoes
Level II

Significant model with significant LoF

Dear JMPers,

 

I think I have a challenge! :)

 

It is about a significant model with significant lack-of-fit (LoF) results. I've tried everything suggested in the JMP Community to correct the LoF: adding interactions, adding higher-order terms, and excluding some results (potential outliers). The thing is, X1, X2 and X3 are highly correlated. I would prefer to keep all three factors in the final model so that I can define a design space to work within, but this is not a must.
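
For readers less familiar with the LoF test: it splits the residual sum of squares into pure error (scatter among replicate runs at identical factor settings) and lack of fit (distance of the replicate means from the fitted model). Below is a minimal Python sketch of that decomposition for a one-factor straight-line fit. The data and the function name `lack_of_fit_f` are made up for illustration; this is not the data from this thread and not JMP's implementation.

```python
from collections import defaultdict

def lack_of_fit_f(x, y):
    """Return (ss_lof, ss_pe, f_ratio) for a straight-line fit y = a + b*x.

    Needs replicate runs (repeated x values) so pure error can be estimated.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Group the responses by factor setting to find the replicates.
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)

    ss_pe = 0.0   # pure error: scatter of replicates around their own mean
    ss_lof = 0.0  # lack of fit: replicate means versus the fitted line
    for xi, ys in groups.items():
        group_mean = sum(ys) / len(ys)
        ss_pe += sum((yi - group_mean) ** 2 for yi in ys)
        ss_lof += len(ys) * (group_mean - (intercept + slope * xi)) ** 2

    df_pe = n - len(groups)   # runs minus distinct settings
    df_lof = len(groups) - 2  # distinct settings minus fitted parameters
    f_ratio = (ss_lof / df_lof) / (ss_pe / df_pe)
    return ss_lof, ss_pe, f_ratio

# Hypothetical data: a curved response fitted with a straight line.
x = [1, 1, 2, 2, 3, 3]
y = [1.0, 1.2, 4.0, 4.2, 9.0, 9.2]
ss_lof, ss_pe, f_ratio = lack_of_fit_f(x, y)
print(f"SS LoF = {ss_lof:.4f}, SS pure error = {ss_pe:.4f}, F = {f_ratio:.2f}")
```

A large F ratio means the replicate means sit far from the fitted model compared with how tightly the replicates agree with each other. Very precise replicates can therefore make LoF significant even when the model predicts well.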

 

So I've ended up with 2 models (multiple linear regression):

- The first is simpler, with no interactions: only X3 was kept and the VIFs are low, but LoF is still significant for Y1, Y2 and Y3.

- The second includes interactions: X1, X2 and X3 were all kept in the equations, with significant interactions. Y1 has non-significant LoF, but Y2 and Y3 do not. Moreover, the VIFs increase a lot for Y3, and the profiler does not look good.

 

Note: The runs are not from a DoE (the data were collected empirically).

 

So generally speaking, the models (standard least squares) look quite good to me. But it seems that I need to compromise between a non-significant LoF with normally distributed residuals on one hand, and low VIFs with a simpler model on the other.

 

Am I missing anything here? Can I improve my models and correct LoF results?

 

Thanks in advance for your advice,

Marta

Marta Simões
1 ACCEPTED SOLUTION

Re: Significant model with significant LoF

Some of the issues can be avoided by not using a linear predictor and regression. Here is the result from a simple neural network model (5 hidden nodes in one hidden layer all using the hyperbolic tangent activation function):

 

[Image: Capture.PNG]

 

The model predictions seem reasonably accurate:

 

[Image: Capture.PNG]

 

The model seems to profile well:

 

[Image: Capture.PNG]
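
To make the structure of such a network concrete, here is a minimal Python sketch of the forward pass for one hidden layer of 5 tanh nodes feeding a linear output. The weights are random and untrained, and the names (`predict`, `hidden_weights`, and so on) are hypothetical; this only shows the form of the prediction formula, not the model fitted in JMP.

```python
import math
import random

def predict(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass: inputs -> tanh hidden layer -> linear output."""
    hidden = [
        math.tanh(bias + sum(w * xi for w, xi in zip(weights, x)))
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return output_bias + sum(w * h for w, h in zip(output_weights, hidden))

# Hypothetical, untrained weights for 3 inputs (X1, X2, X3) and 5 hidden nodes.
rng = random.Random(1)
n_inputs, n_hidden = 3, 5
hidden_weights = [[rng.uniform(-1, 1) for _ in range(n_inputs)]
                  for _ in range(n_hidden)]
hidden_biases = [rng.uniform(-1, 1) for _ in range(n_hidden)]
output_weights = [rng.uniform(-1, 1) for _ in range(n_hidden)]
output_bias = 0.5

# Prediction for one made-up row of (X1, X2, X3).
y_hat = predict([0.2, -0.4, 1.0], hidden_weights, hidden_biases,
                output_weights, output_bias)
print(y_hat)
```

Because each hidden node applies tanh to its own linear combination of the inputs, the response can bend in ways that a linear predictor with a few interaction terms cannot.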

Learn it once, use it forever!

5 REPLIES

Re: Significant model with significant LoF

Interesting data.

 

[Image: Capture.PNG]

 

You appear to have some very highly influential observations along with high collinearity. Both of those issues are trouble for multiple regression.

 

[Image: Capture.PNG]

 

You don't have that much independent information.

 

The collinearity makes it very difficult to determine significance.

 

[Image: Capture.PNG]

[Image: Capture.PNG]
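
One common way to quantify this collinearity is the variance inflation factor, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. The following is a minimal pure-Python sketch of that calculation. The data are invented (X2 is built to nearly duplicate X1) and the helper names `solve`, `r_squared` and `vif` are hypothetical; it is an illustration of the formula, not of how JMP computes its report.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def r_squared(y, columns):
    """R^2 from regressing y on the given columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(p)]
           for a in range(p)]
    xty = [sum(row[a] * yi for row, yi in zip(design, y)) for a in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(bk * v for bk, v in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def vif(columns):
    """Variance inflation factor for each predictor column."""
    result = []
    for j, col in enumerate(columns):
        others = columns[:j] + columns[j + 1:]
        result.append(1.0 / (1.0 - r_squared(col, others)))
    return result

# Invented data: x2 is almost a copy of x1, x3 is only loosely related.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9, 7.2, 7.8]
x3 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
vifs = vif([x1, x2, x3])
print([round(v, 1) for v in vifs])
```

The variance of a parameter estimate is inflated by its VIF relative to the uncorrelated case, which is why highly correlated factors produce wide confidence intervals and unstable t-ratios.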

 

The very low leverage means that very small changes in data values can produce very large changes in the statistics (e.g., parameter estimates and the t-ratios.)

marta-simoes
Level II

Re: Significant model with significant LoF

Hello @markbailey! Thanks for your valuable suggestion! I had never considered a neural network. I believe you are right; it seems like the perfect solution for this case.

However, when I try to reproduce it in my copy of JMP, I always get a positive slope for X2 rather than a negative one like yours (and I know the correct slope should be negative!). Do you have any suggestions?

 

 

Marta Simões
Re: Significant model with significant LoF

I can't be sure what might cause this reversal of slope but I have some ideas. There are three things to consider. First of all, the assignment of observations to the training and validation sets is random, so that process will cause the results to vary, although the differences are usually small if the data set is very large.

 

Second, your data set is not very large! You can imagine if 1/3 of the data is randomly held out for validation each time, then the results can't be the same. In addition, some of your variables had only two or three levels. Some of the levels occurred only a few times. That fact will further complicate the differences between runs. You might try K-fold Cross-validation instead of Holdout for the Validation Method. You can use 5 to 10 folds for good results with this validation method.
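
The difference between the two validation methods can be sketched in a few lines of Python. With a holdout, one random third of the rows is never used for fitting; with K-fold, every row is used for validation exactly once and for training in the remaining folds. The function `kfold_indices` and the row count of 23 are hypothetical and only illustrate the idea; this is not how JMP assigns rows internally.

```python
import random

def kfold_indices(n_rows, k, seed=0):
    """Return a list of (training_rows, validation_rows) pairs for K-fold CV."""
    rows = list(range(n_rows))
    random.Random(seed).shuffle(rows)
    # Deal the shuffled rows into k folds of nearly equal size.
    folds = [rows[i::k] for i in range(k)]
    splits = []
    for i, validation in enumerate(folds):
        training = [r for j, fold in enumerate(folds) if j != i for r in fold]
        splits.append((training, validation))
    return splits

# Hypothetical small data set of 23 rows split into 5 folds.
splits = kfold_indices(n_rows=23, k=5)
for training, validation in splits:
    print(len(training), "training rows,", len(validation), "validation rows")
```

Each row lands in exactly one validation fold, so no observation is permanently withheld from the fit, which matters most when the data set is small.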

 

Third, the impact of collinearity cannot be completely avoided. X2 is highly correlated with X1 and X3. Given the holdout validation method, the random subset of 2/3 of the rows could change the slope. 
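
A small simulation can show how much a slope may move between random 2/3 subsets when two predictors are nearly collinear. The sketch below uses ordinary least squares with two predictors rather than a neural network, purely to keep it short. The data are simulated with a true X2 slope of -1, and the function name `fit_slopes` is hypothetical.

```python
import random

def fit_slopes(x1, x2, y):
    """Least-squares slopes for y = b0 + b1*x1 + b2*x2 (closed form)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2  # close to zero when x1 and x2 are collinear
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# Simulated data: x2 nearly duplicates x1; the true slopes are +2 and -1.
rng = random.Random(7)
n = 30
x1 = [rng.uniform(0, 10) for _ in range(n)]
x2 = [a + rng.gauss(0, 0.3) for a in x1]
y = [2.0 * a - 1.0 * b + rng.gauss(0, 1.0) for a, b in zip(x1, x2)]

# Refit on 20 different random 2/3 subsets and record the X2 slope each time.
x2_slopes = []
for _ in range(20):
    keep = rng.sample(range(n), k=2 * n // 3)
    b1, b2 = fit_slopes([x1[i] for i in keep],
                        [x2[i] for i in keep],
                        [y[i] for i in keep])
    x2_slopes.append(b2)

print(f"X2 slope across subsets: min {min(x2_slopes):.2f}, "
      f"max {max(x2_slopes):.2f}")
```

The spread between the minimum and maximum slope gives a feel for how sensitive the estimate is to which rows happen to be held out.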

 

Model building is serious, careful work. My previous reply was a suggestion, not a full consultation for a fee. (That is not the purpose of the discussions here!) You need to learn about any modeling technique before you trust and share your findings. But I hope this discussion is enough to get you started.

marta-simoes
Level II

Re: Significant model with significant LoF

Indeed, it was useful. I was never very keen on neural networks because it is hard to really understand what's behind them, but now I am curious. I have to study more and get deeper into this. I also have new results coming out soon, so I can try to predict them, compare, and verify whether it really works. Have a nice weekend!

 

Marta Simões