liugeng0323
New Contributor

JMP R square in Predictive and Specialized Modeling is wrong.

I built a neural network model with JMP, and it reports an R square value like this:

[screenshot: R square values from the JMP Neural report]

But when I exported the predicted values and computed the R square in Python, the value did not match, like this:

[screenshot: R square calculated in Python]

This is the validation-set R square I calculated using scikit-learn.

0.63 versus 0.54 is a big difference.

Can any staff member answer my question?

contact me: 531554559@qq.com

urgent


Re: JMP R square in Predictive and Specialized Modeling is wrong.

Not knowing the data at all, I would add to the other suggestions that the 70%-20%-10% allocation scheme might provide an insufficient sample for validation in this case, especially if you have rare targets. I think that different assignments to the hold-out sets are the more likely explanation, but I am adding another possibility that might help to explain it.

Learn it once, use it forever!
liugeng0323
New Contributor

Re: JMP R square in Predictive and Specialized Modeling is wrong.

The formula I used to calculate R square in Python is this:

[screenshot: the R square formula used in the Python calculation]

So I am really confused: what formula does JMP use for R square?

What is the problem?

Here is the formula given in the JMP documentation:

[screenshot: the R square formula from the JMP documentation]

I believe it is the same formula, but the results differ.

So I want to confirm what the problem is.
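As a reference point, here is a minimal Python sketch comparing scikit-learn's r2_score with a manual 1 - SS_res/SS_tot calculation; the numbers are made-up placeholders rather than the real data, but the two calculations should agree with each other.

# Minimal sketch: compare scikit-learn's r2_score with a manual calculation.
# The arrays below are made-up stand-ins for the exported actual and predicted values.
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.1, 2.4, 4.8, 5.0, 3.7])    # actual responses (placeholder data)
y_pred = np.array([2.9, 2.6, 4.5, 5.2, 3.5])    # model predictions (placeholder data)

ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares about the mean

print("manual  R^2:", 1 - ss_res / ss_tot)
print("sklearn R^2:", r2_score(y_true, y_pred))  # should match the manual value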

gzmorgan0
Super User

Re: JMP R square in Predictive and Specialized Modeling is wrong.

Not having access to your data, etc., I can only suggest several items to check that might account for the difference:

  • Verify that you are using the same observations in both applications: JMP and Python.
  • Verify that no rows were excluded in JMP that are not also excluded in Python.
  • Verify there is no round-off from formatting. For example, people sometimes get a result in Excel, copy and paste the data from Excel into JMP, and get a different result. The source of the discrepancy turns out to be that the displayed (formatted), and therefore rounded, values were pasted into JMP instead of the actual values.

This is where I would start.  Then I would do the calculations using the JMP table.
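For example, a quick sanity check on the table exported from JMP might look like the sketch below; the file name, the column names, and the Validation coding are assumptions and would need to match the actual export.

# Sanity check on a table exported from JMP, before computing R^2 in Python.
# File name, column names, and the Validation labels are assumptions, not JMP defaults.
import pandas as pd
from sklearn.metrics import r2_score

df = pd.read_csv("jmp_export.csv")               # hypothetical export of the JMP data table

print("rows:", len(df))                          # compare with the row count JMP reports
print(df["Validation"].value_counts())           # training / validation / test allocation

val = df[df["Validation"] == "Validation"]       # keep only the validation rows
print("validation R^2:",
      r2_score(val["Y"], val["Predicted Y"]))    # full-precision values, no re-typed rounding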

 

Please post back if you find the source of the discrepancy.

dale_lehman
Community Trekker

Re: JMP R square in Predictive and Specialized Modeling is wrong.

One obvious thing to check would be whether the validation and test data sets are the same in both tools. I doubt that they are, since they are created using a random seed, and I'm not sure that Python and JMP use random seeds in the same way. If the random division of the data into training, validation, and test sets is different, then of course the R square values will differ. I suspect this is the case, also because of the large difference between the fit in the training set and in the validation/test sets. That difference suggests substantial over-fitting, and if that is the case, the results may well differ considerably with different divisions of the data into the three subsets.
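A small sketch of this effect, using synthetic data and a simple model rather than the actual neural network: the same kind of model scored on differently seeded splits gives noticeably different validation R square values.

# Sketch: different random train/validation splits give different validation R^2,
# so JMP and Python must use identical splits to be comparable.
# The data here is synthetic; replace it with the actual table.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=2.0, size=200)

for seed in (1, 2, 3):
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=seed)
    model = LinearRegression().fit(X_tr, y_tr)
    print(f"seed {seed}: validation R^2 = {model.score(X_val, y_val):.3f}")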

Re: JMP R square in Predictive and Specialized Modeling is wrong.

How do the results change if you repeat fitting the Neural model in JMP?

 

Here are two back-to-back fits with the same data and same allocation to training and validation and no change to the network:

 

[screenshot: two back-to-back Neural model fits in JMP]

Learn it once, use it forever!

Re: JMP R square in Predictive and Specialized Modeling is wrong.

I agree that the random seed is one source of variation. Another source might simply be that the transfer functions differ. Have you checked that the Python model has the same number of hidden layers and the same type and number of transfer functions? In JMP you have tanh and one hidden layer only (JMP Pro also offers a second hidden layer, as well as Gaussian and linear transfer functions in addition to tanh). Note that in Python you usually have to specify the number of input/output nodes, the number of hidden layers, and the transfer/activation function yourself. Can you be sure all of this is the same?

If even one of these settings differs, the R square can differ.
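A sketch of what matching a one-layer tanh network might look like in scikit-learn; the three hidden nodes, the solver choice, and the training variables are assumptions that would need to match the actual JMP model.

# Sketch: in scikit-learn the architecture must be set explicitly to mirror a
# one-hidden-layer tanh network. The 3-node layer is an assumption; use whatever
# number of hidden nodes the JMP report shows.
from sklearn.neural_network import MLPRegressor

model = MLPRegressor(
    hidden_layer_sizes=(3,),   # one hidden layer, 3 nodes (match the JMP model)
    activation="tanh",         # tanh hidden nodes, as in the JMP fit
    solver="lbfgs",            # a common choice for small tabular data
    max_iter=5000,
)
# model.fit(X_train, y_train) would then be run on the same rows JMP used for training
# (X_train and y_train are placeholders for the exported training data).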

 

Another note: what is the goal of your analysis? Do you want to do regression or classification? If the latter, you shouldn't care too much about the R square; the misclassification rate, ROC curve, or a similar measure is more useful.

 

 

Re: JMP R square in Predictive and Specialized Modeling is wrong.

As Mark Bailey pointed out in his example, the neural platform in JMP will randomize the starting values in the search for the optimal weights. This will occur even on subsequent runs with a specified random seed because the random seed only applies to the random partitioning between training/validation/test sets and not to the search starting values.

 

Therefore, different model evaluations under identical conditions can lead to different fit statistics, which is true for any numeric algorithm that needs to search for an optimal solution.
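A sketch of the same idea using scikit-learn's MLPRegressor on synthetic data: refitting an identical network with different weight initializations (controlled here by random_state) can produce different fit statistics.

# Sketch: refitting the same network on the same rows with different weight
# initializations can give different fitted networks and R^2 values, analogous
# to re-running the Neural platform. Synthetic data stands in for the real table.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))
y = np.tanh(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=300)

for seed in (1, 2, 3):
    net = MLPRegressor(hidden_layer_sizes=(3,), activation="tanh",
                       solver="lbfgs", max_iter=5000, random_state=seed)
    net.fit(X, y)   # random_state changes only the starting weights here
    print(f"random_state {seed}: training R^2 = {net.score(X, y):.3f}")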

Dan Obermiller