These are instructions on how to download and run the Model Classification Explorer Add-In I created as a JMP Intern this summer with @KarenC and @mia_stephens. This add-in provides a unified dashboard to visualize model cutoffs and error trade-offs simultaneously. You can interactively change a model threshold and immediately see the results propagated in performance measures, confusion matrices, and ROC curves.
NOTE: This may be slow or freeze up with very large data sets.
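To make concrete what moving the threshold does, here is a minimal Python sketch (not the add-in's own code) that recomputes a confusion matrix and a few performance measures at a given cutoff. The scores and labels are made-up toy values, the function names are hypothetical, and the rule that a score strictly greater than the threshold is predicted as the target level is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # target rows predicted as target
    fp: int  # non-target rows predicted as target
    tn: int  # non-target rows predicted as non-target
    fn: int  # target rows predicted as non-target


def confusion_at(scores, labels, target, threshold):
    """Count outcomes, assuming score > threshold means 'predicted target'."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_target = score > threshold
        is_target = label == target
        if predicted_target and is_target:
            tp += 1
        elif predicted_target:
            fp += 1
        elif is_target:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, tn, fn)


def measures(c):
    """Common performance measures derived from the four counts."""
    def ratio(num, den):
        return num / den if den else float("nan")

    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": ratio(c.tp, c.tp + c.fn),
        "specificity": ratio(c.tn, c.tn + c.fp),
        "ppv": ratio(c.tp, c.tp + c.fp),
        "npv": ratio(c.tn, c.tn + c.fn),
        "misclassification": ratio(c.fp + c.fn, total),
    }


# Toy data (invented for illustration): predicted probability of "Donut".
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = ["Donut", "Donut", "Other", "Donut", "Other", "Donut", "Other", "Other"]

for threshold in (0.25, 0.5, 0.75):
    c = confusion_at(scores, labels, "Donut", threshold)
    print(threshold, c, measures(c))
```

Lowering the threshold trades false negatives for false positives, which is the trade-off the dashboard lets you explore interactively.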
Step 1: Download the attached .jmpaddin file and example data table.
Step 2: Cast Prob[Donut] to X and Consumption to Y. These two columns are required.
Step 3: Set your target level to “Donut,” as that’s what the model is predicting.
Step 4: Set your alpha level for the statistical analysis, which defaults to 0.05.
Step 5: In the top right, you can choose the Performance Terminology for a given application field. This is a purely visual option that changes the labels of the different measures. In this case, we’ll leave it on “General.”
Step 6: The visual accessibility check box changes the graphical output so that results can be interpreted without needing to distinguish colors on the graphs. We’ll leave it off for now.
Step 7: With all these initial parameters set, click “OK” to launch the platform. Depending on the size of your data set and whether or not you used a validation column, this may take a few seconds to launch.
Let me know if you have any comments or suggestions!
This add-in is a great addition to JMP. I'm running 14.1 and it works fine on the Doughnut data. However, it locks up JMP and then crashes on almost any other dataset I choose. Any suggestions as to what may be going wrong?
Thanks.
Steve Powell
Hi Steve,
Yes, it turns out that if your dataset is too large, the add-in may "freeze" JMP (try N < 1000). I haven't had a crash; JMP just locks up as you experienced. We are looking into this performance issue. In the meantime, try a random subset of your data of interest to at least get started on exploring thresholds. Stay tuned!
Thanks,
Karen
Thanks for sharing, Tarik, and for the follow-up, Karen. I will also be interested in using this with larger data sets when that is available.
Thanks, marxx
Marxx,
If you want to try it on a larger data set, try running it when you are going to be away from your computer for a bit and then see if it has run when you return. I have found that if I wait, it will run. It just has some initial "work" to do to get going.
Karen
Wonderful! I am often looking for a way to dart away from the screen, so will follow your advice and cruise around while I let it spin for a bit.
Much appreciated, marxx
I have tried this several times, and it crashes every time. Is there any alternative?
Hello,
I am not sure why you are having trouble. Can you provide more information? What version of JMP are you using? Mac or Windows? Is it crashing or hanging? How many rows are in your data table?
Thanks
Hi KarenC. I am using JMP 14 on Windows. I have 11,221 rows. A message that says 'not responding' pops up. Kindly assist.
Thanks
Hi tsnow,
My guess is that the app is having trouble with the amount of data. We have observed issues with a large number of rows; exactly how many would likely depend on machine configuration. However, all is not lost. With that size of data table, I think you could take a random sample (subset it into another table) and then run the app. I would start with, say, 10% and see if the app runs. If it does, then I would look at 4 - 5 different random samples of 10% and see if you are "landing in a similar spot" regardless of sample. If so, you can probably answer your practical question. If not, then I would either run a few more samples and/or try a bigger sample (20%, 30%) until you run into the size issue again.
Hope this helps. You could also build the score vs. state plot with all of your data in Graph Builder as a starting place for understanding the distributions and where a cut-point might make sense for your problem.
Karen
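The subsampling check Karen describes can be sketched in Python as follows. This is only an illustration under stated assumptions: the data are synthetic (generated here, not from any real table), the helper names are hypothetical, and "landing in a similar spot" is approximated by comparing the misclassification rate at one fixed threshold across five random 10% samples.

```python
import random


def random_subset(rows, fraction, seed):
    """Draw a random sample of the rows without replacement."""
    rng = random.Random(seed)
    k = max(1, round(len(rows) * fraction))
    return rng.sample(rows, k)


def misclassification_rate(rows, target, threshold):
    """Share of rows where (score > threshold) disagrees with the true label."""
    errors = sum((score > threshold) != (label == target) for score, label in rows)
    return errors / len(rows)


def make_synthetic_rows(n, seed=0):
    """Invented (score, label) pairs; target rows tend to score higher."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        is_target = rng.random() < 0.4
        score = rng.gauss(0.7 if is_target else 0.3, 0.15)
        score = min(1.0, max(0.0, score))  # clamp to the [0, 1] range
        rows.append((score, "Donut" if is_target else "Other"))
    return rows


rows = make_synthetic_rows(11221)  # same row count as the table mentioned above

# Five different random 10% samples, one per seed.
rates = [
    misclassification_rate(random_subset(rows, 0.10, seed), "Donut", 0.5)
    for seed in range(5)
]
for seed, rate in enumerate(rates):
    print(f"sample {seed}: misclassification rate = {rate:.3f}")
print(f"spread across samples = {max(rates) - min(rates):.3f}")
```

If the spread across samples is small relative to the precision you need, a single 10% sample is probably enough; if not, larger fractions can be tried the same way.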
Thank you Karen! It was a success! Again, many thanks!
I tried the add-in on my data but always get this error message: Subscript Range in access or evaluation of assay.
I am not sure why you are getting that error. If you click on the Help button (right side) in the add-in you will get a user's manual. Otherwise, without more information it is hard to know why you are having trouble. Can you run the donuts example data?
Karen
Yes, I can run the donuts example without any problem.
If your "truth" column is not character, try changing it to character type.
Is there a method in the Addin to automatically optimize the classification threshold (= minimize the misclassification rate)?
Hopefully you noticed the "Minimize Misclassification" button...it is in the Set Probability Threshold outline.
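For readers curious what such an optimization involves, here is a minimal Python sketch of a threshold search that minimizes the misclassification rate. How the add-in's button actually computes its result is not described in this thread, so this is an assumed approach: it tries the midpoints between adjacent distinct scores (plus "predict everything" and "predict nothing") and keeps the threshold with the fewest errors. The data are toy values and the function names are hypothetical.

```python
def misclassification_rate(scores, labels, target, threshold):
    """Share of rows where (score > threshold) disagrees with the true label."""
    errors = sum((s > threshold) != (y == target) for s, y in zip(scores, labels))
    return errors / len(scores)


def candidate_thresholds(scores):
    """Midpoints between adjacent distinct scores, plus the two extremes."""
    unique = sorted(set(scores))
    midpoints = [(a + b) / 2 for a, b in zip(unique, unique[1:])]
    return [float("-inf")] + midpoints + [float("inf")]


def minimize_misclassification(scores, labels, target):
    """Return (threshold, rate); ties go to the lowest threshold."""
    best = min(
        candidate_thresholds(scores),
        key=lambda t: misclassification_rate(scores, labels, target, t),
    )
    return best, misclassification_rate(scores, labels, target, best)


# Toy data (invented for illustration).
scores = [0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875]
labels = ["Other", "Other", "Donut", "Other", "Other", "Donut", "Donut"]

threshold, rate = minimize_misclassification(scores, labels, "Donut")
print(f"best threshold = {threshold}, misclassification rate = {rate:.3f}")
```

Note that the threshold with the lowest misclassification rate weights false positives and false negatives equally, which may not match the costs in a given application.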
Dear Karen,
thanks for the helpful addin! I'm trying to use it to calculate sensitivity, specificity, PPV and NPV.
While the numbers I get are correct (compared to reference sensitivity / specificity data I have), it's interchanging sensitivity with specificity and PPV with NPV.
I suspect this may have something to do with the fact that the results that I have specified as positive in my response column are below a defined threshold, not above one. Do you have some advice on how to fix this?
Flip what you are calling the "Target".
That button would be nice to have. In my version of the Add-in I find no "minimize Misclassification Threshold". Should I change some lay-out settings?
There is the option to set the target level in the launch window.
I flipped the target, but because my samples qualify as positive below a certain threshold, the result is that, for example, where the specificity truly is 96.3 %, I get a result of 3.7 %.
In the version before the target flip, all my values were correctly classified as true positive, false negative etc., just the sensitivity & specificity values were reversed. After the flip, every true positive is classified as a false positive, every false negative as true negative and so on...
Hi Anne, I am not exactly sure...if you have the right TP, TN, etc., then you should have the right Se and Sp. Look at the 2x2 tables carefully to make sure you are reading the labels correctly. Agreed that the tool is going to use values to the right of (greater than) the threshold as positive. You might also look at my IVD Performance add-in, which will give performance based on two nominal columns (test and truth) where you can define what is positive for each. Maybe that will work better once you have your threshold fixed (it also runs much faster). If you still have questions, keep asking.
Karen
Dear Karen,
I re-checked the 2x2 tables, and they are correct. You're right, I can use the IVD performance addin instead, but that means that I have to tag all my data as either positive (below the threshold) or negative (above) and analyze the tag results rather than the original numbers, so I don't get the nice "state vs. score" visuals of this script.
But I'll use the IVD performance addin, it does give me the numbers that I need most.
Many thanks for your help!
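For anyone who runs into the same situation (positives lie below the cutoff rather than above it), the Python sketch below reproduces the pattern reported in this thread under the "greater than the threshold is positive" rule, and shows one possible workaround: negate the score so that positives end up on the high side. This is an untested suggestion as far as the add-in itself is concerned. The assay values are invented and the function name is hypothetical.

```python
def se_sp(scores, labels, target, threshold):
    """Sensitivity and specificity when score > threshold is predicted as target."""
    pairs = list(zip(scores, labels))
    tp = sum(s > threshold and y == target for s, y in pairs)
    fn = sum(s <= threshold and y == target for s, y in pairs)
    tn = sum(s <= threshold and y != target for s, y in pairs)
    fp = sum(s > threshold and y != target for s, y in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Invented assay values: a sample is truly called positive when value < 5.
values = [1.0, 2.0, 3.0, 6.0, 4.0, 7.0, 8.0, 9.0, 10.0]
truth = ["pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg"]
cutoff = 5.0
# Reference values under the real rule (value < 5 is positive):
# sensitivity = 3/4 = 0.75, specificity = 4/5 = 0.8

# Case A: target = "neg" on raw values. The counts are right, but the
# two measures are swapped relative to the reference.
print(se_sp(values, truth, "neg", cutoff))   # (0.8, 0.75)

# Case B: target flipped to "pos" on raw values. Each measure becomes
# the complement of the reference (like 3.7 % instead of 96.3 %).
print(se_sp(values, truth, "pos", cutoff))   # (0.25, 0.2)

# Case C: target = "pos" on negated values with a negated cutoff.
# Now "greater than" matches the real rule and both measures agree.
negated = [-v for v in values]
print(se_sp(negated, truth, "pos", -cutoff))  # (0.75, 0.8)
```

If this applies to your data, a new column holding the negated (or otherwise reversed) score could be cast to X in place of the original, keeping the score-based visuals.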