Exploring Model Classification Thresholds
Tarek_Zikry
Staff (Retired)

These are instructions on how to download and run the Model Classification Explorer Add-In I created as a JMP Intern this summer with @KarenC and @mia_stephens. This add-in provides a unified dashboard to visualize model cutoffs and error trade-offs simultaneously. You can interactively change a model threshold and immediately see the results propagated through the performance measures, confusion matrices, and ROC curves.
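
If you are curious what the dashboard is computing at any given cutoff, here is a minimal JSL sketch (not the add-in's own code) that tallies the confusion matrix for a single threshold, assuming the example table's Prob[Donut] and Consumption columns with "Donut" as the target level:

// Minimal JSL sketch (not the add-in's code): tally the confusion matrix at one cutoff.
// Assumes the example table's Prob[Donut] (score) and Consumption (actual class) columns.
dt = Current Data Table();
cutoff = 0.5;
tp = 0; fp = 0; tn = 0; fn = 0;
For Each Row(
	predPos = :Name( "Prob[Donut]" ) >= cutoff;  // predicted positive at this cutoff?
	actPos = (:Consumption == "Donut");          // actually positive?
	If(
		predPos & actPos, tp++,
		predPos & !actPos, fp++,
		!predPos & actPos, fn++,
		tn++
	);
);
accuracy = (tp + tn) / (tp + fp + fn + tn);
Show( cutoff, tp, fp, fn, tn, accuracy );

Moving the cutoff up or down trades false positives for false negatives, which is exactly the trade-off the dashboard lets you explore interactively.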

 

NOTE: This may be slow or freeze up with very large data sets.

 

Step 1: Download the attached .jmpaddin file and example data table.

 

Data Selection Dialog

Step 2: Cast Prob[Donut] to X and Consumption to Y. These two columns are required.

 

Step 3: Set your target level to “Donut,” as that’s what the model is predicting.

 

Step 4: Set your alpha level for the statistical analysis, which defaults to 0.05.

 

Step 5: In the top right, you have the option to choose the Performance Terminology based on a given application field. This is a purely visual option that relabels the different measures; in this case, we’ll leave it on “General.”

 

Step 6: The visual accessibility check box changes the graphical output so that results can be interpreted without needing to distinguish colors on the graphs. We’ll leave it off for now.

 

Step 7: With all these initial parameters set, click “OK” to launch the platform. Depending on the size of your data set and whether you used a validation column, this may take a few seconds to launch.

 

Interactively Exploring Cutoffs

Let me know if you have any comments or suggestions! 

 

Comments

This add-in is a great addition to JMP. I'm running 14.1 and it works fine on the Donut data. However, it locks up JMP and then crashes on almost any other data set I choose. Any suggestions as to what may be going wrong?
Thanks.
Steve Powell

KarenC

Hi Steve,

 

Yes, it turns out that if your data set is too large (try N < 1000), the add-in may "freeze" JMP. I haven't had a crash, just that JMP locks up as you experienced. We are looking into this performance issue. In the meantime, try a random subset of your data of interest to at least get started on exploring thresholds. Stay tuned!
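
For example, something along these lines pulls a random ~10% subset in JSL before launching the add-in (just a sketch; adjust the rate as needed):

// Sketch: select a random ~10% of rows and subset them into a new table.
dt = Current Data Table();
dt << Clear Select;
dt << Select Randomly( 0.1 );
small = dt << Subset( Selected Rows( 1 ) );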

 

Thanks,

Karen

marxx

Thanks for sharing, Tarek, and for the follow-up, Karen. I will also be interested in this working with larger data sets when that is available.

 

Thanks, marxx

KarenC

Marxx,

 

If you want to try it on a larger data set, try running it when you are going to be away from your computer for a bit and then see if it has run when you return. I have found that if I wait, it will run. It just has some initial "work" to do to get going.


Karen

 

marxx

Wonderful! I am often looking for a way to dart away from the screen, so will follow your advice and cruise around while I let it spin for a bit.

 

Much appreciated, marxx

tsnow_

I have tried this several times, and it crashes every time. Any alternative?

KarenC

Hello, 

I am not sure why you are having trouble; can you provide more information? What version of JMP are you using? Mac or Windows? Is it crashing or hanging? How many rows are in your data table?

Thanks

tsnow_

Hi KarenC. I am using JMP 14 on Windows. I have 11,221 rows. A message that says 'not responding' pops up. Kindly assist.

Thanks

KarenC

Hi tsnow,

 

My guess is that the app is having trouble with the amount of data. We have observed issues with a large number of rows; exactly how many would likely depend on machine configuration. However, all is not lost. With that size of data table, I think you could take a random sample (subset it into another table) and then run the app. I would start with, say, 10% and see if the app runs. If it does, then I would look at 4-5 different random samples of 10% and see if you are "landing in a similar spot" regardless of sample. If so, you can probably answer your practical question. If not, then I would either run a few more samples and/or try a bigger sample, 20%, 30%, until you run into the size issue again.


Hope this helps. You could also build the score vs. state plot with all of your data in Graph Builder as a starting place for understanding the distributions and where a cut point might make sense for your problem.
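
As a rough starting point, a score vs. state plot can be scripted in Graph Builder along these lines ("score" and "state" are placeholders for your own probability and truth columns):

// Sketch: score vs. state plot in Graph Builder; "score" and "state" are placeholder column names.
dt = Current Data Table();
dt << Graph Builder(
	Variables( X( :Name( "score" ) ), Y( :Name( "state" ) ) ),
	Elements( Points( X, Y, Legend( 1 ) ) )
);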


Karen

tsnow_

Thank you Karen! It was a success! Again, many thanks!

Lu

Tried the add-in on my data, but I always get this error message: "Subscript Range in access or evaluation of assay."

 

 

KarenC

@Lu 

I am not sure why you are getting that error. If you click on the Help button (right side) in the add-in, you will get a user's manual. Otherwise, without more information it is hard to know why you are having trouble. Can you run the donut example data?


Karen

Lu

Yes, I can run the donuts example without any problem.

KarenC

If your "truth" column is not character, try changing it to character type.
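
For example, a one-liner like this changes the data type (the column name "Truth" is a placeholder for your own):

// Sketch: convert the truth column to character (it then becomes nominal); "Truth" is a placeholder name.
Column( Current Data Table(), "Truth" ) << Data Type( Character );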

Lu

Is there a method in the add-in to automatically optimize the classification threshold (i.e., minimize the misclassification rate)?

KarenC

Hopefully you noticed the "Minimize Misclassification" button; it is in the Set Probability Threshold outline.
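
For anyone curious what that button is doing conceptually, a sketch like the one below scans candidate cutoffs and keeps the one with the lowest misclassification rate (this is an illustration, not the add-in's own code; "score", "truth", and "Positive" are placeholders):

// Conceptual sketch (not the add-in's code): scan cutoffs, keep the lowest misclassification rate.
dt = Current Data Table();
best = 2;      // worse than any possible rate
bestCut = .;
For( c = 0, c <= 1, c += 0.01,
	errs = 0;
	For Each Row(
		predPos = :Name( "score" ) >= c;            // predicted positive at cutoff c
		actPos = (:Name( "truth" ) == "Positive");  // actually positive
		If( predPos != actPos, errs++ );
	);
	rate = errs / N Rows( dt );
	If( rate < best,
		best = rate;
		bestCut = c
	);
);
Show( bestCut, best );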

Anne_S

Dear Karen,

Thanks for the helpful add-in! I'm trying to use it to calculate sensitivity, specificity, PPV, and NPV.

While the numbers I get are correct (compared to reference sensitivity/specificity data I have), it's interchanging sensitivity with specificity and PPV with NPV.

I suspect this may have something to do with the fact that the results that I have specified as positive in my response column are below a defined threshold, not above one. Do you have some advice on how to fix this?

KarenC

Flip what you are calling the "Target".

Lu

That button would be nice to have. In my version of the add-in, I find no "Minimize Misclassification" button. Should I change some layout settings?

 
 

KarenC

There is the option to set the target level in the launch window.

Anne_S

I flipped the target, but because my samples qualify as positive below a certain threshold, the result is that where the specificity is truly 96.3%, for example, I get a result of 3.7%.

In the version before the target flip, all my values were correctly classified as true positive, false negative, etc.; only the sensitivity and specificity values were reversed. After the flip, every true positive is classified as a false positive, every false negative as a true negative, and so on...

KarenC

Hi Anne, I am not exactly sure; if you have the right TP, TN, etc., then you should have the right Se and Sp. Look at the 2x2 tables carefully to make sure you are reading the labels correctly. Agreed that the tool is going to treat values to the right of (greater than) the threshold as positive. You might also look at my IVD Performance add-in, which gives performance based on two nominal columns (test and truth) where you can define what is positive for each. Maybe that will work better once you have your threshold fixed (it also runs much faster). If you still have questions, keep asking.
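
Since the tool treats scores greater than the threshold as positive, one workaround (just a sketch, not a feature of the add-in) is to add a reversed score column, e.g. 1 minus the probability (or simply the negated value if your score is not a probability), and launch the add-in on that column instead:

// Sketch of a workaround when "positive" means below the threshold:
// add a reversed score so positives fall above the cutoff. "score" is a placeholder column name.
dt = Current Data Table();
dt << New Column( "Reversed Score", Numeric, "Continuous",
	Formula( 1 - :Name( "score" ) )
);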


Karen

Anne_S

Dear Karen,

I re-checked the 2x2 tables, and they are correct. You're right, I can use the IVD Performance add-in instead, but that means I have to tag all my data as either positive (below the threshold) or negative (above) and analyze the tag results rather than the original numbers, so I don't get the nice "state vs. score" visuals of this script.

[Screenshot: state vs. score plot]

But I'll use the IVD Performance add-in; it does give me the numbers that I need most.

Many thanks for your help!