When you fit a classification model in JMP, the default probability cutoff for classification is 0.50. In this video, we see how to change the cutoff for classification using a formula column in the data table.
To apply a different cutoff, you can save the probability formula to the data table and then create a conditional formula column to specify the new cutoff.
Let's see how to change the cutoff for classification using the Impurity data. To make the model classify more observations as Fail, we change the cutoff for classification as Fail to 0.10.
But note that this is an arbitrary value. The cutoff value you use should be based on the context of the problem you're working on, the data structure, and other considerations.
We start by fitting a predictive model. We fit a logistic regression model, and then select Save Probability Formula from the top red triangle to save the logistic model to the data table.
This adds four new columns to the data table: the Logit (which is the logistic model), the predicted probability of Fail and Pass, and the most likely outcome based on the cutoff of 0.50.
The two probability columns are grouped in the column list.
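To make the relationship between these saved columns concrete, here is a minimal Python sketch. It assumes the standard logistic relationship between the logit and the probability of Fail. The function name and the logit value of -2.0 are hypothetical and are not taken from the Impurity data.

```python
import math

def saved_columns(logit_fail):
    """Return (prob_fail, prob_pass, most_likely) for one logit value.

    Assumes the standard logistic link: P(Fail) = 1 / (1 + exp(-logit)).
    """
    prob_fail = 1.0 / (1.0 + math.exp(-logit_fail))
    prob_pass = 1.0 - prob_fail
    # The most likely outcome is the level with the larger probability,
    # which is the same as applying a 0.50 cutoff.
    most_likely = "Fail" if prob_fail > prob_pass else "Pass"
    return prob_fail, prob_pass, most_likely

# Hypothetical logit value, for illustration only
p_fail, p_pass, label = saved_columns(-2.0)
print(round(p_fail, 2), round(p_pass, 2), label)  # 0.12 0.88 Pass
```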
We add a new column, and name this column [Cutoff Prob Fail = 0.10]. Then we right-click and select Formula to enter the Formula Editor.
Here are the steps to create the formula we need:
From the function list, select Conditional, and then If, to create an IF statement.
With the expr box highlighted, select the Prob[Fail] column.
Then, from the function list, select Comparison, and then a <= b, and enter 0.10 in the box provided.
In the THEN clause, type "Pass".
In the ELSE clause, type "Fail".
This formula classifies observations with a predicted probability of Fail greater than 0.10 as a fail, and any observation with a predicted probability of Fail less than or equal to 0.10 as a pass.
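The logic of this formula column can be sketched outside of JMP as well. The Python function below is an illustration only. The name classify and the example probabilities are hypothetical, and the 0.50 comparison simply reuses the same rule with a different cutoff.

```python
def classify(prob_fail, cutoff=0.10):
    """Mirror the conditional formula: Pass if Prob[Fail] <= cutoff, else Fail."""
    return "Pass" if prob_fail <= cutoff else "Fail"

# Hypothetical predicted probabilities of Fail
for p in (0.05, 0.10, 0.12, 0.60):
    # Print the classification at the new cutoff and at 0.50
    print(p, classify(p), classify(p, cutoff=0.50))
```

A probability exactly equal to the cutoff is classified as a pass, because the comparison is less than or equal to.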
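Let's compare the classifications using these two cutoff values. The predicted probability of Fail for the observation in the sixth row is 0.12. With the cutoff of 0.50, this was originally classified as a pass. Because 0.12 is greater than 0.10, the new formula classifies it as a fail.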
Lowering the cutoff to 0.10 classifies more observations as fails. The model is more sensitive to detecting failures. But there is a tradeoff. The model will also incorrectly classify more passes as fails.
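To see this tradeoff in numbers, here is a small Python sketch that counts correctly detected failures and false alarms at each cutoff. The ten rows are made up for illustration and are not the Impurity data. The function name counts is also hypothetical.

```python
# Hypothetical (actual outcome, predicted probability of Fail) pairs
rows = [
    ("Fail", 0.70), ("Fail", 0.30), ("Fail", 0.12), ("Fail", 0.04),
    ("Pass", 0.40), ("Pass", 0.15), ("Pass", 0.08), ("Pass", 0.03),
    ("Pass", 0.02), ("Pass", 0.01),
]

def counts(rows, cutoff):
    """Return (failures detected, passes wrongly flagged as Fail) at a cutoff."""
    detected = sum(1 for actual, p in rows if actual == "Fail" and p > cutoff)
    false_alarms = sum(1 for actual, p in rows if actual == "Pass" and p > cutoff)
    return detected, false_alarms

print(counts(rows, 0.50))  # (1, 0)
print(counts(rows, 0.10))  # (3, 2)
```

In this made-up example, the lower cutoff detects three of the four failures instead of one, but it also flags two passes as fails.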