This issue has been discussed previously, but I keep finding myself confused about how to use the profit matrix. My question is conceptual in nature, so let's just use some simple numbers. Suppose I have data on customer churn and I build a predictive model for churn (the particular technique does not matter for this question). My decision errors are not symmetric, however, so I want to modify my probability cutoff to reduce the more costly misclassification error. Suppose the relevant data is:

- I will send every predicted lost customer a $50 gift.
- I expect 10% of these customers to be retained as a result of the gift.
- Every retained customer has an expected lifetime present value of profits of $1000.
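To make the arithmetic behind these numbers explicit, here is a minimal Python sketch. The function names are just illustrative, and it assumes (my simplification) that the gift has no effect on a customer who was never going to churn, so a gift sent to such a customer is a pure $50 cost.

```python
# Figures taken from the assumptions listed above.
GIFT_COST = 50.0          # cost of the gift sent to each predicted churner
RETENTION_RATE = 0.10     # share of gifted churners expected to be retained
LIFETIME_VALUE = 1000.0   # present value of profits from a retained customer


def expected_recovered_profit(retention_rate=RETENTION_RATE,
                              lifetime_value=LIFETIME_VALUE):
    """Expected profit recovered from one true churner who receives the gift."""
    return retention_rate * lifetime_value


def net_gain_gifted_churner(gift_cost=GIFT_COST,
                            retention_rate=RETENTION_RATE,
                            lifetime_value=LIFETIME_VALUE):
    """Expected recovered profit minus the gift cost, for a true churner."""
    return expected_recovered_profit(retention_rate, lifetime_value) - gift_cost


def net_gain_gifted_non_churner(gift_cost=GIFT_COST):
    """Assumption: the gift changes nothing for a non-churner, so it is pure cost."""
    return -gift_cost


if __name__ == "__main__":
    print("Recovered per gifted churner:", expected_recovered_profit())
    print("Net gain per gifted churner:", net_gain_gifted_churner())
    print("Net gain per gifted non-churner:", net_gain_gifted_non_churner())
```

These are per-customer expectations measured relative to doing nothing, which may or may not be the right baseline for a profit matrix.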

From this, I know that falsely predicting a customer will be retained (a false negative, if churn is the positive class) costs me more than falsely predicting I will lose a customer (a false positive). However, I can't decide how to operationalize this. The profit matrix would seem a natural place to incorporate this information, but the example in the JMP documentation does not easily translate to this problem. My inclination is to use a profit matrix that looks like

| | predicted churn = 1 | predicted churn = 0 |
|---|---|---|
| actual churn = 1 | -50 | -100 |
| actual churn = 0 | +50 | +1000 |

My reasoning would be that I will send a $50 gift to all predicted churners, but that 10% of these will actually be retained (expected profits = 10% × $1000 = $100). However, the model prediction concerns the classifications of the initial model, not the results of my marketing efforts. In other words, actual churn = 0 is not the result of my gift; it reflects a misclassification by the initial model.

Similarly, the -100 in the matrix results from the fact that if I predict the customer to be retained and they are not, I don't send the gift (saving $50) but I miss the opportunity to have retained 10% of these customers (expected profits of $100).

The fact that the misclassification errors are not symmetric is clear - but the way to implement this is not. Even if I don't use the profit matrix, it isn't clear to me how to choose a probability cutoff in my classification model that incorporates the simplified data I assumed above. I realize that the classification model would really be the first step in a process whereby I would want to experiment with ways to increase retention. But the assumptions I make about the gifts, probability of success, and expected profits should provide enough information to use in applying the classification model to this problem.
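To make the question concrete, here is a rough sketch of one framing I can imagine, and I am not sure it is the right one. It treats the decision as "send the gift or not" for a customer with predicted churn probability p, and finds the p at which the expected incremental profit of sending the gift is zero. The function names are hypothetical, and it assumes the gift only matters for customers who would otherwise churn and that profit is measured relative to doing nothing.

```python
def expected_incremental_profit(p_churn, gift_cost=50.0,
                                retention_rate=0.10, lifetime_value=1000.0):
    """Expected profit of sending the gift, relative to not sending it.

    Assumption: only customers who would otherwise churn (probability p_churn)
    can be retained by the gift; the gift cost is paid in every case.
    """
    return p_churn * retention_rate * lifetime_value - gift_cost


def break_even_cutoff(gift_cost=50.0, retention_rate=0.10,
                      lifetime_value=1000.0):
    """Churn probability at which sending the gift has zero expected profit."""
    return gift_cost / (retention_rate * lifetime_value)


if __name__ == "__main__":
    cutoff = break_even_cutoff()
    print("Break-even cutoff:", cutoff)
    # Under this framing, a gift is worthwhile only when p_churn exceeds the cutoff.
    for p in (0.3, 0.5, 0.7):
        print(p, expected_incremental_profit(p))
```

Under this framing the cutoff depends only on the ratio of the gift cost to the expected recovered profit, so a cheaper gift or a higher retention rate would lower it. Whether this corresponds to what the profit matrix does is exactly what I am unsure about.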

Can anyone help sort out how to incorporate the assumed information into choosing a probability cutoff for a classification model?

Thanks.