Predict Customer Churn in Telecom Company (2020-US-EPO-622)
Kamal Kannan Krishnan, Graduate Student, University of Connecticut
Ayush Kumar, Graduate Student, University of Connecticut
Namita Singh, Graduate Student, University of Connecticut
Jimmy Joseph, Graduate Student, University of Connecticut
Today all service industries, including telecom, face a major challenge with customer churn, as customers switch to alternate providers for various reasons, such as competitors offering lower cost, combo services and marketing promotions.
With the power of existing data and the history of churned customers, if a company can predict in advance which customers are likely to churn voluntarily, it can proactively take action to retain them by offering discounts, combo offers, etc., as the cost of retaining an existing customer is less than acquiring a new one. The company can also internally study any possible operational issues and upgrade its technology and service offering. Such actions will prevent the loss of revenue and will improve the ranking among the industry peers in terms of number of active customers.
Analysis is done on the available dataset to identify important variables needed to predict customer churn, and individual models are built. Different combinations of models are ensembled to average out and eliminate the shortcomings of individual models. The cost of misclassified predictions (False Positive and False Negative) is estimated by assigning a dollar value based on Revenue Per User information and the cost of the discount provided to retain the customer.
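As a rough illustration of the two ideas in the abstract (averaging model outputs and putting a dollar value on misclassifications), here is a minimal Python sketch. It is not the JMP workflow used in the project. The function names `ensemble_average` and `misclassification_cost` are hypothetical, and the default dollar values (85 per missed churner, 14 per unnecessary retention discount) are assumed from the figures quoted in the presentation and serve only as placeholder parameters.

```python
from typing import List, Sequence


def ensemble_average(prob_lists: Sequence[Sequence[float]]) -> List[float]:
    """Average the churn probabilities of several models, customer by customer.

    prob_lists holds one sequence of probabilities per model, all in the
    same customer order (an assumption of this sketch).
    """
    n_models = len(prob_lists)
    return [sum(row) / n_models for row in zip(*prob_lists)]


def misclassification_cost(
    actual: Sequence[int],
    churn_prob: Sequence[float],
    cutoff: float = 0.5,
    fn_cost: float = 85.0,  # assumed cost of missing a customer who churns
    fp_cost: float = 14.0,  # assumed cost of a discount given needlessly
) -> float:
    """Total dollar cost of false negatives and false positives."""
    false_neg = 0
    false_pos = 0
    for y, p in zip(actual, churn_prob):
        predicted = 1 if p >= cutoff else 0
        if y == 1 and predicted == 0:
            false_neg += 1
        elif y == 0 and predicted == 1:
            false_pos += 1
    return false_neg * fn_cost + false_pos * fp_cost


if __name__ == "__main__":
    # Two hypothetical models scoring the same four customers.
    model_a = [0.5, 1.0, 0.25, 0.0]
    model_b = [0.0, 0.5, 0.25, 0.5]
    blended = ensemble_average([model_a, model_b])
    print(blended)
    print(misclassification_cost([1, 1, 0, 0], blended))
```

Because a missed churner is assumed to cost far more than a wasted discount, lowering `cutoff` below 0.5 trades extra false positives for fewer false negatives, which is the kind of trade-off the cost comparison is meant to expose.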
Speaker | Transcript |
Namita | Hello everyone, I'm Namita, and I'm here with my teammates Ayush, Jimmy and Kamal from the University of Connecticut to present our analysis on predicting telecom churn using JMP. |
The data we have chosen is from the industry that keeps us all connected, that is, the telecom and internet service industry. So let's begin with a brief on the background. | |
The US telecom industry continues to witness intense competition and low customer stickiness due to multiple reasons like lower cost, combo promotional offers, and service quality. So | |
to align to the main objective of preventing churn, telecom companies often use customer attrition analysis as their key business insights. | |
This is due to the fact that the cost of retaining an existing customer is far less than acquiring a new one. Moving on to the objective, | |
the main goal here is to predict in advance the potential customers who may attrite, and then, based on analysis of that data, recommend customized product strategies to the business. | |
We have followed the standard SEMMA approach here. Now let's get an overview of the data set. It consists of a total of 7,043 rows of customers belonging to different demographics (single, with dependents, and senior) | |
and subscribing to different product offerings like internet service, phone lines, streaming TV, streaming movies and online security. | |
There are about 20 independent variables; of these, 17 are categorical and three are continuous. The dependent target variable for classification is customer churn. | |
And the churn rate for the baseline model is around 26.5%. The goal now is to preprocess this data and model it for future analysis. That's it from my end; over to you, Ayush. | |
Ayush Kumar | Thanks, Namita. I'm Ayush. In this section, I'll be talking about the data exploration and preprocessing. |
In data exploration, we discovered interesting relationships; for instance, the variables tenure and monthly charges were both positively correlated with total charges. We analyzed these three variables | |
using the scatter plot matrix in JMP, which validated the relationship. Moreover, by using the Explore Missing Values functionality, we observed that the total charges column had 11 missing values. | |
The missing values were taken care of, as the total charges column was excluded due to multicollinearity. After observing the histograms of the variables and using the Explore Outliers functionality, we concluded that the data set had no outliers. | |
The variable called Customer ID had 7,043 unique values which would not add any significance to the target variable. So customer ID was excluded. | |
We were also able to find an interesting pattern among the variables. Variables such as streaming TV and streaming movies convey the same information about streaming behavior. | |
These variables were grouped into a single column, streaming to, by using a formula in JMP. The same course of action was taken for the variables online backup and online security. | |
We ran logistic regression and decision tree in JMP to find out the important variables. | |
From the effects summary, it was observed that tenure, contract type, monthly charges, streaming to, multiple line service, and payment method showed significant LogWorth and were very important variables in determining the target. | |
The effects on ??? also helped us to narrow down the variable count to 12 statistically significant variables, which formed the basis for further modeling. | |
We used the value of ??? functionality and moved Yes of our target variable upwards. Finally, the data was split into training, validation and test sets in a 60/20/20 ratio using the formula random method. Over to you now, Kamal. | |
Kamal Krishnan | Sorry, I am Kamal. I will explain more about the different models built in JMP using the data set. |
We built eight different types of models in total. For each type of model, we tried various input configurations and settings to improve the results, mainly sensitivity, | |
as our target was to reduce the number of false negatives in the classification. | |
JMP makes it very user friendly to redo the models by changing the configurations. It was easy to store the results whenever a new iteration of the model was done in JMP | |
and then compare outputs in order to select the optimized model of each type. JMP even allowed us to change the cutoff value from the default 0.5 to other values and observe the prediction results. This slide shows the results of the selected model from each of the eight different types of models. | |
First, as our target variable, churn, is categorical, we built logistic regression. Then we built decision tree, KNN, and ensemble models like bootstrap forest and boosted tree. | |
Then we built machine learning models like neural networks. JMP allowed us to set the random seed in models like neural networks and KNN. This helped us reproduce the same outputs when needed. | |
Then we built a naive Bayes model. JMP allowed us to study the impact of various variables through the prediction profiler. | |
We can point and click to change the values within the range and see how it impacts the target variable. By changing values in the prediction profiler for naive Bayes, we observed that an increase in tenure helps in reducing the churn rate. On the contrary, | |
an increase in monthly charges increases the churn rate. Finally, we built ensembles of different combinations of models to average out and eliminate the shortcomings of individual models. | |
We found that ensembling neural network and naive Bayes has higher sensitivity among ???. This ends the model description. Over to you, Jimmy. | |
JJoseph | Thank you, Kamal. In this section we will be comparing the models and taking a deeper dive into each model's details. |
The major parameters used to compare the models are cost of misclassification in dollars, sensitivity versus accuracy chart, lift ratio, and area under the curve values. | |
The cost of misclassification data is depicted in the top right corner of the slide. The costs of false positives and false negatives were determined using average monthly charges. | |
The cost of a false negative, where the model predicted no churn for a customer potentially leaving, was calculated at $85, and the cost of a false positive | |
at $14 after discounting 20% to accommodate additional benefits. The cost comparison chart clearly indicates that naive Bayes has the lowest cost. | |
Going on to the total accuracy rates chart, accuracy is between 74% and 81%, with not much variation across most of the models. And lift, a measure of the probability of finding a success record compared to the baseline model, varies between 1.99 and 3.11. | |
The AUC of the ROC curve is another measure used to determine the strength of the model with different types of values. As the chart indicates, all the models did equally well in this category. | |
The sensitivity and accuracy chart measures the models' success in predicting customer churn accurately. The chart indicates two facts: | |
1) how many churning customers the model can correctly predict; 2) how often the prediction is accurate. | |
This measure is used as the major parameter to decide the best performing model, and naive Bayes did well in this category. | |
Based on the various metrics and considering the cost of failed predictions, naive Bayes came out as the best and most parsimonious model to predict customer churn for the given data set. | |
It has the lowest misclassification ratio, high sensitivity, and reasonably good total accuracy. If you discount some of its inherent drawbacks, such as the lack of a statistical model to support it, the model is completely data driven and easily explainable. | |
Moving on to the conclusions drawn, the significant variables in the data set are contract type and tenure of the enrolled customer. | |
From modeling, we observed that customer churn is high for | |
1) those without dependents in their demographic; 2) those who pay a high price for their phone services, with a low customer satisfaction rate on high-end services; | |
3) customers who stick to the original single-line service and can easily switch over to competitors. So based on those findings, the recommendations are | |
1) targeted customer promotions focused on income generation; 2) push long-term contracts with additional incentives; 3) build a product line combo focusing on customer needs. | |
In conclusion, we used the JMP tool to do analysis and build predictive models on a limited data set. It is very effective and powerful for doing this kind of analysis. Please reach out to us if you have any further questions. Thank you. |
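The comparison metrics discussed in the model comparison section (sensitivity, total accuracy, and lift) can be sketched in a few lines of Python. This is only an illustrative sketch, not the JMP output used in the talk: the function names `sensitivity_and_accuracy` and `lift_at` are hypothetical, and lift is assumed here to mean the churn rate among the top-scored fraction of customers divided by the overall churn rate.

```python
from typing import Sequence, Tuple


def sensitivity_and_accuracy(
    actual: Sequence[int], predicted: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, accuracy) for 0/1 labels, where 1 means churn."""
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for y, p in pairs if y == 1 and p == 1)
    false_neg = sum(1 for y, p in pairs if y == 1 and p == 0)
    correct = sum(1 for y, p in pairs if y == p)
    sensitivity = true_pos / (true_pos + false_neg)  # share of churners caught
    accuracy = correct / len(pairs)                  # share of all rows correct
    return sensitivity, accuracy


def lift_at(
    actual: Sequence[int], scores: Sequence[float], fraction: float = 0.1
) -> float:
    """Churn rate in the top-scored fraction divided by the overall churn rate."""
    n_top = max(1, round(len(actual) * fraction))
    # Rank customers from highest to lowest predicted churn score.
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(y for _, y in ranked[:n_top]) / n_top
    base_rate = sum(actual) / len(actual)
    return top_rate / base_rate


if __name__ == "__main__":
    # Ten hypothetical customers, two of whom churned.
    actual = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    predicted = [1 if s >= 0.5 else 0 for s in scores]
    print(sensitivity_and_accuracy(actual, predicted))
    print(lift_at(actual, scores, fraction=0.2))
```

Reading sensitivity next to accuracy matters here because a model that predicts "no churn" for everyone would already be right about 73.5% of the time at a 26.5% baseline churn rate, while catching no churners at all.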