Level: Intermediate
Shankang Qu, Statistician, PepsiCo
Mumu Wang, Senior Scientist/Statistician, PepsiCo
This study evaluates customer reviews and purchasing behavior using supervised-learning sentiment analysis. The verbatim (i.e., textual data) collected from consumer feedback and comments on beverage products is classified into three categories: complaint, praise and suggestion. With text mining models built in JMP Pro, we demonstrate the feasibility of a hybrid algorithm in which hand-built classifiers are combined with empirical learning from the data. An automated system will apply the algorithms to incoming voice-recorded feedback, together with the accumulated verbatim, and classify each document into one of the three categories. The algorithms adapt to new data and experience, improving prediction quality over time. Our service associates manually coded 7,507 documents received from consumers by phone, email, social media, chat and e-commerce. In this presentation, we show how to build text mining models using approaches such as neural networks. Based on the confusion matrix, we achieved a misclassification rate of 6% in training and 11% in validation. Bootstrap forest and nominal logistic models gave slightly higher misclassification rates. Comparing the model results with the original coding of the verbatim also revealed some data entry issues. The models have been trained to capture documents that were misclassified by humans.
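To make the evaluation step concrete, the following is a minimal Python sketch of how a misclassification rate can be read off a confusion matrix, and how documents whose model prediction disagrees with the human coding can be flagged for review. It is an illustration only, not the JMP Pro workflow used in the study: the function names, document IDs and the five toy labels are all hypothetical.

```python
from collections import Counter

# The three categories named in the abstract.
LABELS = ("complaint", "praise", "suggestion")


def confusion_matrix(actual, predicted, labels=LABELS):
    """Rows are human-coded (actual) labels, columns are model predictions."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def misclassification_rate(matrix):
    """Share of documents that fall off the diagonal of the matrix."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return (total - correct) / total


def flag_disagreements(doc_ids, human, model):
    """IDs of documents where the model and the human coder disagree.

    A disagreement is either a model error or a possible data entry
    issue in the original coding, so these are candidates for review.
    """
    return [d for d, h, m in zip(doc_ids, human, model) if h != m]


# Toy data (hypothetical, not taken from the study).
doc_ids = [1, 2, 3, 4, 5]
human = ["complaint", "praise", "suggestion", "complaint", "praise"]
model = ["complaint", "praise", "complaint", "complaint", "praise"]

matrix = confusion_matrix(human, model)
rate = misclassification_rate(matrix)
flagged = flag_disagreements(doc_ids, human, model)

print(matrix)   # one row per human-coded category
print(rate)     # fraction of documents misclassified
print(flagged)  # documents to re-check against the original coding
```

Computing the rate separately on the training and validation sets gives the two figures compared in the abstract, and the flagged list is one simple way to surface coding errors of the kind mentioned there.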