Scholarly record
APPLICATION OF MACHINE LEARNING FOR CHURN PREDICTION BASED ON TRANSACTIONAL DATA (RFM ANALYSIS)
Abstract
Machine learning covers a wide set of supervised and unsupervised algorithms for solving prediction, classification and anomaly detection problems. One area of application is customer churn prediction. To build a model that predicts customer switching, data scientists use various demographic, social, transactional and behavioural metrics and features. At the same time, most small Bulgarian companies still do not have the versatile and complete customer data this requires. They rely mainly on information provided by the ERP system, which generates mostly transaction-oriented data. Small and medium-sized enterprises are not, at this stage, planning major investments in marketing research or additional customer-related sources, and are limited to modelling and forecasting on transactional data. The main goal of the current study is to propose a combination of RFM (recency, frequency, monetary) analysis and machine learning algorithms for churn prediction based mainly on transactional data. The dataset is extracted from the ERP system of a regional concrete production company in Bulgaria. RFM scores are calculated for every customer for a period of 6 months before the end date of examination. The target value for the prediction models is a churn metric indicating whether or not the customer made a transaction in the 6 months following the RFM analysis. Several machine learning algorithms were applied: Two-Class Boosted Decision Trees, Two-Class Neural Networks, Two-Class Decision Jungle, Two-Class SVM and Two-Class Logistic Regression. The experiments were performed in Azure Machine Learning Studio. Results showed that, despite the limitations of RFM scores and metrics, companies using machine learning algorithms can predict the churn of their customers with sufficient confidence. The best models for churn prediction proved to be Two-Class Decision Jungle, Two-Class Boosted Decision Trees and Two-Class Neural Networks. There are no notable differences when using the recency, frequency and monetary values instead of their scores (R, F, M and RFM).
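The feature and target construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' Azure Machine Learning Studio pipeline. The function names (rfm_with_churn, rank_score), the 182-day approximation of a 6-month window, the rank-based scoring scheme and the sample transactions are all assumptions made for illustration; the abstract does not state how the R, F and M scores were binned.

```python
from collections import defaultdict
from datetime import date, timedelta


def rfm_with_churn(transactions, cutoff, window_days=182):
    """Build raw RFM values and a churn label per customer.

    transactions: iterable of (customer_id, date, amount) tuples (assumed layout).
    Observation window: (cutoff - window_days, cutoff].
    Outcome window:     (cutoff, cutoff + window_days].
    A customer is labelled churned when no transaction falls in the outcome window.
    """
    start = cutoff - timedelta(days=window_days)
    end = cutoff + timedelta(days=window_days)
    observed = defaultdict(list)
    returned = set()
    for customer, day, amount in transactions:
        if start < day <= cutoff:
            observed[customer].append((day, amount))
        elif cutoff < day <= end:
            returned.add(customer)

    rows = {}
    for customer, items in observed.items():
        last_purchase = max(day for day, _ in items)
        rows[customer] = {
            "recency": (cutoff - last_purchase).days,   # days since last purchase
            "frequency": len(items),                    # number of transactions
            "monetary": sum(amount for _, amount in items),
            "churned": customer not in returned,
        }
    return rows


def rank_score(values, higher_is_better=True, bins=5):
    """Map each customer to a score in 1..bins by rank (assumed scheme)."""
    ordered = sorted(values, key=values.get, reverse=not higher_is_better)
    n = len(ordered)
    return {customer: 1 + (i * bins) // n for i, customer in enumerate(ordered)}


# Hypothetical sample data, not taken from the study's dataset.
sample = [
    ("A", date(2020, 3, 1), 100.0),
    ("A", date(2020, 6, 1), 50.0),
    ("A", date(2020, 9, 1), 20.0),   # falls in the outcome window
    ("B", date(2020, 1, 15), 300.0),
    ("C", date(2020, 8, 1), 75.0),   # no activity in the observation window
]
rows = rfm_with_churn(sample, cutoff=date(2020, 6, 30))

# Lower recency is better, so the ranking is reversed for R.
r_score = rank_score({c: v["recency"] for c, v in rows.items()},
                     higher_is_better=False, bins=2)
f_score = rank_score({c: v["frequency"] for c, v in rows.items()}, bins=2)
```

Either the raw values in rows or the derived scores could then be passed, together with the churned label, to any two-class classifier. Customers with no transactions in the observation window (such as C above) receive no RFM row in this sketch; how the study treated such customers is not stated in the abstract.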
Citing literature
Number of times cited according to Crossref: 9