Elo — Predicting Customer Loyalty Score

Raisul Hazari
10 min read · Nov 30, 2020

If we have sufficient data, machine learning can help us predict behavioral patterns that are going to happen in the future.

In this article we will explain how to develop a machine learning model that can predict customer loyalty from past credit card transaction history. We will also discuss whether the model can be productionized so that it can be used in real time.

Table of Contents:
1. Description
2. Dataset Source
3. Problem Statement
4. Mapping to ML/DL problem
5. EDA
6. Feature Engineering
7. Model Building
8. Deployment
9. Future work
10. Profile
11. References

1. Description

Nowadays many of us use credit cards for transactions. As the name suggests, with a credit card one can perform a transaction even if there is insufficient balance in the bank account, and the amount must be paid back by the cardholder in the next billing cycle. Although the basic operation is the same for all credit cards, to retain customers and encourage more card usage, credit card providers occasionally offer discount coupons and promotions to eligible customers. This also helps merchants expand the businesses where customers can spend their money.

Elo, one of the largest payment brands in Brazil, has built partnerships with merchants in order to offer promotions or discounts to cardholders. Elo wants to build machine learning models to understand the most important aspects and preferences in its customers' lifecycle, from food to shopping, so that it can provide more relevant and personalized promotions to cardholders.

2. Dataset Source:

https://www.kaggle.com/c/elo-merchant-category-recommendation/data

3. Problem Statement:

i) Predict the loyalty score for a customer based on his/her credit card purchase history.

ii) Predicting the loyalty score is useful because it helps the credit card provider decide whom to offer discount coupons and various promotional offers.

iii) This is useful because customers might spend more money to avail these offers, which in turn expands the business.

4. Mapping to ML/DL problem:

As the title suggests, we need to predict the loyalty score for a customer based on past transactions. We can frame the problem in two ways:
i) Classification: predict whether the customer is loyal or not. If loyal, more promotional offers or discount coupons can be given.
ii) Regression: predict the customer loyalty score as a real value. The higher the score, the more loyal the customer.

4.1. Performance Metrics:
For the classification problem we will use ROC_AUC, and for the regression problem we will use RMSE as the performance metric.
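As a quick reference, both metrics can be computed with scikit-learn; the arrays below are dummy values purely for illustration:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, mean_squared_error

y_true_clf = np.array([0, 1, 1, 0, 1])         # binary loyalty labels
y_prob = np.array([0.2, 0.8, 0.6, 0.4, 0.9])   # predicted probabilities

y_true_reg = np.array([-1.2, 0.5, 3.1, -0.7])  # actual loyalty scores
y_pred_reg = np.array([-0.9, 0.3, 2.5, -1.1])  # predicted loyalty scores

auc = roc_auc_score(y_true_clf, y_prob)
rmse = np.sqrt(mean_squared_error(y_true_reg, y_pred_reg))
print(f"ROC-AUC: {auc:.3f}, RMSE: {rmse:.3f}")
```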

4.2. Train-Test Construction:
Even though a test dataset is given, for cross-validation purposes we will split the training data in a 70-30 ratio to develop and evaluate the model before predicting on the actual test dataset.
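A minimal sketch of such a split, assuming the Kaggle train.csv file with its card_id and target columns:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

train = pd.read_csv("train.csv")               # Kaggle train file
y = train["target"]                            # loyalty score
X = train.drop(columns=["card_id", "target"])  # remaining features

# 70-30 split for developing and evaluating the model
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=42
)
```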

4.3. First-cut approach:
Now, to solve the problem we will go step by step:
i) EDA, to get an overview of the data and check for anomalies in the provided dataset
ii) Feature engineering: if the existing features are not enough, we construct more features to improve accuracy
iii) Model development
iv) And finally deployment, so that the model is accessible to anyone.

5. Exploratory Data Analysis:

The datasets are provided as CSV files. We have:
i) train: contains ‘card_id’, three anonymized features — ‘feature_1’, ‘feature_2’, ‘feature_3’ — ‘first_active_month’, and the target feature, which is the loyalty score.
ii) historical_transactions: contains up to 3 months’ worth of transactions for every card_id at any of the provided merchant_ids. There are 14 features in total, including ‘authorized_flag’ (whether the transaction is authorized or not), ‘card_id’, ‘city_id’, ‘installments’, ‘purchase_date’, the normalized ‘purchase_amount’, etc.
iii) new_merchant_transactions: the features are the same as in historical_transactions; however, it contains the transactions at new merchants (merchant_ids that this particular card_id has not yet visited) over a period of two months.
iv) merchants: contains aggregate information for each merchant_id represented in the dataset.

Now we will analyze each feature in more detail. The purpose of the EDA is to check for outliers, blank or empty values, and duplicate records, as these need to be corrected to improve performance. EDA also helps us understand the available features more deeply.

5.1.1. Plot feature_1:

Categorical feature; the categories are 1, 2, 3, 4, 5.

If we check the plot, category 3 clearly dominates feature_1.

5.1.2. Plot feature_2:

There are three categories in total — 1, 2, 3.

5.1.3. Plot feature_3:

Categorical feature with two categories — 0 and 1.

5.1.4. Plot target feature:

These are the loyalty scores for each card_id, provided as real numbers. We will check the PDF (probability density function) and the box plot to analyze it better.
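A minimal plotting sketch, reusing the train DataFrame loaded earlier (styling choices are illustrative):

```python
import matplotlib.pyplot as plt
import seaborn as sns

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
sns.kdeplot(train["target"], ax=axes[0])    # PDF of the loyalty score
axes[0].set_title("PDF of target")
sns.boxplot(x=train["target"], ax=axes[1])  # box plot to spot outliers
axes[1].set_title("Box plot of target")
plt.show()
```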

[Figure: PDF and box plot of the target feature]

From the box plot and PDF, we can conclude that the target feature lies between -32 and 18, and most of the data lies between -10 and +10.

We will check the percentile data to understand it better.
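One way to compute these percentiles (the exact cut points are our choice):

```python
import numpy as np

# Percentiles of the loyalty score, reusing the train DataFrame from above
for p in [0, 1, 10, 25, 50, 75, 90, 99, 100]:
    print(f"{p}th percentile: {np.percentile(train['target'], p):.3f}")
```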

[Figure: percentile values of the target feature, zoomed into the 1st to 10th percentiles]

If we check the minimum value, it seems around 1% of the data has a very low loyalty score, less than -30. There is a chance these could be outliers.

Now we will analyze each feature against the target feature.

[Figure: box plots of feature_1, feature_2 and feature_3 against the target feature]

Unfortunately none of the plots is conclusive, as the distributions overlap heavily in all of the above cases, so we cannot conclude anything using only the categorical features (feature_1, feature_2, feature_3).

5.1.5. Plot first_active_month:

As we can see, most of the data is from 2014 to 2017. The numbers are quite high from Aug 2017 to Dec 2017, and after Dec 2017 we see a downward trend. One reason might be that, due to the festive season around Christmas, a large number of cards performed their first transaction between Sep and Dec.

Now, let’s analyse the transaction data

5.2.1. authorized_flag:

In historical_transactions a few recorded transactions are unauthorized; as we calculated, nearly 8.64% of all transactions are unauthorized. In new_merchant_transactions, however, all transactions are authorized.

The plot gives an intuitive idea.
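As a quick sanity check, that percentage can be computed directly (assuming the Kaggle historical_transactions.csv file):

```python
import pandas as pd

hist = pd.read_csv("historical_transactions.csv")
# authorized_flag is 'Y'/'N'; the share of 'N' is the unauthorized fraction
unauth_pct = (hist["authorized_flag"] == "N").mean() * 100
print(f"Unauthorized transactions: {unauth_pct:.2f}%")
```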

5.2.2. Plot Installments:

Since it’s a numerical feature, let’s go for percentile data,

As is clearly shown, most of the data has 0 or 1 installments, but a few records have a quite high installment count. Let's go further.

We will look more closely at the 0th and 100th percentiles.

Now we see installments of -1; this should be an outlier, as installments can't be negative. On the other end, an installment count of 999 is massive, and there is a high chance that it is an outlier too.
A similar pattern is observed for the new merchant transactions as well.

5.2.3. purchase_amount:

It is surprising that purchase_amount can be negative. However, according to the provided Data_Dictionary.xlsx, purchase_amount is normalized, so it can indeed be negative. Also, the 100th percentile value is significantly higher than the others.

5.2.4. purchase_date:

We have the historical transaction data from Jan 2017 to Feb 2018, and the transactions kept increasing until Dec 2017. We saw a similar pattern in the train dataset as well: due to the festive season, more transactions are recorded between Sep 2017 and Dec 2017.

In the transactions datasets we also have a few anonymized categorical features such as ‘category_1’, ‘category_2’, ‘category_3’, and ID features such as ‘card_id’, ‘merchant_id’, ‘city_id’, ‘state_id’, etc.

Now we will discuss the merchants dataset.

‘merchant_id’ is present in both the transactions and merchants datasets, along with a few other common features. But as we checked, the values differ for the same ‘merchant_id’, so for the common features we must choose either the merchants dataset or the transactions dataset. Here, we consider the data in the transactions dataset to be more reliable.

5.3.1. Plot numerical_1 and numerical_2:

As we plot them, they are highly correlated with each other.
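A quick way to verify this, assuming the Kaggle merchants.csv file:

```python
import pandas as pd

merchants = pd.read_csv("merchants.csv")
corr = merchants["numerical_1"].corr(merchants["numerical_2"])
print(f"Pearson correlation: {corr:.4f}")
```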

We also performed bivariate analysis with the target feature, but none of the features is promising, and we cannot conclude anything by using only the given features. Hence, we must perform feature engineering to create more features before model development.

In the next section, we will check whether the existing problem can be solved as classification, so that based on the details the model predicts whether the customer is loyal or not.

To implement this, we first standardize the target feature with StandardScaler, then binarize it: 0 if the scaled value is below 0, and 1 otherwise.
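A minimal sketch of this transformation (the column name target_std is our choice):

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaled = scaler.fit_transform(train[["target"]])         # standardize the loyalty score
train["target_std"] = (scaled >= 0).astype(int).ravel()  # 1 if scaled value >= 0, else 0
```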

After performing this, we have a new target feature consisting of 0s and 1s.

Now we will check a few other plots against the new target feature; let's call it target_std.


As a final analysis we can conclude that the existing features are not sufficient to predict the loyalty score, as the classes overlap in all of the combinations.

So, we must go through the feature engineering step, where we will generate additional features that can be fed to the model.

Next, we will discuss feature engineering.

6. Feature Engineering:

First, we need to convert the categorical features into numeric ones using the map() function: e.g., ‘Y’/‘N’ can be mapped to 1/0, and ‘A’, ‘B’, ‘C’, ‘D’, ‘E’ to 1, 2, 3, 4, 5. NULL or empty values also need to be replaced with the mean, median, or mode as applicable. Once this preprocessing is done, we can perform FE over the data.
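A minimal preprocessing sketch on the historical transactions (the exact mappings and imputation choices are illustrative, not necessarily the ones used in the final code):

```python
# Convert Y/N flags and letter categories to numbers
hist["authorized_flag"] = hist["authorized_flag"].map({"Y": 1, "N": 0})
hist["category_1"] = hist["category_1"].map({"Y": 1, "N": 0})
hist["category_3"] = hist["category_3"].map({"A": 1, "B": 2, "C": 3, "D": 4, "E": 5})

# Replace missing values with a simple statistic, as applicable
hist["category_2"] = hist["category_2"].fillna(hist["category_2"].mode().iloc[0])
hist["installments"] = hist["installments"].fillna(hist["installments"].median())
```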

In the FE step, we apply aggregate functions over the transaction data to generate more features; some of them are shown below.
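One possible aggregation sketch with pandas groupby (the chosen statistics are illustrative):

```python
# Per-card aggregates over the historical transactions
agg = hist.groupby("card_id").agg(
    purchase_amount_sum=("purchase_amount", "sum"),
    purchase_amount_mean=("purchase_amount", "mean"),
    purchase_amount_max=("purchase_amount", "max"),
    installments_mean=("installments", "mean"),
    n_transactions=("purchase_amount", "count"),
    n_merchants=("merchant_id", "nunique"),
).reset_index()

# Merge the aggregates back onto the train set
train_fe = train.merge(agg, on="card_id", how="left")
```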

We will add some additional features as well.

7. Model Building:

Once feature engineering is performed, we will develop the model for the final prediction.

As we have already discussed, we are going to tackle the problem in both ways: i) classification and ii) regression.

For classification, we used LGBMClassifier and XGBClassifier with RandomizedSearchCV for hyperparameter tuning, and obtained the ROC_AUC scores shown below on the validation set.

i) LGBM Classifier:

ii) XGB Classifier:
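For reference, here is a minimal sketch of the LGBMClassifier search (the search space and CV settings are illustrative, not the exact ones we used):

```python
from lightgbm import LGBMClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# train_fe from the FE step; target_std is the binarized target
X = train_fe.drop(columns=["card_id", "first_active_month", "target", "target_std"])
y = train_fe["target_std"]
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=42)

param_dist = {
    "n_estimators": [200, 500, 1000],
    "num_leaves": [31, 63, 127],
    "learning_rate": [0.01, 0.05, 0.1],
}
clf = RandomizedSearchCV(
    LGBMClassifier(random_state=42),
    param_distributions=param_dist,
    n_iter=10, scoring="roc_auc", cv=3, random_state=42,
)
clf.fit(X_tr, y_tr)
print("Validation ROC-AUC:", clf.score(X_val, y_val))
```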

Unfortunately, the ROC_AUC score and accuracy are not high enough, so we drop the classification formulation and go with regression instead.

For regression, our best model is LGBMRegressor with StratifiedKFold. We also tried XGBRegressor and an MLP architecture. Optuna was used for hyperparameter tuning.

Hyperparameter tuning using Optuna:
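A minimal sketch of such a tuning loop (the search space and trial count are illustrative):

```python
import numpy as np
import optuna
from lightgbm import LGBMRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Real-valued target this time; features as in the classification sketch
X = train_fe.drop(columns=["card_id", "first_active_month", "target", "target_std"])
y = train_fe["target"]
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=42)

def objective(trial):
    params = {
        "num_leaves": trial.suggest_int("num_leaves", 31, 255),
        "learning_rate": trial.suggest_float("learning_rate", 0.005, 0.1, log=True),
        "n_estimators": trial.suggest_int("n_estimators", 200, 2000),
        "min_child_samples": trial.suggest_int("min_child_samples", 20, 200),
    }
    model = LGBMRegressor(random_state=42, **params)
    model.fit(X_tr, y_tr)
    preds = model.predict(X_val)
    return np.sqrt(mean_squared_error(y_val, preds))  # RMSE to be minimized

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)
print("Best params:", study.best_params)
```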

After submitting to Kaggle, the LGBM Regressor with StratifiedKFold gave our best score, and our Kaggle rank is within the top 27%.

Since we have already selected our best model, our final step is to deploy the work to the cloud so that it is accessible through the Internet.

8. Deployment:

The app is deployed on Heroku using the Flask web framework. It is accessible from here; please use Chrome for better performance.

For deployment we created app.py, which contains all our Python code for backend processing, and index.html to take input from the user. To deploy on Heroku we also need a Procfile, runtime.txt, and requirements.txt to define all the required Python packages.
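A minimal sketch of what app.py can look like (the saved model file and the featurize() helper are hypothetical, for illustration only):

```python
from flask import Flask, render_template, request
import joblib

app = Flask(__name__)
model = joblib.load("lgbm_regressor.pkl")  # trained regressor saved earlier

@app.route("/")
def index():
    return render_template("index.html")

@app.route("/predict", methods=["POST"])
def predict():
    # featurize() (hypothetical) rebuilds the engineered features for a card_id
    features = featurize(request.form["card_id"])
    score = model.predict([features])[0]
    return render_template("index.html", prediction=round(score, 3))

if __name__ == "__main__":
    app.run()
```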

This is a glimpse of our deployed web app.

9. Future Work:

We will check further whether the public score can be improved using the trick proposed in the first-place solution, which is reported to improve the CV by 0.015.

10. Profile:

If you're interested in the complete code walkthrough, please visit here.

Connect with me on LinkedIn.

11. References:

[1] For the end-to-end case study:
https://www.appliedaicourse.com/

[2] For feature engineering and appropriate model selection:
https://www.researchgate.net/publication/335158533_Predicting_Customer_Loyalty_Using_Various_Regression_Models

[3] For some advanced feature engineering:
https://www.kaggle.com/mfjwr1/simple-lightgbm-without-blending/output
