
This is the seventh post in our series on building a self-learning recommendation system using reinforcement learning. The series consists of eight posts in which we progressively build the system:
- Recommendation system and reinforcement learning primer
- Introduction to multi armed bandit problem
- Self learning recommendation system as a K-armed bandit
- Build the prototype of the self learning recommendation system : Part I
- Build the prototype of the self learning recommendation system : Part II
- Productionising the self learning recommendation system : Part I – Customer Segmentation
- Productionising the self learning recommendation system: Part II – Implementing self learning recommendation ( This Post )
- Evaluating different deployment options for the self learning recommendation systems.
This post builds on the previous one, where we started productionising the application using Python scripts and completed the customer segmentation part. In this post we continue from where we left off and build the self-learning system. Let us get going.
Creation of States
Let us take a quick recap of the project structure and what we covered in the last post.

In the last post we were in the early part of our main driver file rlRecoMain.py. We explored the rfmMaker class in the file rfmProcess.py in the processes directory. We will now explore the selfLearnProcess.py file in the same directory.
Open a new file, name it selfLearnProcess.py, and insert the following code:
import pandas as pd
from numpy.random import normal as GaussianDistribution
from collections import OrderedDict
from collections import Counter
import operator
from random import sample
import numpy as np
from pymongo import MongoClient
client = MongoClient(port=27017)
db = client.rlRecomendation
class rlLearn:
    def __init__(self, custDetails, conf):
        # Get the date as a separate column
        custDetails['Date'] = custDetails['Parse_date'].apply(lambda x: x.strftime("%d"))
        # Converting date to float for easy comparison
        custDetails['Date'] = custDetails['Date'].astype('float64')
        # Get the period of month column
        custDetails['monthPeriod'] = custDetails['Date'].apply(lambda x: int(x > conf['monthPer']))
        # Aggregate the custDetails to get a distribution of rewards
        rewardFull = custDetails.groupby(['Segment', 'Month', 'monthPeriod', 'Day', conf['product_id']])[
            conf['prod_qnty']].agg('sum').reset_index()
        # Get these data frames for all methods
        self.custDetails = custDetails
        self.conf = conf
        self.rewardFull = rewardFull
        # Defining some dictionaries for storing the values
        self.countDic = {}  # Dictionary to store the count of products
        self.polDic = {}  # Dictionary to store the value distribution
        self.rewDic = {}  # Dictionary to store the reward distribution
        self.recoCountdic = {}  # Dictionary to store the recommendation counts

    # Method to find unique values of each of the variables
    def uniqeVars(self):
        # Finding unique value for each of the variables
        segments = list(self.rewardFull.Segment.unique())
        months = list(self.rewardFull.Month.unique())
        monthPeriod = list(self.rewardFull.monthPeriod.unique())
        days = list(self.rewardFull.Day.unique())
        return segments, months, monthPeriod, days

    # Method to consolidate all products
    def prodConsolidator(self):
        # Get all the unique values of the variables
        segments, months, monthPeriod, days = self.uniqeVars()
        # Creating the consolidated dictionary
        for seg in segments:
            for mon in months:
                for period in monthPeriod:
                    for day in days:
                        # Get the subset of the data
                        subset1 = self.rewardFull[(self.rewardFull['Segment'] == seg) & (self.rewardFull['Month'] == mon) & (
                                self.rewardFull['monthPeriod'] == period) & (self.rewardFull['Day'] == day)]
                        # Initializing a temporary dictionary for storing in MongoDB
                        tempDic = {}
                        # Check if the subset is valid
                        if len(subset1) > 0:
                            # Iterate through each of the subset and get the products and their quantities
                            stateId = str(seg) + '_' + mon + '_' + str(period) + '_' + day
                            # Define a dictionary for the state ID
                            self.countDic[stateId] = {}
                            tempDic[stateId] = {}
                            for i in range(len(subset1.StockCode)):
                                # Store in the Count dictionary
                                self.countDic[stateId][subset1.iloc[i]['StockCode']] = int(subset1.iloc[i]['Quantity'])
                                tempDic[stateId][subset1.iloc[i]['StockCode']] = int(subset1.iloc[i]['Quantity'])
                            # Dumping each record into MongoDB
                            db.rlQuantdic.insert_one(tempDic)
        # Consolidate the rewards and value functions based on the quantities
        for key in self.countDic.keys():
            # Creating two temporary dictionaries for loading in MongoDB
            tempDicpol = {}
            tempDicrew = {}
            # First get the dictionary of products for a state
            prodCounts = self.countDic[key]
            self.polDic[key] = {}
            self.rewDic[key] = {}
            # Initializing temporary dictionaries also
            tempDicpol[key] = {}
            tempDicrew[key] = {}
            # Update the policy values
            for pkey in prodCounts.keys():
                # Creating the value dictionary using a Gaussian process
                self.polDic[key][pkey] = GaussianDistribution(loc=prodCounts[pkey], scale=1, size=1)[0].round(2)
                tempDicpol[key][pkey] = self.polDic[key][pkey]
                # Creating a reward dictionary using a Gaussian process
                self.rewDic[key][pkey] = GaussianDistribution(loc=prodCounts[pkey], scale=1, size=1)[0].round(2)
                tempDicrew[key][pkey] = self.rewDic[key][pkey]
            # Dumping each of these in MongoDB
            db.rlRewarddic.insert_one(tempDicrew)
            db.rlValuedic.insert_one(tempDicpol)
        print('[INFO] Dumped the quantity dictionary, policy and rewards in MongoDB')
As usual we start with the import of the libraries we want in lines 1-7. In this implementation we make a small deviation from the prototype which we developed in the previous post. During the prototyping phase we predominantly relied on dictionaries to store data. Here, however, we will be storing data in MongoDB. Those of you who are not fully conversant with MongoDB can refer to some good tutorials on MongoDB like the one here. I will also be explaining the key features as and when required. In line 8, we import the MongoClient which is required for connections with the database. We then define the client using the default port number (27017) in line 9 and name the database where we will store the recommendations in line 10. The name of the database we have selected is rlRecomendation. You are free to choose any name of your choice.
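Before moving on, it helps to verify that the MongoDB setup is working. The short snippet below is a quick sanity check, not part of the application code; it assumes MongoDB is running locally on the default port and that the collections have already been created by the process we build in this post.
from pymongo import MongoClient

client = MongoClient(port=27017)
db = client.rlRecomendation
# List the collections created by the self-learning process
print(db.list_collection_names())
# Fetch one stored state document, if any exist yet
print(db.rlQuantdic.find_one())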
Let us now explore the rlLearn class. The constructor of the class, which starts from line 15, takes the custDetails data frame and the configuration file as inputs. You would already be familiar with lines 17-23 from our prototyping phase, where we extract information to create states and then consolidate the data frame to get the quantities for each state. In lines 30-33, we create dictionaries to store the relevant information: the count of products, the value distribution, the reward distribution and the number of times each product is recommended.
The main method within the rlLearn class is the prodConsolidator() method in lines 45-95. We have seen the details of this method in the prototyping phase. Just to recap, in this method we iterate through each of the components of our states and then store the quantities of each product under that state in the dictionaries. However, there is a subtle difference from what we did during the prototyping phase: here we insert each state and its associated products into the MongoDB database we created, as shown in lines 70, 93 and 94. We create a temporary dictionary in line 57 to dump each state into MongoDB. We also store the data in the dictionaries, as we did during the prototyping phase, so that the other methods in this class have access to it. The final outcome of this method is the creation of the count dictionary, value dictionary and reward dictionary from our data, and the update of this data in MongoDB.
This takes us to the end of the rlLearn class.
We now go back to the driver file rlRecoMain.py and explore the next important class, rlRecomend.
The rlRecomend class has the methods required for recommending products. This class has many methods, so we will go through them one by one. We have seen all these methods during the prototyping phase and will therefore not go into detailed explanations here; for those, you can refer to the previous post.
Now, in selfLearnProcess.py, start adding the code pertaining to the rlRecomend class.
class rlRecomend:
    def __init__(self, custDetails, conf):
        # Get the date as a separate column
        custDetails['Date'] = custDetails['Parse_date'].apply(lambda x: x.strftime("%d"))
        # Converting date to float for easy comparison
        custDetails['Date'] = custDetails['Date'].astype('float64')
        # Get the period of month column
        custDetails['monthPeriod'] = custDetails['Date'].apply(lambda x: int(x > conf['monthPer']))
        # Aggregate the custDetails to get a distribution of rewards
        rewardFull = custDetails.groupby(['Segment', 'Month', 'monthPeriod', 'Day', conf['product_id']])[
            conf['prod_qnty']].agg('sum').reset_index()
        # Get these data frames for all methods
        self.custDetails = custDetails
        self.conf = conf
        self.rewardFull = rewardFull
The above code is the constructor of the class (lines 97-112), which is similar to the constructor of the rlLearn class. Here we consolidate the custDetails data frame and get the count of each product for the respective state.
Let us now look at the next two methods. Add the following code to the class we created earlier.
    # Method to find unique values of each of the variables
    def uniqeVars(self):
        # Finding unique value for each of the variables
        segments = list(self.rewardFull.Segment.unique())
        months = list(self.rewardFull.Month.unique())
        monthPeriod = list(self.rewardFull.monthPeriod.unique())
        days = list(self.rewardFull.Day.unique())
        return segments, months, monthPeriod, days

    # Method to sample a state
    def stateSample(self):
        # Get the unique state elements
        segments, months, monthPeriod, days = self.uniqeVars()
        # Get the context of the customer. For the time being let us randomly select all the states
        seg = sample(segments, 1)[0]  # Sample the segment
        mon = sample(months, 1)[0]  # Sample the month
        monthPer = sample([0, 1], 1)[0]  # Sample the month period
        day = sample(days, 1)[0]  # Sample the day
        # Get the state id by combining all these samples
        stateId = str(seg) + '_' + mon + '_' + str(monthPer) + '_' + day
        self.seg = seg
        return stateId
The first method, lines 115-121, gets the unique values of segments, months, month periods and days. This information will be used in some of the methods we will see later on. The second method, detailed in lines 124-135, samples a state ID through random sampling of the components of a state.
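A quick, hedged usage sketch of these two methods is shown below. It assumes custDetails and conf have been prepared as in the earlier posts of this series; the printed state ID is only indicative.
# Instantiate the recommendation class and sample a state
recoObj = rlRecomend(custDetails, conf)
stateId = recoObj.stateSample()
print(stateId)  # e.g. 'Q2_November_0_Tuesday' (illustrative)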
The next methods we will explore initialise dictionaries if a state ID has not been seen earlier. The first method initialises the dictionaries and the second inserts a record in the recommendation collection in MongoDB if the state doesn't exist. Let us see the code for these methods.
    # Method to initialize a dictionary in case a state Id is not available
    def collfinder(self, stateId, countDic, polDic, rewDic, recoCountdic):
        # Defining some dictionaries for storing the values
        self.countDic = countDic  # Dictionary to store the count of products
        self.polDic = polDic  # Dictionary to store the value distribution
        self.rewDic = rewDic  # Dictionary to store the reward distribution
        self.recoCountdic = recoCountdic  # Dictionary to store the recommendation counts
        self.stateId = stateId
        print("[INFO] The current state is :", stateId)
        if self.countDic is None:
            print("[INFO] State ID does not exist")
            self.countDic = {}
            self.countDic[stateId] = {}
            self.polDic = {}
            self.polDic[stateId] = {}
            self.rewDic = {}
            self.rewDic[stateId] = {}
            if self.recoCountdic is None:
                self.recoCountdic = {}
                self.recoCountdic[stateId] = {}
            else:
                self.recoCountdic[stateId] = {}

    # Method to update the recommendation dictionary
    def recoCollChecker(self):
        print("[INFO] Inside the recommendation collection")
        recoCol = db.rlRecotrack.find_one({self.stateId: {'$exists': True}})
        if recoCol is None:
            print("[INFO] Inserting the record in the recommendation collection")
            db.rlRecotrack.insert_one({self.stateId: {}})
        return recoCol
The inputs to the first method, as in line 138, are the state Id and the four dictionaries we extract from MongoDB, which we will see later on in the main script rlRecoMain.py. If no record exists for a specific state Id, the dictionaries we extract from MongoDB would be null, and therefore we need to initialise them for storing the products, their values, rewards and the count of recommendations. The initialisation of these dictionaries is implemented in this method in lines 146-158.
The second initialisation method checks the recommendation count collection for a specific state Id. We first look up the state Id in the collection in line 163. If the record doesn't exist, we insert a blank dictionary for that state in line 166.
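To see how these methods would be used, the sketch below shows one way the four dictionaries could be pulled from MongoDB for a state and handed to collfinder(). This is only an indicative sketch; the actual extraction happens in the driver file rlRecoMain.py, which we cover in the next post.
# Fetch the stored dictionaries for the sampled state; find_one() returns None for unseen states
countDic = db.rlQuantdic.find_one({stateId: {'$exists': True}})
polDic = db.rlValuedic.find_one({stateId: {'$exists': True}})
rewDic = db.rlRewarddic.find_one({stateId: {'$exists': True}})
recoCountdic = db.rlRecotrack.find_one({stateId: {'$exists': True}})
# Initialise the dictionaries inside the recommendation object
recoObj.collfinder(stateId, countDic, polDic, rewDic, recoCountdic)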
Let us now look at the next two methods in the class.
    # Create a function to get a list of products for a certain segment
    def segProduct(self, seg, nproducts):
        # Get the list of unique products for each segment
        seg_products = list(self.rewardFull[self.rewardFull['Segment'] == seg]['StockCode'].unique())
        seg_products = sample(seg_products, nproducts)
        return seg_products

    # This is the function to get the top n products based on value
    def sortlist(self, nproducts, seg):
        # Get the top products based on the values and sort them from product with largest value to least
        topProducts = sorted(self.polDic[self.stateId].keys(), key=lambda kv: self.polDic[self.stateId][kv])[-nproducts:][::-1]
        # If the topProducts is less than the required number of products nproducts, sample the delta
        while len(topProducts) < nproducts:
            print("[INFO] top products less than required number of products")
            segProducts = self.segProduct(seg, (nproducts - len(topProducts)))
            newList = topProducts + segProducts
            # Finding unique products
            topProducts = list(OrderedDict.fromkeys(newList))
        return topProducts
The method in lines 171-175 samples a list of products for a segment. This method is used in case the number of products in a particular state is less than the total number of products we want to recommend. In such cases, we randomly sample some products from the list of all products bought by customers in that segment and add them to the list of products we want to recommend. We will see this in action in the sortlist method (lines 178-188).
The sortlist method sorts the products based on their value and returns the list of top products. The inputs to this method are the number of products we want recommended and the segment (line 178). We then get the top ‘n’ products by sorting the value dictionary, as in line 180. If the number of products is less than the required number, additional products are sampled using the segProduct method we saw earlier. The final list of top products is then returned by this method.
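The sorting expression in line 180 is worth a closer look. The small worked example below uses an illustrative value dictionary to show how the top ‘n’ products are picked.
# Illustrative value dictionary for one state
polDicState = {'prodA': 4.2, 'prodB': 9.7, 'prodC': 1.3, 'prodD': 6.5}
nproducts = 3
# Sort ascending by value, keep the last n products, then reverse to get highest value first
topProducts = sorted(polDicState.keys(), key=lambda kv: polDicState[kv])[-nproducts:][::-1]
print(topProducts)  # ['prodB', 'prodD', 'prodA']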
The next method we are going to explore is the one which controls the exploration and exploitation process, thereby generating the list of products to be recommended. Let us add the following code to the class.
    # This is the function to create the number of products based on exploration and exploitation
    def sampProduct(self, seg, nproducts, epsilon):
        # Initialise an empty list for storing the recommended products
        seg_products = []
        # Get the list of unique products for each segment
        Segment_products = list(self.rewardFull[self.rewardFull['Segment'] == seg]['StockCode'].unique())
        # Get the list of top n products based on value
        topProducts = self.sortlist(nproducts, seg)
        # Start a loop to get the required number of products
        while len(seg_products) < nproducts:
            # First find a probability
            probability = np.random.rand()
            if probability >= epsilon:
                # print(topProducts)
                # The top product would be the first product in the list
                prod = topProducts[0]
                # Append the selected product to the list
                seg_products.append(prod)
                # Remove the top product once appended
                topProducts.pop(0)
                # Ensure that seg_products is unique
                seg_products = list(OrderedDict.fromkeys(seg_products))
            else:
                # If the probability is less than the epsilon value, randomly sample one product
                prod = sample(Segment_products, 1)[0]
                seg_products.append(prod)
                # Ensure that seg_products is unique
                seg_products = list(OrderedDict.fromkeys(seg_products))
        return seg_products
The inputs to the method are the segment, the number of products to be recommended and the epsilon value which determines the balance between exploration and exploitation, as shown in line 191. In line 195, we get the list of products for the segment; this is the list from which products are sampled during the exploration phase. We also get the list of top products to be recommended in line 197, using the sortlist method we defined earlier. In lines 199-218 we implement the exploitation and exploration processes we discussed during the prototyping phase, and finally we return the list of top products for recommendation.
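The epsilon-greedy choice made on each pass of the loop can be boiled down to the few lines below. With epsilon = 0.1 (an assumed value), roughly 90% of the picks exploit the current top product and about 10% explore a random product of the segment.
import numpy as np
from random import sample

epsilon = 0.1                                      # assumed configuration value
topProducts = ['prodB', 'prodD', 'prodA']          # illustrative top products
Segment_products = ['prodA', 'prodB', 'prodC', 'prodD', 'prodE']  # illustrative segment products
if np.random.rand() >= epsilon:
    prod = topProducts[0]                  # exploitation : pick the highest-value product
else:
    prod = sample(Segment_products, 1)[0]  # exploration : pick a random product from the segment
print(prod)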
The next method which we will explore is the one to update dictionaries after the recommendation process.
    # This is the method for updating the dictionaries after recommendation
    def dicUpdater(self, prodList, stateId):
        for prod in prodList:
            # Check if the product is in the dictionary
            if prod in list(self.countDic[stateId].keys()):
                # Update the count by 1
                self.countDic[stateId][prod] += 1
            else:
                self.countDic[stateId][prod] = 1
            if prod in list(self.recoCountdic[stateId].keys()):
                # Update the recommended products with 1
                self.recoCountdic[stateId][prod] += 1
            else:
                # Initialise the recommended products as 1
                self.recoCountdic[stateId][prod] = 1
            if prod not in list(self.polDic[stateId].keys()):
                # Initialise the value as 0
                self.polDic[stateId][prod] = 0
            if prod not in list(self.rewDic[stateId].keys()):
                # Initialise the reward dictionary as 0
                self.rewDic[stateId][prod] = GaussianDistribution(loc=0, scale=1, size=1)[0].round(2)
        print("[INFO] Completed the initial dictionary updates")
The inputs to this method, as in line 221, are the list of products to be recommended and the state Id. In lines 222-234, we iterate through each of the recommended products and increment its count in the dictionary if the product already exists there, or initialise the count to 1 if it doesn't. Later on, in lines 235-240, we initialise the value dictionary and the reward dictionary if the products are not available in them.
The next method we will see is the one for initialising the dictionaries in case the context doesn't exist.
    def dicAdder(self, prodList, stateId):
        # Loop through the product list
        for prod in prodList:
            # Initialise the count as 1
            self.countDic[stateId][prod] = 1
            # Initialise the value as 0
            self.polDic[stateId][prod] = 0
            # Initialise the recommended products as 1
            self.recoCountdic[stateId][prod] = 1
            # Initialise the reward dictionary as 0
            self.rewDic[stateId][prod] = GaussianDistribution(loc=0, scale=1, size=1)[0].round(2)
        print("[INFO] Completed the dictionary initialization")
        # Next update the collections with the respective updates
        # Updating the quantity collection
        db.rlQuantdic.insert_one({stateId: self.countDic[stateId]})
        # Updating the recommendation tracking collection
        db.rlRecotrack.insert_one({stateId: self.recoCountdic[stateId]})
        # Updating the value function collection for the products
        db.rlValuedic.insert_one({stateId: self.polDic[stateId]})
        # Updating the rewards collection
        db.rlRewarddic.insert_one({stateId: self.rewDic[stateId]})
        print('[INFO] Completed updating all the collections')
If the state Id doesn't exist, the dictionaries are initialised in the first part of this method. Once the dictionaries are initialised, the MongoDB collections are updated in lines 259-265.
The next method we are going to explore is one of the main methods: it integrates all the methods we have seen so far and implements the recommendation process. Let us explore it.
    # Method to take the sampled stateID and then run the recommendation process
    def rlRecommender(self):
        # First get the stateID
        stateId = self.stateId
        # Start the recommendation process
        if len(self.polDic[stateId]) > 0:
            print("The context exists")
            # Implement the sampling of products based on exploration and exploitation
            seg_products = self.sampProduct(self.seg, self.conf["nProducts"], self.conf["epsilon"])
            # Check if the recommendation count collection exists
            recoCol = self.recoCollChecker()
            print('Recommendation collection existing :', recoCol)
            # Update the dictionaries of values and rewards
            self.dicUpdater(seg_products, stateId)
        else:
            print("The context doesn't exist")
            # Get the list of relevant products
            seg_products = self.segProduct(self.seg, self.conf["nProducts"])
            # Add products to the value dictionary and rewards dictionary
            self.dicAdder(seg_products, stateId)
        print("[INFO] Completed the recommendation process")
        return seg_products
The first step in the process is to get the state Id (line 271), based on which we make all the recommendations. Once we have the state Id, we check whether it is an existing state Id in line 273. If it is, we get the list of ‘n’ products for recommendation using the sampProduct method we saw earlier, where we implement exploration and exploitation. Once we get the products, we initialise the recommendation collection in line 278. Finally, we update all the dictionaries using the dicUpdater method in line 281.
In lines 282-287, we implement a similar process for when the state Id doesn't exist. The only difference in this case is the initialisation of the dictionaries in line 287, where we use the dicAdder method.
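Putting it together, a hedged usage sketch of the recommendation step would look like the lines below, assuming the object has been initialised with collfinder() as shown earlier and that conf contains the nProducts and epsilon keys used in this series.
# Generate the list of recommended products for the current state
seg_products = recoObj.rlRecommender()
print(seg_products)  # e.g. ['85123A', '47566', '84879', ...] (illustrative)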
Once we complete the recommendation process, we get into simulating the customer action.
    # Function to initiate customer action
    def custAction(self, segproducts):
        print('[INFO] getting the customer action')
        # Sample a value to get how many products will be clicked
        click_number = np.random.choice(np.arange(0, 10),
                                        p=[0.50, 0.35, 0.10, 0.025, 0.015, 0.0055, 0.002, 0.00125, 0.00124, 0.00001])
        # Sample products which will be clicked based on click number
        click_list = sample(segproducts, click_number)
        # Sample for buy values
        buy_number = np.random.choice(np.arange(0, 10),
                                      p=[0.70, 0.15, 0.10, 0.025, 0.015, 0.0055, 0.002, 0.00125, 0.00124, 0.00001])
        # Sample products which will be bought based on buy number
        buy_list = sample(segproducts, buy_number)
        return click_list, buy_list
Lines 296-305 implement the process of simulating which of the recommended products are clicked and which are bought by the customer. The method returns the list of products which were clicked and the list which were bought. For detailed explanations of this method, please refer to the previous post.
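The probability vectors used above heavily favour small numbers of clicks and purchases. The quick check below, which is only a diagnostic and not part of the application, samples the click-number distribution many times to show that most simulated customers click on zero or one product.
import numpy as np
from collections import Counter

clicks = np.random.choice(np.arange(0, 10), size=1000,
                          p=[0.50, 0.35, 0.10, 0.025, 0.015, 0.0055, 0.002, 0.00125, 0.00124, 0.00001])
print(Counter(clicks))  # roughly half the samples are 0, about a third are 1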
The next methods we will explore are the ones related to updating the values of the recommendation system.
    def getReward(self, loc):
        rew = GaussianDistribution(loc=loc, scale=1, size=1)[0].round(2)
        return rew

    def saPolicy(self, rew, prod):
        # This function gets the relevant algorithm for the policy update
        # Get the current value of the state
        vcur = self.polDic[self.stateId][prod]
        # Get the counts of the current product
        n = self.recoCountdic[self.stateId][prod]
        # Calculate the new value
        Incvcur = (1 / n) * (rew - vcur)
        return Incvcur
The getReward method on line 309 generates a reward from a Gaussian distribution centred around the given reward value. We will see the use of this method in the subsequent methods.
The saPolicy method in lines 313-321 computes the incremental update to the value of a product in the state, based on the simple averaging method in line 320. We have already seen these methods in our prototyping phase in the previous post.
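A small worked example of the simple averaging update helps to see what saPolicy() returns. The numbers below are illustrative.
vcur = 5.0   # current value estimate of the product in this state
n = 4        # number of times the product has been recommended so far
rew = 8.0    # reward just observed
Incvcur = (1 / n) * (rew - vcur)
print(vcur + Incvcur)  # 5.75 : the estimate moves a fraction 1/n towards the new reward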
Next we will see the method which uses both the above methods.
    def valueUpdater(self, seg_products, loc, custList, remove=True):
        for prod in custList:
            # Get the reward for the bought product. The reward will be centered around the defined reward for each action
            rew = self.getReward(loc)
            # Update the reward in the reward dictionary
            self.rewDic[self.stateId][prod] += rew
            # Update the policy based on the reward
            Incvcur = self.saPolicy(rew, prod)
            self.polDic[self.stateId][prod] += Incvcur
            # Remove the bought product from the product list
            if remove:
                seg_products.remove(prod)
        return seg_products
The inputs to this method are the recommended list of products, the mean reward (for a click, buy or ignore), the corresponding customer action list (click list or buy list) and a flag to indicate whether the product has to be removed from the recommendation list or not.
We iterate through all the products in the customer action list in line 324 and get the reward in line 326. Once the reward is added to the reward dictionary in line 328, we get the incremental value in line 330 and update the value dictionary with it in line 331. If the flag is True, we remove the product from the recommended list, and finally we return the remaining recommendation list.
The next method is the last of the methods and ties the above three methods to the customer action.
    # Function to update the reward dictionary and the value dictionary based on customer action
    def rewardUpdater(self, seg_products, custBuy=[], custClick=[]):
        # Check if there are any customer purchases
        if len(custBuy) > 0:
            seg_products = self.valueUpdater(seg_products, self.conf['buyReward'], custBuy)
        # Repeat the same process for customer clicks
        if len(custClick) > 0:
            seg_products = self.valueUpdater(seg_products, self.conf['clickReward'], custClick)
        # For those products not clicked or bought, give a penalty
        if len(seg_products) > 0:
            custList = seg_products.copy()
            seg_products = self.valueUpdater(seg_products, -2, custList, False)
        # Next update the collections with the respective updates
        print('[INFO] Updating all the collections')
        # Updating the quantity collection
        db.rlQuantdic.replace_one({self.stateId: {'$exists': True}}, {self.stateId: self.countDic[self.stateId]})
        # Updating the recommendation tracking collection
        db.rlRecotrack.replace_one({self.stateId: {'$exists': True}}, {self.stateId: self.recoCountdic[self.stateId]})
        # Updating the value function collection for the products
        db.rlValuedic.replace_one({self.stateId: {'$exists': True}}, {self.stateId: self.polDic[self.stateId]})
        # Updating the rewards collection
        db.rlRewarddic.replace_one({self.stateId: {'$exists': True}}, {self.stateId: self.rewDic[self.stateId]})
        print('[INFO] Completed updating all the collections')
In lines 340-348, we update the values based on the products bought, clicked and ignored. Once the dictionaries are updated, the respective MongoDB collections are updated in lines 352-358.
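As a hedged end-to-end sketch, one simulated interaction would chain the last few methods as below, assuming recoObj has been initialised as in the earlier snippets. The driver file in the next post implements this loop in full.
# One simulated recommendation cycle
seg_products = recoObj.rlRecommender()                     # recommend products for the state
click_list, buy_list = recoObj.custAction(seg_products)    # simulate the customer response
recoObj.rewardUpdater(seg_products, custBuy=buy_list, custClick=click_list)  # update values, rewards and MongoDB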
With this we have covered all the methods which are required for implementing the self learning recommendation system. Let us summarise our learning so far in this post.
- Created the states and updated MongoDB with the states data. We used the historic data for initialisation of values.
- Implemented the recommendation process by getting a list of products to be recommended to the customer
- Explored customer response simulation, wherein the customer response to the recommended products was simulated.
- Updated the value functions and reward functions after customer response
- Updated Mongo DB collections after the completion of the process for a customer.
What next ?
We are coming to the tail end of our series. In the next post we tie all these methods together in the main driver file and see how these processes are implemented. We will also run the script on the terminal and observe the results. Once the application implementation is done, we will explore avenues to deploy the application. Watch this space for the last post of the series.
Please subscribe to this blog post to get notifications when the next post is published.
You can also subscribe to our YouTube channel for all the videos related to this series.
The complete code base for the series is in the Bayesian Quest GitHub repository.
Do you want to Climb the Machine Learning Knowledge Pyramid ?
Knowledge acquisition is such a liberating experience. The more you invest in your knowledge enhancement, the more empowered you become. The best way to acquire knowledge is through practical application, or learning by doing. If you are inspired by the prospect of being empowered by practical knowledge in machine learning, subscribe to our YouTube channel.
I would also recommend two books I have co-authored. The first one specialises in deep learning, with practical hands-on exercises and interactive video and audio aids for learning.
This book is accessible using the following links
The Deep Learning Workshop on Amazon
The Deep Learning Workshop on Packt
The second book equips you with practical machine learning skill sets. The pedagogy is through practical interactive exercises and activities.

This book can be accessed using the following links
The Data Science Workshop on Amazon
The Data Science Workshop on Packt
Enjoy your learning experience and be empowered !!!!