Machine Learning: Teaching a machine to learn

In my previous post on recommendation engines, I fleetingly mentioned machine learning. Talking about machine learning, what comes to my mind is a recent conversation I had with my uncle. He asked me what I was working on, and I started talking about machine learning and data science. He listened very attentively and later told my mother that he had absolutely no clue what I was talking about. So I thought it would be a good idea to try and unravel the science behind machine learning.

Let me start with an analogy. When we were all toddlers, whenever we saw something new, say a dog, our parents would point and tell us, "Look, a dog". This is how we start to learn about the things around us: from inputs such as these that we receive from our parents and elders. Machine learning works in a pretty similar way. In this context, the toddler is the machine, and the elder who teaches the machine is a bunch of data.

In very simple terms, the setup for machine learning works like this. The machine is fed a set of data. This data consists of two parts: one part is called the features and the other the labels. Let me elaborate a little more. Suppose we are training the machine to identify the image of a dog. As a first step, we feed multiple images of dogs to the machine. Each image we feed, say a jpeg or png, consists of millions of pixels. Each pixel in turn is composed of some value of the three primary colors Red, Green and Blue. The value of each primary color ranges between 0 and 255. This is called the pixel intensity. For example, the pixel intensity for the color orange would be (255, 102, 0), where 255 is the intensity of its red component, 102 its green component and 0 its blue component. Likewise, every pixel in an image will have some combination of these primary colors.
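To make this concrete, here is a minimal sketch in Python, using a made-up 2x2 "image" (real photos have millions of pixels, but the idea is the same): each pixel is an (red, green, blue) triple, and flattening the grid gives the raw numbers a learning algorithm would consume.

```python
# A tiny "image" as a 2x2 grid of pixels; each pixel is an
# (red, green, blue) tuple with intensities from 0 to 255.
ORANGE = (255, 102, 0)   # the orange example from the text
WHITE = (255, 255, 255)

image = [
    [ORANGE, WHITE],
    [WHITE, ORANGE],
]

# Flatten the grid into a single feature vector of raw intensities.
features = [channel for row in image for pixel in row for channel in pixel]

print(features)  # 2x2 pixels x 3 channels = 12 numbers
```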


These pixel intensities are the features of the image, which are provided as inputs to the machine. Against each of these features, we also provide a class or category describing them. This is the label. This data set is our basic input. To visualize the data set, think of it as a huge table of pixel values and their labels. If we have, say, 10 pixels per image and 10 images, our table will have 10 rows, one for each image, and each row will have 11 columns. The first 10 columns correspond to the pixel values and the 11th column is the label.
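That 10-by-11 table can be built in a few lines of Python. The pixel values here are random stand-ins, purely for illustration:

```python
import random

random.seed(0)  # reproducible toy data

NUM_IMAGES = 10
PIXELS_PER_IMAGE = 10

# Each row: 10 pixel intensities (the features) followed by a label.
table = []
for _ in range(NUM_IMAGES):
    pixels = [random.randint(0, 255) for _ in range(PIXELS_PER_IMAGE)]
    table.append(pixels + ["dog"])

print(len(table), "rows x", len(table[0]), "columns")
```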

Now that we have given the machine its data, let us look at how it learns. For this, let me take you back to your school days. In basic geometry, you would have learnt the equation of a line as Y = C + (theta * X). In this equation, the variable C is called the intercept and theta the slope of the line. These two variables govern the properties of the line Y. Their relevance is that, given any new value of X, our knowledge of C and theta lets us predict the corresponding value of Y. So by learning two parameters, we are in effect able to predict an outcome. This is the essence of machine learning. In a machine learning setup, the machine is made to learn the parameters from the features it is provided. Equipped with the knowledge of these parameters, the machine can predict the most probable values of Y (outcomes) when new values of X (features) are provided.
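The parameter-learning step above can be sketched in plain Python. With a handful of made-up (X, Y) points, the closed-form least-squares formulas recover C and theta, which we can then use to predict the outcome for a new X:

```python
# Fit Y = C + theta * X to a few (X, Y) points using the closed-form
# least-squares solution, then predict Y for a new X.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # these happen to lie exactly on Y = 1 + 2*X

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# Slope: covariance of X and Y divided by variance of X.
theta = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
# Intercept: the line must pass through the point of means.
C = mean_y - theta * mean_x

print(C, theta)          # learned parameters: 1.0 and 2.0
print(C + theta * 5.0)   # predict for a new X = 5 -> 11.0
```

Real machine learning libraries fit many parameters over many features, but the principle is exactly this: learn parameters from training data, then reuse them for prediction.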

In our dog identification example, the X values are the pixel intensities of the images we provided, and Y denotes the labels. The parameters are learned from the provided data. If we then give the machine new values of X containing, say, features of both dogs and cats, the machine, with its knowledge of the parameters, will correctly identify which is a dog and which is a cat. The first set of data we provide to the machine for it to learn parameters is called the training set, and the new data we provide for prediction is called the test set. The above-mentioned genre of machine learning is called Supervised Learning. Needless to say, the earlier equation of the line is just one among many types of algorithms used in machine learning; that particular one is called linear regression. There are many algorithms like it which enable machines to learn parameters and carry out predictions.
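As a rough sketch of this supervised workflow, here is a toy dog-versus-cat classifier in plain Python. The "parameters" it learns are simply the mean feature vector of each class in the training set, and the feature values are invented for illustration (real image features are far richer than this):

```python
# Training set: (features, label) pairs. In this made-up data,
# "dog" images have high pixel intensities and "cat" images low ones.
train = [
    ([200, 210, 190], "dog"),
    ([220, 205, 215], "dog"),
    ([30, 40, 25], "cat"),
    ([45, 35, 50], "cat"),
]

def centroid(rows):
    # Mean feature vector of a list of feature vectors.
    return [sum(col) / len(rows) for col in zip(*rows)]

# "Learning": one centroid per class becomes our parameters.
params = {label: centroid([f for f, l in train if l == label])
          for label in {"dog", "cat"}}

def predict(features):
    # Classify a new feature vector by its closest learned centroid.
    return min(params, key=lambda lbl: sum(
        (a - b) ** 2 for a, b in zip(features, params[lbl])))

# Test set: new, unseen feature vectors.
print(predict([210, 200, 205]))  # -> dog
print(predict([40, 30, 45]))     # -> cat
```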

What I have described here is a very simple version of machine learning. Advances are being made in this field, and scientists are trying to mimic the learning mechanism of the human brain on machines. An important and growing field aligned to this idea is called Deep Learning. I will delve into deep learning in a future post.

The power of machine learning is quite prevalent in the world around us, and quite often the learning is inconspicuous. As a matter of fact, we are all inconspicuous parties to the training process. A very popular example is the photo tagging process on Facebook. When we tag pictures we post on Facebook, we are in fact providing labels that enable a machine to learn. Facebook's powerful machines will extract features from the photos we tag. The next time we post a new photo, Facebook will automatically predict the correct tag through the parameters it has learned. So the next time you tag a picture on Facebook, realize that you are also playing your part in teaching a machine to learn.


The Recommendation Engine

I was recently browsing through Amazon, and guess what? All that was displayed to me was a bunch of books, books which I would probably never buy at all. I wasn't quite surprised about the choices Amazon laid out for me. One reason is that I am a very dormant online buyer, so the choices Amazon laid out are a reflection of the fact that it doesn't know me well at all. But wait a minute, did I just say that Amazon doesn't know me? How can a website know me? Knowing, understanding, taking care: these are traits we associate with living entities, not with static webpages. If you are thinking the same way, then you are in for a huge surprise. Static webpages are part of the old dispensation; the new mantra is making everything, from webpages to billboards and every facet that touches customers, teem with life. All this is made possible through advances in the field of machine learning. Yes, machines are equipped with sufficient intelligence to learn from their interactions with customers, so that they too start taking care of you and me. This is the new dispensation. In this post, I would like to unravel one such application of machine learning, which lies at the heart of online stores like Amazon, eBay etc.: the Recommendation Engine.

As an avid online buyer, you would have noticed that before logging in to any of these online stores, if you just browse these sites, you will be shown a bunch of items scrolling before you. These could be items totally unrelated to your tastes. However, Amazon or any online store decided to recommend them to you because they are its top-selling or trending items. Bereft of any intelligence about you as a buyer, this is the best the website can lay out for you. This kind of recommendation is called a non-personalized recommendation. Such recommendations are based on the top items being bought or searched for on the site.

Once you log in, though, it is a totally different world. Depending on your level of activity on the site, you will notice that many of the products recommended to you are more aligned to your tastes. The greater your level of activity, the more aligned to your tastes the recommended products become. This is what I referred to in the beginning about the site understanding you. The more it understands you, the better it can take care of you. Interesting, isn't it? This type of recommendation falls under the genre called personalized recommendations.

Personalized recommendations predominantly work on an algorithm called collaborative filtering. A very simple analogy for collaborative filtering is a huge table, where the rows are users like you and me, and the columns are the items which you or I have bought or shown interest in. So this is one huge table with millions and millions of items and as many customers. Each time you buy or even browse something, some value is updated against your name in the corresponding item column. One interesting point to note, however, is that you as an individual customer would at most have bought a few hundred types of items. This is quite minuscule compared to the millions of items which adorn the columns of the huge table, and the same is true for most other users: the number of items any one user has shown interest in is tiny in comparison to the total number of items in the table. This kind of representation is called a sparse representation.
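A natural way to sketch such a sparse table in Python is a dictionary of dictionaries: we store only the cells that were actually touched. The users, items and scores below are entirely made up:

```python
# Sparse user-item table: only non-empty cells are stored, since each
# user interacts with a tiny fraction of the catalogue.
ratings = {
    "alice": {"polo_shirt": 5, "ice_bucket": 3},
    "bob": {"polo_shirt": 4, "running_shoes": 5},
    "carol": {"ice_bucket": 4},
}

# Updating a cell when a user buys or browses an item:
ratings.setdefault("carol", {})["polo_shirt"] = 2

all_items = {item for prefs in ratings.values() for item in prefs}
filled = sum(len(prefs) for prefs in ratings.values())
print(filled, "of", len(ratings) * len(all_items), "cells filled")
```

With millions of users and items, the fraction of filled cells is vastly smaller than in this toy, which is why real systems use dedicated sparse-matrix storage.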

So naturally you might wonder: if you as a customer buy or show interest in only a small percentage of items, how does Amazon recommend new things to you? That's where the intricacies of the collaborative filtering algorithm kick in. As I said earlier, the table is a large table with millions of users. Considering those millions of users and the varied tastes each one has, there will be transactions against virtually every item in the table. The essence of collaborative filtering is to find similarities in this huge table: similarities between users who have bought similar kinds of items, similarities between items which are usually bought together, and so on. It is these similarities, extracted from that huge table, which form the basis of the recommendations. The idea is like this: if you and I both like casual dressing, we would both be inclined to browse for such brands. Based on our transactions, the algorithm will group the two of us as people having similar tastes. Now, the next time you go ahead and buy a new Polo shirt, the algorithm will assume that I might also like such a shirt and will recommend the same kind of shirt to me too. This is how collaborative filtering works. In addition to the similarities between users, the algorithm also finds similarities between items, to further enhance its ability to recommend products.
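Here is a minimal sketch of that user-to-user idea in Python, using cosine similarity (one common similarity measure among several) on made-up ratings. The user most similar to "me" turns out to be "you", so the items "you" liked but "me" hasn't seen become the recommendations:

```python
import math

# Toy user-item ratings (all names made up); higher = stronger interest.
ratings = {
    "you": {"polo_shirt": 5, "chinos": 4, "loafers": 4},
    "me": {"polo_shirt": 4, "chinos": 5},
    "other": {"hiking_boots": 5, "tent": 4},
}

def cosine(a, b):
    # Similarity of two sparse rating vectors: dot product over the
    # items they share, divided by the product of their lengths.
    common = set(a) & set(b)
    dot = sum(a[i] * b[i] for i in common)
    norm = math.sqrt(sum(v * v for v in a.values())) \
         * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm

# Find the user most similar to "me"...
neighbors = {u: cosine(ratings["me"], ratings[u])
             for u in ratings if u != "me"}
nearest = max(neighbors, key=neighbors.get)

# ...and recommend the items that user liked but "me" hasn't seen.
recommendations = set(ratings[nearest]) - set(ratings["me"])
print(nearest, recommendations)  # you {'loafers'}
```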

In addition to the above, there is another type of recommendation. Say you want to buy an ice bucket and you start browsing various models. Once you zero in on the model you like and decide to add it to the cart, you might get a recommendation for an ice scoop saying "Items usually bought together". This is an example of similarity between items and is called Market Basket Analysis. The idea behind this algorithm is similar to the one mentioned above: the huge table is again analysed, transactions where two or more items are bought together are identified, and when one of those items is being bought, the others are recommended alongside it.
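The core of that analysis is counting how often pairs of items land in the same basket. A minimal sketch, with a few invented transactions:

```python
from collections import Counter
from itertools import combinations

# A few made-up purchase baskets, one set of items per transaction.
baskets = [
    {"ice_bucket", "ice_scoop", "napkins"},
    {"ice_bucket", "ice_scoop"},
    {"polo_shirt", "chinos"},
    {"ice_bucket", "napkins"},
]

# Count how often each pair of items appears in the same basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# The frequently co-bought pairs drive the
# "items usually bought together" recommendation.
top_pair, count = pair_counts.most_common(1)[0]
print(top_pair, count)
```

Real market basket analysis (e.g. the Apriori family of algorithms) extends this to larger item sets and weighs counts by measures such as support and confidence, but the underlying co-occurrence idea is the same.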

The basis of all these data products is the transactions you make in the virtual world. The websites you browse, the things you rate, the items you buy, the things you comment on: all of these generate data which is channelized to make you buy more. And all of this happens without you realizing what's going on. So next time you browse the net and suddenly find an ad for a new Polo shirt, do not be surprised. "Somebody is watching…"

Watching you.