Acknowledgements: Kartik Gaglani and team member Amar Patel created this blog as part of the coursework for the "Master in Data Science Programme" at Suven Consultants, under the mentorship of Rocky Jagtiani.
Market Basket Analysis
Ever wondered why items are displayed in a particular way in retail/online stores? Why certain items are suggested to you based on what you have added to the cart? Blame it on market basket analysis, or association rule mining.
Market basket analysis uses association rule mining under the hood to identify products frequently bought together. Before we get into the nitty-gritty of market basket analysis, let us get a basic understanding of association rule mining. It finds associations between different objects in a set. In the case of market basket analysis, the objects are the products purchased by a customer and the set is the transaction. In short, market basket analysis
- is an unsupervised data mining technique
- that uncovers products frequently bought together
- and creates if-then scenario rules
Its results are commonly used for:
- designing store layouts
- online recommendation engines
- targeted marketing campaigns/sales promotions/email campaigns
Before stepping into my analysis, let us understand the association rule mining concepts behind it.
Why Association Rule Mining?
Association rule mining is used when we want to find associations between various objects in a set, i.e., frequent patterns in a transaction database, relational database, or any other information repository. It tells us which items customers frequently buy together by generating a set of rules called association rules.
Why Market Basket?
- To change the store layout based on association rules
- To redesign the product catalog
- To cross-market on online stores
- To identify the trending items customers buy
- To customize emails with add-on sales offers
Also, to get a clear understanding of the applications of MBA, make sure you check out the example of Walmart's beer-diaper parable.
Importing the required packages:
*[Image placeholder: the original post showed screenshots of the import statements and a preview of the dataset.]*
Visualization of the top 20 “Hot” items:
Contribution of Top 20 “Hot” items to Total Sales:

Pruning the dataset for Frequently Bought Items:

Association Rule Mining with FP Growth:
Based on Minimum Support:
The support of an itemset X, supp(X), is the proportion of transactions in the database in which the itemset X appears. It signifies the popularity of an itemset.
supp(X) = (number of transactions in which X appears) / (total number of transactions)
If the sales of a particular product (item) above a certain proportion have a meaningful effect on profits, that proportion can be considered as the support threshold. Furthermore, we can identify itemsets that have support values beyond this threshold as significant itemsets.
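The support formula can be checked directly on a toy set of transactions (items invented for illustration):

```python
transactions = [
    ["milk", "bread", "butter"],
    ["bread", "butter"],
    ["milk", "bread"],
    ["milk", "eggs"],
    ["bread", "eggs", "butter"],
]

def support(itemset, transactions):
    """supp(X): fraction of transactions that contain every item of X."""
    hits = sum(1 for t in transactions if set(itemset) <= set(t))
    return hits / len(transactions)

# "bread" appears in 4 of 5 transactions; {bread, butter} in 3 of 5.
print(support({"bread"}, transactions))
print(support({"bread", "butter"}, transactions))
```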
Based on Confidence:
Confidence of a rule is defined as follows:
conf(X⟶Y)=supp(X∪ Y) / supp(X)
It signifies the likelihood of item Y being purchased when item X is purchased. So, for the rule {Onion, Potato} => {Burger}, the confidence is the probability that a customer buys a burger given that the basket already contains onion and potato.
It can also be interpreted as the conditional probability P(Y|X), i.e., the probability of finding the itemset Y in a transaction given that the transaction already contains X.
It can give some important insights, but it also has a major drawback: it only takes into account the popularity of the itemset X and not that of Y. If Y is very popular on its own, almost any transaction containing X will also contain Y, which inflates the confidence even when X and Y are unrelated. To overcome this drawback there is another measure called lift.
Lift:
The lift of a rule is defined as:
lift(X⟶Y)=supp(X∪Y) / ( supp(X)∗ supp(Y) )
This signifies the likelihood of the itemset Y being purchased when item X is purchased while taking into account the popularity of Y.
If the value of lift is greater than 1, it means that the itemset Y is likely to be bought with itemset X, while a value less than 1 implies that itemset Y is unlikely to be bought if the itemset X is bought.
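Confidence and lift follow directly from the support function; a small sketch on the same invented toy transactions:

```python
transactions = [
    ["milk", "bread", "butter"],
    ["bread", "butter"],
    ["milk", "bread"],
    ["milk", "eggs"],
    ["bread", "eggs", "butter"],
]

def support(itemset, transactions):
    hits = sum(1 for t in transactions if set(itemset) <= set(t))
    return hits / len(transactions)

def confidence(X, Y, transactions):
    # conf(X -> Y) = supp(X ∪ Y) / supp(X)
    return support(X | Y, transactions) / support(X, transactions)

def lift(X, Y, transactions):
    # lift(X -> Y) = supp(X ∪ Y) / (supp(X) * supp(Y))
    return support(X | Y, transactions) / (
        support(X, transactions) * support(Y, transactions)
    )

# supp({bread, butter}) = 0.6, supp({bread}) = 0.8, supp({butter}) = 0.6,
# so confidence = 0.75 and lift = 1.25 (> 1: butter goes well with bread).
print(confidence({"bread"}, {"butter"}, transactions))
print(lift({"bread"}, {"butter"}, transactions))
```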
Example: A => B [support = 5%, confidence = 80%]
5% of all the transactions show that A and B have been purchased together. 80% of the customers that bought A also bought B.
Apriori is the most common algorithm for mining such rules; here we use FP-Growth, which finds the same frequent itemsets without Apriori's expensive candidate-generation step.
Here, we have collected rules having maximum lift for each of the items that can be a consequent (that appears on the right side).
Support of the rule is 0.00305, which means the items in the rule appear together in 305 of the transactions in the dataset.
Confidence of the rule is 29.07%, which means that 29.07% of the time the antecedent items occurred, we also had the consequent in the transaction (i.e., 29.07% of times, customers who bought the left side items also bought root vegetables).