Acknowledgements: Kartik Gaglani and team member Amar Patel created this blog as part of the coursework for the "Master in Data Science Programme" at Suven Consultants, under the mentorship of Rocky Jagtiani.
Market Basket Analysis
Ever wondered why items are displayed in a particular way in retail/online stores? Why certain items are suggested to you based on what you have added to the cart? Blame it on market basket analysis, or association rule mining.
Market basket analysis uses association rule mining under the hood to identify products frequently bought together. Before we get into the nitty-gritty of market basket analysis, let us get a basic understanding of association rule mining. It finds associations between different objects in a set. In the case of market basket analysis, the objects are the products purchased by a customer and the set is the transaction. In short, market basket analysis
- is an unsupervised data mining technique
- that uncovers products frequently bought together
- and creates if-then scenario rules
Its results are commonly used for:
- designing store layouts
- online recommendation engines
- targeted marketing campaigns/sales promotions/email campaigns
Before stepping into my analysis, let us understand the association rule mining concepts behind it.
Why Association Rule Mining?
Association rule mining is used when we want to find associations between various objects in a set, i.e., frequent patterns in a transaction database, relational database, or any other information repository. It tells us which items customers frequently buy together by generating a set of rules called association rules.
Why Market Basket?
- To change the store layout based on association rules
- To redesign the product catalog
- To cross-market on online stores
- To identify the trending items customers buy
- To customize emails with add-on sales offers
Also, to get a clear understanding of the applications of MBA, make sure you check out the example of Walmart's beer-diaper parable.
Importing the required packages:
*[Image placeholder: the original post showed screenshots of the import statements and a preview of the dataset.]*
Visualization of the top 20 “Hot” items:
Contribution of Top 20 “Hot” items to Total Sales:

Pruning the dataset for Frequently Bought Items:

Association Rule Mining with FP Growth:
Based on Minimum Support:
The support of an itemset X, supp(X), is the proportion of transactions in the database in which the itemset X appears. It signifies the popularity of an itemset.
supp(X) = (number of transactions in which X appears) / (total number of transactions)
If the sales of a particular product (item) above a certain proportion have a meaningful effect on profits, that proportion can be considered as the support threshold. Furthermore, we can identify itemsets that have support values beyond this threshold as significant itemsets.
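The support formula can be checked directly on a toy set of transactions (items invented for illustration):

```python
transactions = [
    ["milk", "bread", "butter"],
    ["bread", "butter"],
    ["milk", "bread"],
    ["milk", "eggs"],
    ["bread", "eggs", "butter"],
]

def support(itemset, transactions):
    """supp(X): fraction of transactions that contain every item of X."""
    hits = sum(1 for t in transactions if set(itemset) <= set(t))
    return hits / len(transactions)

# "bread" appears in 4 of 5 transactions; {bread, butter} in 3 of 5.
print(support({"bread"}, transactions))
print(support({"bread", "butter"}, transactions))
```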
Based on Confidence:
Confidence of a rule is defined as follows:
conf(X⟶Y)=supp(X∪ Y) / supp(X)
It signifies the likelihood of item Y being purchased when item X is purchased. So, for the rule {Onion, Potato} => {Burger}, the confidence is the probability that a customer buys a burger given that the basket already contains onion and potato.
It can also be interpreted as the conditional probability P(Y|X), i.e., the probability of finding the itemset Y in a transaction given that the transaction already contains X.
It can give some important insights, but it also has a major drawback: it only takes into account the popularity of the itemset X and not that of Y. If Y is very popular on its own, almost any transaction containing X will also contain Y, which inflates the confidence even when X and Y are unrelated. To overcome this drawback there is another measure called lift.
Lift:
The lift of a rule is defined as:
lift(X⟶Y)=supp(X∪Y) / ( supp(X)∗ supp(Y) )
This signifies the likelihood of the itemset Y being purchased when item X is purchased while taking into account the popularity of Y.
If the value of lift is greater than 1, it means that the itemset Y is likely to be bought with itemset X, while a value less than 1 implies that itemset Y is unlikely to be bought if the itemset X is bought.
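Confidence and lift follow directly from the support function; a small sketch on the same invented toy transactions:

```python
transactions = [
    ["milk", "bread", "butter"],
    ["bread", "butter"],
    ["milk", "bread"],
    ["milk", "eggs"],
    ["bread", "eggs", "butter"],
]

def support(itemset, transactions):
    hits = sum(1 for t in transactions if set(itemset) <= set(t))
    return hits / len(transactions)

def confidence(X, Y, transactions):
    # conf(X -> Y) = supp(X ∪ Y) / supp(X)
    return support(X | Y, transactions) / support(X, transactions)

def lift(X, Y, transactions):
    # lift(X -> Y) = supp(X ∪ Y) / (supp(X) * supp(Y))
    return support(X | Y, transactions) / (
        support(X, transactions) * support(Y, transactions)
    )

# supp({bread, butter}) = 0.6, supp({bread}) = 0.8, supp({butter}) = 0.6,
# so confidence = 0.75 and lift = 1.25 (> 1: butter goes well with bread).
print(confidence({"bread"}, {"butter"}, transactions))
print(lift({"bread"}, {"butter"}, transactions))
```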
Example: A => B [support = 5%, confidence = 80%]
5% of all the transactions show that A and B have been purchased together. 80% of the customers that bought A also bought B.
Apriori is the most common algorithm for mining such rules; here we use FP-Growth, which finds the same frequent itemsets without Apriori's expensive candidate-generation step.
Here, we have collected rules having maximum lift for each of the items that can be a consequent (that appears on the right side).
Support of the rule is 0.00305, which means the items in the rule appear together in 305 of the transactions in the dataset.
Confidence of the rule is 29.07%, which means that 29.07% of the time the antecedent items occurred, we also had the consequent in the transaction (i.e., 29.07% of times, customers who bought the left side items also bought root vegetables).