How To Leverage Market Basket Analysis For Revenue Growth



The world of e-commerce is fiercely competitive. For businesses to thrive, they need to continually innovate, providing added value to their customers in unique ways. One such innovative strategy is the creation of product stacks – grouping together complementary products that offer higher value when purchased together than separately. This blog post focuses on how we used Python to perform a market basket analysis, enabling us to create compelling product stacks for our client, a popular online supplements store.

Why Product Stacks?

Creating product stacks is a powerful e-commerce strategy for several reasons:

Higher Value Perception: 

Customers perceive a higher value when complementary products are grouped together, increasing their willingness to purchase.


Convenience: 

Offering product stacks saves customers the hassle of searching for and deciding on compatible products, making the buying process smoother.

Increased Sales: 

Stacks raise the average order value and can boost revenue, because customers are encouraged to buy more products in a single purchase.

The Challenge

The main challenge was to identify complementary supplement products that not only make sense together from a health and fitness perspective but also provide a compelling price offer. With thousands of transactions and numerous products, manual identification was not feasible. 

The solution? 

Market Basket Analysis powered by Python.

What is Market Basket Analysis?

Market Basket Analysis is a data analysis technique used by retailers to uncover associations between items. It works by looking for combinations of items that occur together frequently in transactions, aiding in product placement, marketing, and even in the development of new product stacks – the exact use case we had.
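To make the idea concrete, here is a minimal sketch in plain Python that counts how often each pair of products lands in the same basket. The product names and baskets are invented for illustration and are not the client's data; a real analysis would use a library such as mlxtend, which scales this idea to larger itemsets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets, invented for illustration only.
baskets = [
    ["Whey Protein", "Creatine", "Shaker"],
    ["Whey Protein", "Creatine"],
    ["Multivitamin", "Omega-3"],
    ["Whey Protein", "Shaker"],
    ["Creatine", "Multivitamin"],
]

def count_pairs(baskets):
    """Count how many baskets contain each unordered pair of products."""
    counts = Counter()
    for basket in baskets:
        # set() ignores repeated line items; sorted() keeps pair order stable.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts

pair_counts = count_pairs(baskets)
for pair, n in pair_counts.most_common():
    print(pair, n)
```

Pairs that show up in many baskets are the first candidates for a stack, although raw counts alone favor popular products, which is why the metrics discussed later matter.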

How Did We Leverage Python?

Let’s delve into our Python code:


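import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Show full tables when printing results
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

# Load the sales data
df = pd.read_csv('/Users/stamenpetrov/Desktop/sales_data.csv')

# Collect the products of each transaction into one list (a "basket")
baskets = df.groupby('Transaction ID')['Product Name'].apply(list)

# One-hot encode the baskets: one row per transaction, one boolean column per product
te = TransactionEncoder()
te_array = te.fit(baskets).transform(baskets)
te_df = pd.DataFrame(te_array, columns=te.columns_)

# Find itemsets that appear in at least 0.08% of transactions
frequent_itemsets = apriori(te_df, min_support=0.0008, use_colnames=True)

# Derive association rules, keeping those with a lift of at least 0.18
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=0.18)

# Display the rules with their support, confidence, and lift
print(rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']])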


This code does the following:

  1. Libraries: Imported pandas and mlxtend modules for the Apriori algorithm.
  2. Data Loading: Data sourced from a CSV file.
  3. Preprocessing: Product names grouped into one basket per transaction ID.
  4. Transformation: Used TransactionEncoder for one-hot encoding suitable for Apriori.
  5. Apriori: Applied with a defined minimum support.
  6. Rules Generation: Rules determined with a set ‘lift’ threshold.
  7. Output: Displayed item associations with support, confidence, and lift metrics.
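The transformation step is the least obvious one, so here is a rough sketch of what one-hot encoding a set of baskets means. It uses plain Python and invented baskets; it mirrors the idea behind TransactionEncoder rather than its actual implementation.

```python
# Hypothetical baskets, invented for illustration only.
baskets = [
    ["Whey Protein", "Creatine"],
    ["Multivitamin"],
    ["Creatine", "Multivitamin", "Omega-3"],
]

def one_hot_encode(baskets):
    """Return sorted product names and one boolean row per basket."""
    columns = sorted({item for basket in baskets for item in basket})
    rows = []
    for basket in baskets:
        present = set(basket)
        # True where the basket contains the product in that column.
        rows.append([item in present for item in columns])
    return columns, rows

columns, rows = one_hot_encode(baskets)
print(columns)
for row in rows:
    print(row)
```

Each row answers "which products were in this transaction?" with one True/False per product, which is the table shape Apriori expects as input.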

The Outcome

Now we have a table of association rules along with metrics that indicate the strength and relevance of each rule. The three main metrics used in this analysis are support, confidence, and lift. Let’s dive deeper into understanding what these metrics mean:


Support gives us the proportion of transactions that include the itemset of interest. In simpler terms, if we are looking at the rule {Product A} -> {Product B}, support tells us the percentage of all transactions that included both Product A and Product B. Higher support means the combination occurs more frequently.


Confidence gives us the conditional probability of occurrence of the consequent given the antecedent. If we’re considering the rule {Product A} -> {Product B}, the confidence metric tells us the likelihood that Product B is also bought when Product A is bought. A higher confidence value indicates a greater likelihood of the itemset B being purchased when itemset A is purchased.


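Lift measures how much more often A and B are bought together than we would expect if they were purchased independently. For the rule {Product A} -> {Product B}, it is the confidence of the rule divided by the support of Product B. A lift of 1 indicates no association between the products. A lift greater than 1 indicates that the products are bought together more often than chance would predict, while a lift below 1 indicates that they are bought together less often than chance would predict.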
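The three metrics can be worked out by hand on a tiny example. The sketch below uses plain Python and invented baskets; the function name rule_metrics is our own for illustration and is not part of mlxtend.

```python
# Hypothetical baskets, invented for illustration only.
baskets = [
    ["Whey Protein", "Creatine", "Shaker"],
    ["Whey Protein", "Creatine"],
    ["Multivitamin", "Omega-3"],
    ["Whey Protein", "Shaker"],
    ["Creatine", "Multivitamin"],
]

def rule_metrics(baskets, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Assumes both products appear in at least one basket.
    """
    n = len(baskets)
    sets = [set(b) for b in baskets]
    count_a = sum(antecedent in s for s in sets)
    count_b = sum(consequent in s for s in sets)
    count_ab = sum(antecedent in s and consequent in s for s in sets)

    support = count_ab / n              # share of baskets with both products
    confidence = count_ab / count_a     # share of A-baskets that also contain B
    lift = confidence / (count_b / n)   # confidence relative to B's base rate
    return support, confidence, lift

support, confidence, lift = rule_metrics(baskets, "Whey Protein", "Creatine")
print(f"support={support:.3f} confidence={confidence:.3f} lift={lift:.3f}")
```

In this toy data, two of the five baskets contain both products (support 0.4), two of the three Whey Protein baskets also contain Creatine (confidence about 0.67), and Creatine appears in three of five baskets overall, giving a lift of about 1.11.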

Interpreting The Data

By interpreting these metrics, we formed two-product combinations that were frequently bought together (high support), where buying one product made buying the other more likely (high confidence), and where the pair was bought together more often than chance would predict (lift above 1).
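A shortlist like this can be pulled from the rules with a small filter. The sketch below is one possible approach, not the exact procedure we ran: the rules and numbers are invented, the min_lift cutoff is an assumption, and the dictionary keys simply mirror the column names in mlxtend's association_rules output.

```python
# Hypothetical rules, invented for illustration only.
rules = [
    {"antecedents": frozenset({"Whey Protein"}), "consequents": frozenset({"Creatine"}),
     "support": 0.012, "confidence": 0.30, "lift": 2.4},
    {"antecedents": frozenset({"Creatine"}), "consequents": frozenset({"Whey Protein"}),
     "support": 0.012, "confidence": 0.25, "lift": 2.4},
    {"antecedents": frozenset({"Multivitamin"}), "consequents": frozenset({"Shaker"}),
     "support": 0.004, "confidence": 0.05, "lift": 0.7},
    {"antecedents": frozenset({"Whey Protein", "Shaker"}), "consequents": frozenset({"Creatine"}),
     "support": 0.003, "confidence": 0.40, "lift": 3.1},
]

def candidate_pairs(rules, min_lift=1.0):
    """Pick two-product pairs with lift above min_lift, strongest first."""
    best = {}
    for rule in rules:
        # Keep only rules of the form {one product} -> {one product}.
        if len(rule["antecedents"]) != 1 or len(rule["consequents"]) != 1:
            continue
        if rule["lift"] <= min_lift:
            continue
        pair = rule["antecedents"] | rule["consequents"]
        # A -> B and B -> A describe the same pair; keep the more confident direction.
        if pair not in best or rule["confidence"] > best[pair]["confidence"]:
            best[pair] = rule
    return sorted(best.values(), key=lambda r: (r["lift"], r["confidence"]), reverse=True)

shortlist = candidate_pairs(rules)
for rule in shortlist:
    print(sorted(rule["antecedents"] | rule["consequents"]), rule["lift"], rule["confidence"])
```

The numeric filter only narrows the field. Whether a pair makes sense as a stack is still a judgment call, which is where domain knowledge comes in.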

We focused on creating combinations that made sense, where the products were complementary and would add value when used together. 

Creating The Stacks

After creating these stacks, we reached out to our client to verify the practicality of these combinations. 

The feedback was positive, and this led us to proceed to the next step of creating names, thorough descriptions, and images for each stack.

The Creation of The New Category

We didn’t stop at just the stacks. We strategically introduced a “Stacks” category on the website and ensured that it was prominently visible and easy to navigate for all visitors. Moreover, we recommended key elements to include in the category description to provide clarity and emphasize the unique value proposition of the product stacks. 


Using Python for our market basket analysis paid off. The approach led to a significant increase in both consumer engagement and revenue. In the first month after launch, the revenue share from these stacks rose by 422.89%, and the average order value (AOV) increased by 16.95%.
