The Apriori algorithm is an unsupervised learning algorithm used for finding frequent itemsets in a given data set. It is a simple yet powerful data mining algorithm that generates association rules from transaction databases. Given the sets of products bought in a supermarket, the Apriori algorithm can identify pairs of products that customers often buy together through association rule analysis. Apriori has many applications in data mining, such as market basket analysis, auto-complete features like Google Search suggestions, and recommender systems. In this post we will focus on the Apriori learning algorithm and look at how it works, along with its strengths and weaknesses.

Apriori Algorithm

The Apriori algorithm was developed by Agrawal and Srikant in 1994. It performs association rule analysis on transaction data sets. A transaction is viewed as a set of items, and the algorithm strives to find the relationships between items. A simple example of how Apriori works comes from customer purchase behavior: if, out of 100 customers who buy milk, 70 also buy bread, the Apriori algorithm generates an association rule from this trend. This is commonly seen in supermarkets. Whereas Apriori uses transaction data sets, other algorithms such as Winepi work on non-transaction data sets.

How Apriori Algorithm Works

Association rule mining uses two main strategies: Candidate-Generation-Based (CGB) and Pattern-Growth-Based (PGB). CGB is a breadth-first search strategy in which candidate (k+1)-itemsets are generated from the frequent k-itemsets. The Apriori algorithm follows the candidate-generation approach: it uses breadth-first search and a hash tree structure to count candidate itemsets.
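
To make the join-and-prune idea concrete, here is a minimal sketch of the candidate-generation step. It assumes frequent k-itemsets are stored as Python frozensets; the helper name apriori_gen is only illustrative, not part of any library.

    from itertools import combinations

    def apriori_gen(frequent_k, k):
        """Generate candidate (k+1)-itemsets from frequent k-itemsets."""
        candidates = set()
        for a in frequent_k:
            for b in frequent_k:
                union = a | b
                if len(union) != k + 1:
                    continue
                # Prune step: every k-subset of a candidate must itself be
                # frequent, otherwise the candidate cannot be frequent.
                if all(frozenset(sub) in frequent_k
                       for sub in combinations(union, k)):
                    candidates.add(union)
        return candidates

    # Example: frequent 1-itemsets join into candidate 2-itemsets.
    frequent_1 = {frozenset({"Milk"}), frozenset({"Bread"}), frozenset({"Apple"})}
    print(apriori_gen(frequent_1, 1))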

The strength of an association can be measured using three metrics:

  1. Support. This measures the association based on the proportion of transactions in which an item or itemset occurs (how many times it appears). Let’s assume that we have the following products and their transactions, as shown in the table below.
Transaction   Items
T1            Milk, Bread, Apple
T2            Milk, Bread
T3            Milk, Bread, Apple, Banana
T4            Milk, Banana, Rice, Chicken
T5            Apple, Rice, Chicken
T6            Milk, Bread, Banana
T7            Rice, Chicken
T8            Bread, Apple, Chicken
T9            Bread, Chicken
T10           Apple, Banana

The support of {Milk} is the number of transactions that contain Milk divided by the total number of transactions:

Support{Milk} = 5/10 = 50%

The same calculation applies to multiple items. Let’s see the support of {Milk, Bread}. Milk and Bread appear together in transactions T1, T2, T3 and T6:

Support{Milk, Bread} = 4/10 = 40%

  2. Confidence. This measures the likelihood of one item occurring given another item, written as {X->Y}: for example, how likely Y is to be bought when X has been bought. From the table above, let’s find the confidence of {Milk->Bread}.

Confidence{Milk->Bread} = Support{Milk, Bread} / Support{Milk}

= (4/10) / (5/10)

This is equivalent to 80%.

  3. Lift. This measures the likelihood of item Y occurring when X occurs, while controlling for how popular Y is on its own. A lift greater than 1 indicates that the two items appear together more often than expected by chance. Let’s calculate the lift of {Milk->Bread}.

Lift{Milk->Bread} = Support{Milk, Bread} / (Support{Milk} * Support{Bread})

= (4/10) / ((5/10) * (6/10)) ≈ 1.33
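
These three measures are easy to verify with a few lines of plain Python. The sketch below recomputes them directly from the sample transactions using the formulas above; it does not rely on any library.

    # Transactions from the table above.
    transactions = [
        {"Milk", "Bread", "Apple"}, {"Milk", "Bread"},
        {"Milk", "Bread", "Apple", "Banana"}, {"Milk", "Banana", "Rice", "Chicken"},
        {"Apple", "Rice", "Chicken"}, {"Milk", "Bread", "Banana"},
        {"Rice", "Chicken"}, {"Bread", "Apple", "Chicken"},
        {"Bread", "Chicken"}, {"Apple", "Banana"},
    ]

    def support(itemset):
        """Fraction of transactions that contain every item in `itemset`."""
        return sum(itemset <= t for t in transactions) / len(transactions)

    s_milk = support({"Milk"})                           # 0.5
    s_milk_bread = support({"Milk", "Bread"})            # 0.4
    confidence = s_milk_bread / s_milk                   # 0.8
    lift = s_milk_bread / (s_milk * support({"Bread"}))  # ~1.33

    print(s_milk, s_milk_bread, confidence, round(lift, 2))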

Let’s also look at the support count of every individual item in our data set.

Item      Support (count)
Bread     6
Chicken   5
Milk      5
Apple     5
Banana    4
Rice      3

Applying Apriori in a supermarket context to the data in the first table can generate the following insights:

  • Items that are bought together can be placed close to each other.
  • Advertisement of one product such as Bread can be directed to buyers who buy Milk.
  • Discounting can be done on one product among the most frequently bought sets.
  • The supermarket can start selling products such as ice-cream made from milk and banana.

These are just simple interpretations of association analysis applied to consumer behavior in our data. However, other factors also come into play when making such decisions and can greatly affect consumer behavior.

Apriori Algorithm Example

Install the apriori package using the following command in your Anaconda prompt.
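
Assuming the commonly used apyori package (its apriori() function exposes the parameters listed below), the install command would be:

    pip install apyori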

The apriori() method has several parameters, such as min_support, min_confidence, min_lift and min_length, that can be tuned to achieve the desired results.

Association Analysis With Apriori
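
Below is a minimal sketch of how the analysis could look, assuming the apyori implementation and the sample transactions from the first table; the threshold values are illustrative only.

    from apyori import apriori

    transactions = [
        ["Milk", "Bread", "Apple"], ["Milk", "Bread"],
        ["Milk", "Bread", "Apple", "Banana"], ["Milk", "Banana", "Rice", "Chicken"],
        ["Apple", "Rice", "Chicken"], ["Milk", "Bread", "Banana"],
        ["Rice", "Chicken"], ["Bread", "Apple", "Chicken"],
        ["Bread", "Chicken"], ["Apple", "Banana"],
    ]

    # Mine association rules with illustrative thresholds.
    rules = list(apriori(transactions,
                         min_support=0.3,
                         min_confidence=0.5,
                         min_lift=1.0))

    for record in rules:
        for stat in record.ordered_statistics:
            if not stat.items_base:
                continue  # skip trivial rules with an empty antecedent
            print(set(stat.items_base), "->", set(stat.items_add),
                  "support:", round(record.support, 2),
                  "confidence:", round(stat.confidence, 2),
                  "lift:", round(stat.lift, 2))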

Visualizing Two-Item Itemsets With a Network Diagram
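
One way to draw such a diagram is with networkx and matplotlib. The sketch below is a minimal illustration that plots a few two-item rules from the worked example; the rule list is hard-coded for clarity rather than taken from the apriori() output.

    import matplotlib.pyplot as plt
    import networkx as nx

    # A few two-item rules from the worked example (hard-coded for clarity).
    rules = [("Milk", "Bread"), ("Bread", "Apple"), ("Rice", "Chicken")]

    G = nx.DiGraph()
    G.add_edges_from(rules)

    pos = nx.spring_layout(G, seed=42)  # deterministic layout
    nx.draw_networkx(G, pos, node_color="lightblue", node_size=1500,
                     arrows=True, font_size=10)
    plt.axis("off")
    plt.show()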

[Figure: Network diagram of the two-item associations]

Apriori Support With Bar Chart
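
The support counts from the item table above can be plotted with matplotlib; a minimal sketch:

    import matplotlib.pyplot as plt

    # Support counts taken from the item support table above.
    items = ["Bread", "Chicken", "Milk", "Apple", "Banana", "Rice"]
    counts = [6, 5, 5, 5, 4, 3]

    plt.bar(items, counts, color="steelblue")
    plt.xlabel("Item")
    plt.ylabel("Support (count)")
    plt.title("Item support in the sample transactions")
    plt.show()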

[Figure: Bar chart of item support counts]

WordCloud Generator

For a detailed view of the frequency of items we can use a word cloud. To work with wordcloud in Python we need to install the package. Open your Anaconda prompt and run the following command.
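
Assuming the package is installed from the conda-forge channel:

    conda install -c conda-forge wordcloud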

You can also use the pip command as follows.
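
    pip install wordcloud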

Items WordCloud
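
A minimal sketch of generating the word cloud, assuming the wordcloud package and the sample transactions from the first table; each item is repeated once per transaction it appears in, so word size mirrors its support.

    from wordcloud import WordCloud
    import matplotlib.pyplot as plt

    transactions = [
        ["Milk", "Bread", "Apple"], ["Milk", "Bread"],
        ["Milk", "Bread", "Apple", "Banana"], ["Milk", "Banana", "Rice", "Chicken"],
        ["Apple", "Rice", "Chicken"], ["Milk", "Bread", "Banana"],
        ["Rice", "Chicken"], ["Bread", "Apple", "Chicken"],
        ["Bread", "Chicken"], ["Apple", "Banana"],
    ]

    # One long string in which each item appears once per transaction.
    text = " ".join(item for t in transactions for item in t)

    wc = WordCloud(background_color="white", collocations=False).generate(text)
    plt.imshow(wc, interpolation="bilinear")
    plt.axis("off")
    plt.show()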

Output

[Figure: Word cloud of item frequencies]

Pros

  • Easy to implement.
  • Works on large itemsets.

Cons

  • It is computationally expensive if there is a large number of candidate itemsets and rules.
  • The algorithm has to scan the entire database on every pass, which can be costly.

Applications of Apriori Algorithm

  • Market basket analysis.
  • Auto-complete applications.
  • Detecting adverse drug reactions in patients.
  • Recommender systems.

Conclusion

The Apriori algorithm is an unsupervised learning algorithm that is widely used in data mining. Apriori works by generating association rules between itemsets. It is widely used in market basket analysis and for understanding customer buying behavior. The Apriori algorithm is easy to implement but can be computationally expensive.

What’s Next

In this post we have looked at the Apriori algorithm. In the next series of posts we will focus on Artificial Neural Networks and Deep Learning.
