Hierarchical agglomerative clustering is a hierarchical clustering algorithm that uses the bottom up approach when creating data clusters. When building clusters the agglomerative algorithm merges (agglomerates) the single most objects that are close to each other then goes to the next level where it merges together clusters that are similar to each other. It operates this way until we have one cluster is formed. Hierarchical agglomerative clustering algorithm is one of the mostly used hierarchical clustering and at some point it’s preferred to K-Means because it produces more accurate results in some tasks. In this post we are going to look at the hierarchical agglomerative clustering how it works, its pros, cons and applications.

## Hierarchical Clustering

As the name suggests hierarchical clustering groups data in a hierarchical structure in a parent child relationship. The hierarchical clustering creates a tree like structure called the dendrogram. We have two types of hierarchical clustering algorithms; the agglomerative and the divisie clustering algorithms. The hierarchical agglomerative clustering algorithm uses a bottom-up technique whereas the divisive clustering algorithm uses the top-down approach. The bellow diagram shows the agglomerative and divisive clustering algorithms;

Agglomerative Clustering Divisive Clustering ## Hierarchical Agglomerative Clustering

The agglomerative algorithm uses the bottom-up approach in clustering data. Each observation in a data set is treated as a single cluster. Below are steps used by the agglomerative clustering;

1. Assign each individual observation to its own cluster.
2. Combine clusters that are similar by measuring the dissimilarity metric using the distance function e.g Euclidean distance.
3. Progressively merge the clusters using the similarity distance between the clusters. Repeat this step until all the clusters are merged into one single cluster. In combining the cluster we use the distance function to determine the distance between each cluster. We have three methods used in measuring the distance between each cluster; The methods are single linkage, complete linkage and average linkage. Let’s look at each linkage criteria briefly;

1. Single Linkage. This method uses the shortest distance between two points in each cluster. This is shown in the example below. 1. Complete Linkage. This method uses two points from each cluster that are further apart. This forms the longest distance. 1. Average Linkage. This method uses the average distance between each point in each cluster. After the dendrogram has been formed you can navigate through the tree and decide which layer has the most suitable solution to your problem.

## Hierarchical agglomerative clustering algorithm Example in Scikit-Learn

Ths scikit-learn sklearn.cluster class comes with AgglomerativeClustering(n_clusters=2, affinity=’euclidean’, memory=None, connectivity=None, compute_full_tree=’auto’, linkage=’ward’, pooling_func=<function mean>) method that is used for clustering data set in hierarchy. To read more about each parameter that can be turned to achieve maximum performance refer to scikit-learn documentation here.

Generating Random Data Set With make_blobs

Random data set – Hierarchical Agglomerative Clustering

Output Creating Hierarchical Clusters

Output Creating a Dendrogram

Output Creating Iris Data Set Dendrogram

Output Iris Data Set Original Classification

Output Hierarchical Agglomerative Clustering On Iris Data Set

Output ## Pros

• Has fewer assumptions about the distribution of data unlike k-means.
• It is robust as it gives different clusters at each layer on the dendrogram.

## Cons

• It has a quadratic time complexity making it slower than k-means.
• Can’t handle large data well due to quadratic time complexity.

## Applications of Hierarchical Agglomerative Clustering Algorithm

• Market segmentation.
• Image processing.
• Recommendation systems.
• Search engine results analysis.

## Conclusion

Hierarchical clustering offers an alternative approach to clustering aside from k-means clustering. Unlike k-means clustering where the algorithm defines the k number of clusters then computes the mean of each centroid, in hierarchical clustering the data is incrementally added to the tree like dendrogram in the case of agglomerative clustering or the data is split (divided) as is the case with the divisive clustering. Hierarchical agglomerative clustering is widely used than the divisive clustering algorithms. The limitation of the agglomerative clustering is that it is expensive as it has a quadratic time complexity. However, the agglomerative clustering is more robust as we can be able to get many layers of the tree representing different set of results from the data set.

## What’s Next

In this post we have looked at the hierarchical agglomerative clustering algorithm, in the next post we are going to look at the principal component analysis which is used for dimensionality reduction tasks.

Hierarchical Agglomerative Clustering