Principal component analysis (PCA) is an unsupervised learning algorithm used to identify patterns in data. It finds the directions in which the data is dispersed (eigenvectors) and the magnitude of that dispersion (eigenvalues). Principal component analysis is widely used in dimensionality reduction and feature extraction: it transforms complex high dimensional data through linear combinations of the original features into a new set of features, called principal components, that are orthogonal (uncorrelated) while preserving as much of the variance in the data as possible. With PCA we can analyze and easily visualize complex data. In this post we are going to look at what PCA is, how it works, and its strengths and limitations.

**Principal Component Analysis**

Principal component analysis is an orthogonal linear transformation that maps the data onto new axes ordered by the variance of the projections: the projection with the highest variance comes first, followed by the next highest, and so on. Using principal component analysis we can transform complex high dimensional data into easier-to-interpret low dimensional data by creating principal components. Principal component analysis was developed by Karl Pearson in 1901. PCA has been widely used in many fields and is often renamed according to the field, e.g. the eigenvalue decomposition (EVD) of XᵀX in linear algebra, empirical orthogonal functions (EOF) in meteorological science, and empirical modal analysis in structural dynamics, among others. In machine learning PCA is used in feature extraction, where it creates a new representation of the data that keeps only the important features and leaves the unimportant ("bad") features behind. This is referred to as dimensionality reduction, and done this way very little of the information in the data is lost.

**Principal Component Analysis Terminologies**

Before we look at how PCA works let’s define some of the fundamental terminologies that are commonly used in PCA.

– Matrix: A rectangular array of numbers.

– Variance: A measure of how spread out the data is.

– Covariance: A measure of how two variables vary together, i.e. the direction in which they tend to move.

– Eigenvector: A vector whose direction remains unchanged after a linear transformation; it is only scaled.

– Eigenvalue: A number λ such that subtracting λ times the identity matrix from the matrix gives a zero determinant. Eigenvalues are also referred to as characteristic roots.

– Dimensionality: The number of features in the data set.

– Orthogonal: Perpendicular; for features this implies a lack of correlation between variables.
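To make these terms concrete, here is a small NumPy sketch (the toy data is ours, for illustration only) showing the variance of each feature, the covariance matrix, and the defining eigenvector/eigenvalue relationship:

```python
import numpy as np

# Toy 2-D data set: two correlated variables, five observations.
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9],
              [1.9, 2.2],
              [3.1, 3.0]])

# Variance of each feature (how spread out each column is).
print(np.var(X, axis=0, ddof=1))

# Covariance matrix (how the two features move together).
C = np.cov(X, rowvar=False)
print(C)

# Eigenvalues w and eigenvectors V of the covariance matrix.
# For each pair, C @ v = w * v: the direction v is unchanged
# by the transformation, it is only scaled by the eigenvalue w.
w, V = np.linalg.eig(C)
print(np.allclose(C @ V[:, 0], w[0] * V[:, 0]))
```

Note also that `det(C - w*I) = 0` for each eigenvalue `w`, which is the "zero determinant" definition given above.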

**How Principal Component Analysis Works**

Principal component analysis creates a new set of features from the old set of features. The new set of features has the following properties:

- The new features have zero correlation with one another.
- The new features are linear combinations of the old features.
- The axes of these new features are called the principal components.
- The first principal component has the largest variance, followed by the second, and so on.
- The principal components are orthogonal.
- The variance captured decreases from the first principal component to the last.

When creating the new set of features from the old ones we find the direction onto which the projected data has the highest variance; this forms the first principal component. The process continues for the second, third, and subsequent principal components. There are various equivalent approaches for deriving the principal components, including maximizing the projected variance and minimizing the reconstruction error. Below is a summary of the approaches for finding the principal components:

- Maximizing the variance
- Minimizing the reconstruction error
- Eigen-decomposition.
- Singular Value Decomposition.
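The last two approaches can be compared directly in NumPy. As a sketch (using random data, not data from this post), both eigen-decomposition of the covariance matrix and SVD of the centered data recover the same principal directions and variances:

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(100, 3)
Xc = X - X.mean(axis=0)  # center the data first

# Approach 1: eigen-decomposition of the covariance matrix.
C = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]          # largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Approach 2: singular value decomposition of the centered data.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_vals = S**2 / (len(X) - 1)             # singular values -> variances

# Both approaches give the same explained variances and the same
# principal directions (eigenvectors are unique only up to sign).
print(np.allclose(eigvals, svd_vals))
print(np.allclose(np.abs(eigvecs), np.abs(Vt.T)))
```

Scikit-learn's `PCA` class uses the SVD route internally, which is numerically more stable than forming the covariance matrix explicitly.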

Below are the basic steps for implementing PCA:

*– Data standardization.*

*– Computing the covariance matrix.*

*– Computing the Eigenvectors and Eigenvalues of the covariance matrix.*

*– Sorting the components from the largest to the smallest Eigenvalue.*

*– Creating the principal components.*
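The steps above can be written out in NumPy. This is a rough sketch (the random data is for illustration only), cross-checked against scikit-learn's `PCA`:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(5)
X = rng.randn(50, 4)

# Step 1: standardize the data (zero mean, unit variance).
Xs = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: compute the covariance matrix.
C = np.cov(Xs, rowvar=False)

# Step 3: eigenvectors and eigenvalues of the covariance matrix.
eigvals, eigvecs = np.linalg.eigh(C)

# Step 4: sort components from the largest to the smallest eigenvalue.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 5: create the principal components by projecting the data
# onto the top-2 eigenvectors.
X_pca = Xs @ eigvecs[:, :2]

# Cross-check against scikit-learn (components match up to sign).
skl = PCA(n_components=2).fit(Xs)
print(np.allclose(np.abs(X_pca), np.abs(skl.transform(Xs))))
```

The absolute-value comparison is needed because eigenvectors are only defined up to sign, so the manual and library components may point in opposite directions.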

**When To Use Principal Component Analysis**

- When reducing the dimensionality of data.
- For feature extraction.
- When determining the linear combinations of variables.
- When understanding the structure of the data set.
- When visualizing high dimensional data.

Below is a diagram showing three principal components. Note the distribution of the data: the direction with the highest variance forms the first principal component, followed by the second and then the third.

**Principal Component Analysis Example in Scikit-Learn**

There are different ways of performing principal component analysis. One way is to calculate the PCA manually: compute the covariance matrix of the data, get the eigenvectors and eigenvalues of the covariance matrix, and finally create the principal components. However, there are also many software tools and libraries that make PCA easy to use. In this post we are going to leverage the **scikit-learn** library. Scikit-learn comes with a class, `sklearn.decomposition.PCA`, which is used for principal component analysis. Its signature is:

`sklearn.decomposition.PCA(n_components=None, copy=True, whiten=False, svd_solver='auto', tol=0.0, iterated_power='auto', random_state=None)`

**Principal Components**

```python
from sklearn.decomposition import PCA
import numpy as np

rng = np.random.RandomState(5)
X = np.dot(rng.rand(10, 10), rng.randn(10, 200)).T

pca = PCA(n_components=4)
pca.fit(X)
X_pca = pca.transform(X)
X_new = pca.inverse_transform(X_pca)

print("Components \n", pca.components_)
print("\n\nExplained Variance ", pca.explained_variance_)
print("original shape: ", X.shape)
print("Transform:", X_pca.shape)
print("Inverse Transform ", X_new.shape)
```

```
Output

Components
 [[-0.31213838 -0.25696512 -0.2538881  -0.30749468 -0.41229917 -0.29388507
   -0.39163385 -0.27649243 -0.28097608 -0.33525316]
  [-0.06484715 -0.26535727  0.3315649  -0.20846796  0.04632287  0.53435448
   -0.52066736 -0.10066855  0.45057179 -0.00787845]
  [ 0.33597609 -0.31319697 -0.37472219 -0.29416319  0.33028526 -0.18527206
    0.13075773 -0.50428391  0.23307695  0.304861  ]
  [-0.20049882 -0.38186971  0.09223148  0.70035028  0.03646774 -0.41048616
   -0.0940914  -0.08206542  0.35771153 -0.04005417]]

Explained Variance  [ 24.44961769   2.01831723   1.65967551   1.25125414]
original shape:  (200, 10)
Transform: (200, 4)
Inverse Transform  (200, 10)
```

**Explained Variance**

```python
from sklearn.decomposition import PCA
import numpy as np

rng = np.random.RandomState(4)
X = np.dot(rng.rand(10, 10), rng.randn(10, 200)).T

pca = PCA(n_components=4)
pca.fit(X)

print("\n\nExplained Variance ", pca.explained_variance_)
print("\n\nExplained Variance Ratio ", pca.explained_variance_ratio_)
print("\n\nCumulative Sum ", np.cumsum(pca.explained_variance_))
```

```
Output

Explained Variance  [ 25.05148803   2.01335341   1.33676588   1.14353196]
Explained Variance Ratio  [ 0.8079557   0.06493428  0.04311311  0.03688097]
Cumulative Sum  [ 25.05148803  27.06484144  28.40160732  29.54513928]
```

**Explained Variance Plot**

```python
from sklearn.decomposition import PCA
import numpy as np
import matplotlib.pyplot as plt

plt.style.use('ggplot')
plt.figure(figsize=(14, 7))

rng = np.random.RandomState(5)
X = np.dot(rng.rand(10, 10), rng.randn(10, 200)).T

pca = PCA(n_components=4)
pca.fit(X)
explained_variance = pca.explained_variance_
explained_variance_ratio = pca.explained_variance_ratio_

print("\n\nExplained Variance ", explained_variance)
print("\n\nExplained Variance Ratio ", explained_variance_ratio)
print("\n\nCumulative Sum ", np.cumsum(pca.explained_variance_))

plt.subplot(1, 2, 1)
plt.bar(range(4), explained_variance, alpha=0.5, align='center',
        label='Explained variance')
plt.step(range(4), np.cumsum(pca.explained_variance_), where='mid',
         label='cumulative explained variance')
plt.ylabel('Explained variance')
plt.xlabel('Principal components')
plt.legend(loc='best')
plt.title("Explained Variance", fontsize=20)

plt.subplot(1, 2, 2)
plt.bar(range(4), explained_variance_ratio, alpha=0.5, align='center',
        label='Explained variance ratio')
plt.ylabel('Explained variance ratio')
plt.xlabel('Principal components')
plt.legend(loc='best')
plt.title("Explained Variance Ratio", fontsize=20)
plt.show()
```

```
Output

Explained Variance  [ 24.44961769   2.01831723   1.65967551   1.25125414]
Explained Variance Ratio  [ 0.75997744  0.06273618  0.05158837  0.03889324]
Cumulative Sum  [ 24.44961769  26.46793493  28.12761044  29.37886458]
```

**Principal Components Plot**

```python
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import numpy as np
import matplotlib.pyplot as plt

plt.style.use('ggplot')
plt.figure(figsize=(14, 7))

rng = np.random.RandomState(5)
X = np.dot(rng.rand(10, 10), rng.randn(10, 200)).T
X_scaled = StandardScaler().fit_transform(X)

plt.subplot(1, 2, 1)
plt.scatter(X_scaled[:, 0], X_scaled[:, 1], marker='*', s=50)
plt.title("Before Principal Components (PC)", fontsize=20)

pca = PCA(n_components=4)
pca.fit(X_scaled)
X_pca = pca.transform(X_scaled)

plt.subplot(1, 2, 2)
plt.scatter(X_pca[:, 0], X_pca[:, 1], marker='*', s=50)
plt.title("After Principal Components (PC)", fontsize=20)
plt.show()

X_new = pca.inverse_transform(X_pca)
plt.scatter(X_new[:, 0], X_new[:, 1], marker='*', s=50)
plt.title("After Inverse Transform (PC)", fontsize=20)
plt.show()
```

**Output**

**PCA With Iris Data Set**

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

plt.style.use('ggplot')
plt.figure(figsize=(6, 5))

df = pd.read_csv(filepath_or_buffer='https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data',
                 header=None, sep=',')
df.columns = ['sepal length', 'sepal width', 'petal length', 'petal width', 'species']
df.dropna(how="all", inplace=True)  # drops the empty line at file-end

X = df.iloc[:, 0:4].values
y = df.iloc[:, 4].values

X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
Y_fitted = pca.fit_transform(X_scaled)

for label, color in zip(('Iris-setosa', 'Iris-versicolor', 'Iris-virginica'),
                        ('red', 'green', 'blue')):
    plt.scatter(Y_fitted[y == label, 0], Y_fitted[y == label, 1],
                label=label, c=color, marker='*', s=50)

plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend(loc='best')
plt.title("PCA On Iris Data", fontsize=20)
plt.show()
```

**Output**

**Pros**

- Easy and simple to implement.
- Easy to visualize complex data.
- It focuses only on the most useful features of the data.
- Reduces the size of the data.

**Cons**

- It is sensitive to outliers.
- It is not very efficient on some high dimensional data sets compared to other methods such as singular value decomposition (SVD).

**Applications Of Principal Component Analysis**

Principal component analysis has a vast number of applications in different domains. It is mostly used to find hidden patterns in data and to reduce the dimensionality of data. Below are a few domains where PCA is very useful.

- Data mining.
- Image processing.
- Financial analysis.
- Statistical quality control.
- Computer vision.
- Stock market prediction.

**Conclusion**

Principal component analysis is an unsupervised learning algorithm used to reduce the dimensionality of a data set and to find hidden patterns in the data. PCA can be a valuable early step in exploratory data analysis, helping you understand the data before modeling begins. It is a linear transformation technique that transforms high dimensional data into lower dimensional data whose features are orthogonal. PCA has many applications in different domains; in machine learning it is commonly used in feature extraction, and it is also very useful in data mining tasks. PCA is not the only method for dimensionality reduction; other methods include singular value decomposition, which we will cover in the next post.

**What’s Next**

In this post we looked at principal component analysis; in the next post we will look at singular value decomposition (SVD).