Data visualization is an analytical technique for communicating insights from data graphically. With the growth in the amount of data over the past few years, most organizations now place great value in their data. Data visualization allows data analysts and data scientists to present their research findings in a way that is simple, clear, and easy for any audience to understand. Nowadays data is everywhere and we are surrounded by it; however, data is only as good as the way it is presented to the right audience. In this post we are going to learn about machine learning data visualization with Matplotlib and Seaborn before we begin modeling and predicting. For more details on data visualization you can visit my series on data visualization with Matplotlib.

Machine Learning Data Visualization

Before you start creating your machine learning model, you first need to clearly understand your problem and the data you have. Data analysis and data visualization are very important for discovering hidden insights in the data. Some insights are easy to spot using only data analysis and visualization techniques, without resorting to machine learning or other advanced methods. This in turn saves the time and effort of developing models to discover them. There are many tools and products for data visualization, some commercial and some free or open source. The commercial tools include Tableau, IBM Cognos and Excel, while the free or open source tools include Apache Superset, Datawrapper, Google Data Studio, Google Charts, Plotly, and most importantly Matplotlib. In this post we are going to focus on machine learning data visualization with Matplotlib. Matplotlib is an open source Python data visualization library that comes preinstalled with the Anaconda distribution. Seaborn is a Python library for statistical data visualization that is built on top of Matplotlib and works well with it.

Project Set Up

In this post we are going to use the iris data set and the Boston house price data set. Download these data sets and place them in your working directory. We are only going to visualize the data before we apply any machine learning model. We will use Matplotlib and Seaborn. Seaborn is an open source Python data visualization library that is based on Matplotlib. To install Seaborn, open Anaconda Prompt and enter the install command, for example conda install seaborn.

Now that we have installed the Seaborn library, let’s start.
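Before plotting, it helps to load the data and take a quick look at it. The sketch below is only an illustration: it reads a few iris-style rows from an in-memory string so that it runs on its own, and the column names (sepal_length, sepal_width, petal_length, petal_width, species) are assumptions that may differ from the headers in your downloaded file.

```python
import io

import pandas as pd

# Stand-in for the downloaded CSV file. The column names are assumptions;
# check the header row of your own copy of the iris data.
csv_text = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
4.7,3.2,1.3,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.4,3.2,4.5,1.5,versicolor
6.9,3.1,4.9,1.5,versicolor
6.3,3.3,6.0,2.5,virginica
5.8,2.7,5.1,1.9,virginica
7.1,3.0,5.9,2.1,virginica
"""

# pd.read_csv accepts a file path as well as a file-like object.
iris = pd.read_csv(io.StringIO(csv_text))

print(iris.shape)                      # number of rows and columns
print(iris.head())                     # first few records
print(iris["species"].value_counts())  # how many rows per class
```

With the real data you would pass the path of the file in your working directory to pd.read_csv instead of the in-memory string.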

Scatter plot
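A scatter plot shows how two numeric features relate to each other. Here is a minimal sketch with Matplotlib, assuming we plot sepal length against sepal width and color the points by species. The small data frame and its column names are illustrative assumptions; in practice you would use the data frame loaded from the downloaded file.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative iris-style rows; the column names are assumptions.
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 7.0, 6.4, 6.9, 6.3, 5.8, 7.1],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.2, 3.1, 3.3, 2.7, 3.0],
    "species": ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3,
})

fig, ax = plt.subplots()

# Draw one scatter series per species so each class gets its own
# color and its own legend entry.
for species, group in iris.groupby("species"):
    ax.scatter(group["sepal_length"], group["sepal_width"], label=species)

ax.set_xlabel("sepal length")
ax.set_ylabel("sepal width")
ax.set_title("Iris sepal measurements")
ax.legend(title="species")

# plt.show()  # uncomment to display the figure in an interactive session
```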

Output

scatter plot - machine learning data visualization

FacetGrid plot

Output

facetgrid plot - machine learning data visualization

FacetGrid line plot

Output

facetgrid line plot - machine learning data visualization

Pairplot

Output

pairplot - machine learning data visualization

Histogram
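A histogram shows how the values of a single feature are distributed. Here is a minimal sketch with Matplotlib, assuming we look at one numeric column of the iris data. The same call works for a column of the Boston house price data; the sample values, the column name, and the number of bins are illustrative assumptions.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative iris-style values; the column name is an assumption.
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 7.0, 6.4, 6.9, 6.3, 5.8, 7.1],
})

fig, ax = plt.subplots()

# ax.hist returns the count in each bin, the bin edges, and the drawn bars.
counts, edges, bars = ax.hist(iris["sepal_length"], bins=5, edgecolor="black")

ax.set_xlabel("sepal length")
ax.set_ylabel("frequency")
ax.set_title("Distribution of sepal length")

# plt.show()  # uncomment to display the figure in an interactive session
```

Changing bins trades detail for smoothness: more bins show finer structure, while fewer bins give a coarser summary.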

Output

histogram - machine learning data visualization

Conclusion

The objective of any machine learning process is to solve a complex problem and present the findings in a simple and clear way. There is no simpler way to present results than with a picture; it’s true that a picture is worth a thousand words. Both before and after we develop our machine learning model, we need to visualize and understand our data. There are different tools for data visualization, but we have looked at just one of the most widely used: Matplotlib with Seaborn.

What’s Next

In this post we have learned about machine learning data visualization. In the next post we are going to look at data preprocessing.
