Deep Learning is a sub-field of machine learning inspired by the human nervous system. It mimics the workings of the human brain through representation learning at multiple levels of abstraction. Deep Learning is widely seen as one of the most important emerging areas of machine learning. Where traditional machine learning algorithms such as Support Vector Machines or Nearest Neighbors struggle with tasks like image recognition and natural language processing, deep learning algorithms have outperformed them on these complex tasks. Deep Learning has been applied in many fields, such as natural language processing and computer vision, leading to the development of cutting-edge technologies like autonomous cars and automated medical diagnosis. In this post we will look at what Deep Learning is, some of its major architectures, its pros and cons, and its areas of application.

Introduction to Deep Learning


As the name suggests, Deep Learning focuses on extracting deeper representations of the data. Most Deep Learning algorithms use artificial neural networks, hence the approach is also commonly called Deep Neural Networks: artificial neural networks with many hidden layers between the input and output layers. Increasing the number of layers in a neural network can increase the performance of the model, although one of the challenges of deep learning is its demand for large amounts of computing power. In this era of big data, Deep Learning has proven to be a key tool for harnessing value from massive amounts of data. Its performance continues to improve as the amount of data grows, which makes Deep Learning a scalable class of algorithms, unlike many traditional algorithms. Deep Learning can be applied to supervised, unsupervised, and reinforcement learning.

The field of Deep Learning has become one of the hottest areas of research, with world-renowned figures such as Ian Goodfellow, Yoshua Bengio, Yann LeCun, Andrew Ng, Peter Norvig, Jürgen Schmidhuber, and Geoffrey Hinton, among many other respected researchers, focusing on different aspects of Deep Learning architectures and publishing cutting-edge research papers. With improved computational power, large datasets, and a growing number of Deep Learning tools, Deep Learning algorithms have become the first choice for solving complex machine learning problems. They solve problems that for many years were deemed impossible, achieving state-of-the-art performance and in some cases even surpassing human-level accuracy.

How Deep Learning Works


Deep Learning is a class of machine learning algorithms built on models with many hidden layers, the most common type being the Deep Neural Network. A Deep Learning model is made up of an input layer, many hidden layers, and an output layer. Let's see what happens in each layer:

  1. Input Layer. The input layer receives the input data, which can take any form (text, images, or sound) depending on the architecture. The input layer then forwards the input to the hidden layers.
  2. Hidden Layers. The number of hidden layers is what differentiates a shallow neural network from a deep neural network, and this is where the actual computation takes place. Each hidden layer is made up of neurons, and the connections between neurons carry weights (sometimes referred to as importances). The weights are assigned randomly at the start, and training then adjusts the weights and biases. Each neuron has an activation function, which is responsible for standardizing its output.
  3. Output Layer. This layer is responsible for producing the predictions.
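The flow through these three layers can be sketched as a minimal forward pass in plain NumPy. The layer sizes, random weights, and sigmoid activation below are illustrative choices, not from any particular model:

```python
import numpy as np

def sigmoid(z):
    # Activation function: squashes a neuron's output into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Input layer: a single example with 3 features
x = np.array([0.5, -1.2, 3.0])

# Hidden layer: 4 neurons, weights start out random, biases at zero
W1 = rng.normal(size=(4, 3))
b1 = np.zeros(4)
hidden = sigmoid(W1 @ x + b1)

# Output layer: a single prediction
W2 = rng.normal(size=(1, 4))
b2 = np.zeros(1)
output = sigmoid(W2 @ hidden + b2)

print(hidden.shape, output.shape)  # (4,) (1,)
```

Each layer is just a matrix multiplication followed by the activation function; stacking more hidden layers is what makes the network "deep".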

We train a deep neural network in a process called backpropagation. It begins with a feed-forward step, in which data flows through the network in the forward direction. After each forward pass we compute the difference between the predicted output and the real output. This quantity is referred to as the cost function (sometimes the loss function), and it measures how wrong our model's predictions are. We compute the cost function so that we can perform backpropagation and adjust the weights, driving the cost toward zero. To minimize the cost function we use a technique called Gradient Descent, in which we compute the derivative of the cost function with respect to the weights and adjust the weights in small increments in the opposite direction of the gradient. Deep Neural Networks can perform both supervised and unsupervised learning tasks.
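The whole training loop can be sketched end to end on a toy problem. Below, a tiny two-layer network is trained on XOR (a classic task a single-layer network cannot solve) with a hand-written forward pass, mean-squared-error cost, backpropagation, and gradient descent; the learning rate, epoch count, and layer sizes are arbitrary illustrative choices, and with most random seeds the cost falls close to zero:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy dataset: XOR inputs and targets
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(42)
W1 = rng.normal(size=(2, 4))  # input -> hidden weights
b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1))  # hidden -> output weights
b2 = np.zeros(1)
lr = 1.0                      # learning rate: size of each small adjustment

for epoch in range(5000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    pred = sigmoid(h @ W2 + b2)

    # Cost function: mean squared error between prediction and target
    cost = np.mean((pred - y) ** 2)
    if epoch == 0:
        initial_cost = cost

    # Backward pass: derivatives of the cost w.r.t. each weight and bias
    d_out = 2 * (pred - y) / len(X) * pred * (1 - pred)
    dW2 = h.T @ d_out
    db2 = d_out.sum(axis=0)
    d_h = d_out @ W2.T * h * (1 - h)
    dW1 = X.T @ d_h
    db1 = d_h.sum(axis=0)

    # Gradient descent: step each parameter against its gradient
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

print(initial_cost, "->", cost)
```

Frameworks like TensorFlow and PyTorch compute these derivatives automatically, but the loop they run is the same: forward pass, cost, backward pass, weight update.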

Deep Learning Architecture

Deep Learning models can be classified into different architectures. Below is the classification we will adopt in this post.

  1. Convolutional Neural Network (CNN). This is a deep neural network model that is widely used in object recognition. Pioneered by the famous AI research scientist Yann LeCun, the CNN has revolutionized tasks in computer vision and image processing. The first version of the model was referred to as LeNet and was used to recognize handwritten digits. Since then, more sophisticated CNN models such as AlexNet, trained on large datasets like ImageNet, have emerged.
  2. Recurrent Neural Network (RNN). This is a deep neural network model used to model sequential input data, and one of the most powerful classes of machine learning algorithms for solving sequence modeling problems efficiently. The RNN has many variants, such as the Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU), among others.
  3. Deep Belief Network (DBN). The DBN is a deep generative model used in unsupervised learning. It is composed of stacked simple networks such as Restricted Boltzmann Machines (RBMs).

There are many other deep neural network architectures, such as Recursive Neural Networks and Deep Stacking Networks; however, in this post we have covered the three most common ones.
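To make the CNN idea concrete, the sketch below applies a single 3×3 convolution filter to a tiny grayscale "image" in plain NumPy. The image and the vertical-edge filter values are made up for illustration; in a real CNN the filter values are learned during training:

```python
import numpy as np

def convolve2d(image, kernel):
    # Slide the kernel over the image, computing one weighted
    # sum per position ("valid" padding, stride 1)
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A 5x5 image with a vertical edge down the middle (dark left, bright right)
image = np.array([
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
], dtype=float)

# A simple vertical-edge filter
kernel = np.array([
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
], dtype=float)

feature_map = convolve2d(image, kernel)
print(feature_map)  # each row is [-3. -3.  0.]: strong response at the edge
```

The feature map responds strongly (here, -3) where the filter overlaps the edge and is zero in the uniform region; a CNN stacks many such learned filters, layer after layer, to build up representations from edges to shapes to whole objects.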

Deep Learning Frameworks

Being an important part of the current technological revolution, Deep Learning has seen many tools developed recently, both commercial and open source. Below are a few commonly used Deep Learning frameworks and libraries.

  1. TensorFlow. This is a popular open source deep learning framework. It has been adopted by many companies, such as Google, Twitter, and IBM. TensorFlow runs on both desktop and mobile platforms and is supported in Python, C++, and R.
  2. Microsoft Cognitive Toolkit/CNTK. It is an open source deep learning framework from Microsoft. It is supported in Python and C++ .
  3. Torch/PyTorch. This is one of the most popular deep learning tools out there, adopted by big companies such as Facebook, Twitter, and Google. While Torch is a Lua-based deep learning framework, PyTorch implements Torch's approach in Python.
  4. Chainer. This is a flexible Python deep learning framework. It supports both CUDA and cuDNN through CuPy, which offers high-performance computation.
  5. Keras. This is a lightweight, flexible, and minimalist Python deep neural network framework. It can run on either TensorFlow or Theano as a back-end.
  6. DeepLearning4J. This is a scalable deep learning framework built in Java, with Scala also supported. It comes with many deep learning tools such as RNNs, CNNs, RNTNs, and more.
  7. Caffe. Caffe is a fast deep learning framework with interfaces in C, C++, Python, and MATLAB. It is very efficient when working with Convolutional Neural Networks (CNNs).
  8. Theano. Developed by the Montreal Institute for Learning Algorithms (MILA) at the Université de Montréal, Theano is a powerful open source deep learning framework that runs on both CPU and GPU architectures. It is highly optimized, with its underlying components written in C.


Advantages of Deep Learning

  1. Scales well with increases in data.
  2. Outperforms traditional models on complex tasks such as NLP and Computer Vision.
  3. Reduces the feature engineering process.


Disadvantages of Deep Learning

  1. Requires high computational power.
  2. Requires large datasets.
  3. Complex to work with.

Applications of Deep Learning

Deep Learning has many use cases, and the list is growing each day. This is due to its potential for solving hard everyday problems that other machine learning models are unable to solve. Here are just a few areas where Deep Learning is applied.

  1. Self-driving cars.
  2. Medical diagnosis, such as cancer detection.
  3. Natural Language Processing.
  4. Robotics and industrial applications.
  5. Computer Vision.
  6. Risk and fraud detection in finance.


Since around 2010, when it started gaining fame, Deep Learning has proved to be the cutting-edge domain of machine learning, offering solutions to some of the most complicated problems in our day-to-day lives. The idea behind deep learning can be traced back to the 1960s, when the Multi-Layer Perceptron was developed. The journey towards modern Deep Learning has had its ups and downs, but with recent improvements in computing power and the availability of large amounts of data, Deep Learning methods now outperform traditional methods on many tasks. However, there are still pressing issues in deep learning, such as understanding how and when to use which models. Using Deep Learning models effectively depends largely on the user's experience and on careful evaluation of the models to ensure they give state-of-the-art performance.

What’s Next

In this post we have looked at an overview of what Deep Learning is about, along with its applications, advantages, and disadvantages. In this series of posts we will focus on Deep Learning models and problems; the series assumes that you are familiar with machine learning. In the next post we will look at how to develop our first Deep Neural Network model using tools such as TensorFlow and Keras.
