The Recurrent Neural Network (RNN) is a state-of-the-art deep learning model used for sequential data. The output of an RNN depends not only on the current input but also on what the network has seen before. One key strength of the RNN is its ability to predict the next step in a sequence from previous states, and since some flavors of RNN have memory, they can model long-term dependencies. RNNs have been successfully applied to complex tasks such as Natural Language Processing (NLP), where they have delivered remarkable results. In this post we are going to look at a brief introduction to Recurrent Neural Networks, their flavors, and their applications.

Introduction to Recurrent Neural Network

In the traditional neural network model, the input and output data are independent of each other. This approach falls short when we want to predict an output while taking the previous state into account, which is a common requirement in NLP tasks. Recurrent Neural Networks overcome this limitation: in an RNN the output of the previous state is fed back as part of the input to the current state, so the network can condition its predictions on everything that came before. Being a class of artificial neural network, an RNN is trained with an approach similar to the traditional one using backpropagation, but with a twist: because of its recurrent structure, the network is unrolled over time and trained with Backpropagation Through Time (BPTT). A minimal sketch of this idea follows.
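To see what that twist means in practice, here is a minimal sketch, assuming TensorFlow 2.x, with a single hidden unit and a toy sequence (both illustrative assumptions). It shows that the gradient of a loss flows back through every unrolled time step:

```python
import tensorflow as tf

# A toy recurrence with one hidden unit; all numbers are illustrative assumptions.
U = tf.Variable(1.0)   # input weight
W = tf.Variable(0.5)   # recurrent weight, shared across every time step
xs = [1.0, 0.5, -0.3]  # toy input sequence
target = 0.2           # toy training target for the final state

with tf.GradientTape() as tape:
    s = tf.constant(0.0)            # initial state
    for x in xs:                    # unroll the recurrence over time
        s = tf.tanh(U * x + W * s)  # each step reuses the same weights
    loss = (s - target) ** 2

# Because W is reused at every step, its gradient accumulates contributions
# from all time steps of the unrolled loop; computing it this way is BPTT.
print(tape.gradient(loss, [U, W]))
```

The key point is that a single weight participates in every time step, so its gradient is a sum over the whole unrolled sequence.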

The figure below shows a simple Recurrent Neural Network.

[Figure: A simple Recurrent Neural Network]

In the above figure we have the following components:

x => input vector
s => hidden state vector
o => output vector
U => weight matrix between the input and the hidden layer
W => weight matrix between hidden states at successive time steps (the recurrent connection)
V => weight matrix between the hidden and the output layer

Now let’s look at the following figure to understand how a fully connected recurrent neural network works.

[Figure: A fully connected Recurrent Neural Network unrolled over time]

Let’s see what happens at each step.

x_t => the input vector at time step t
s_t => the hidden state at time step t, computed as s_t = f(U x_t + W s_{t-1}), where f is a non-linear activation function such as tanh or ReLU
o_t => the output vector at time step t, computed as o_t = softmax(V s_t)
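To make these two equations concrete, here is a minimal NumPy sketch of the forward pass; the layer sizes and random weights are illustrative assumptions, not values taken from the figure.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

# Illustrative sizes (assumptions): 10-dimensional inputs, 16 hidden units.
input_dim, hidden_dim, output_dim = 10, 16, 10
rng = np.random.default_rng(0)

U = rng.normal(scale=0.1, size=(hidden_dim, input_dim))   # input -> hidden
W = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # hidden -> hidden (recurrent)
V = rng.normal(scale=0.1, size=(output_dim, hidden_dim))  # hidden -> output

def rnn_forward(xs):
    """Run the RNN over a sequence of input vectors xs."""
    s = np.zeros(hidden_dim)  # initial hidden state s_0
    outputs = []
    for x_t in xs:
        s = np.tanh(U @ x_t + W @ s)    # s_t = f(U x_t + W s_{t-1})
        outputs.append(softmax(V @ s))  # o_t = softmax(V s_t)
    return outputs

# A toy sequence of 5 random input vectors.
sequence = [rng.normal(size=input_dim) for _ in range(5)]
for t, o_t in enumerate(rnn_forward(sequence)):
    print(f"step {t}: output distribution sums to {o_t.sum():.3f}")
```

Training such a network would backpropagate through this unrolled loop, which is exactly the BPTT procedure mentioned earlier.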

Recurrent Neural Network Architectures

  1. Fully Recurrent Neural Network (FRNN). This is a two-layer (input and output layers) RNN in which every unit in the input layer is connected to every unit in the output layer in a directed manner. Learning is accomplished by mapping input sequences and activations to output sequences.
  2. Simple Recurrent Network (SRN): Elman and Jordan Networks. The Elman network is a three-layer network augmented with a set of context units; the hidden layer is connected to these context units with a fixed weight. The Jordan network is similar to the Elman network, except that the context units (the state layer) are fed from the output layer instead of the hidden layer. Together, Elman and Jordan networks form the group known as simple recurrent networks (SRNs).
  3. Hopfield Network. The Hopfield network is a variant of RNN in which all connections are symmetric. The activation values of the neurons are updated asynchronously and independently of the other neurons.
  4. Long Short-Term Memory (LSTM). This is a recurrent deep neural network model with gating capability. The LSTM is made up of a memory cell, an input gate, an output gate, and a forget gate. The memory cell is responsible for remembering the previous state, while the gates control how much of that memory is exposed. LSTM is a class of RNN that avoids the vanishing and exploding gradient problems. This model will be covered in an upcoming post.
  5. Gated Recurrent Unit (GRU). The GRU is a recurrent deep neural network model with gating capability similar to the LSTM, but with a simpler structure: an update gate and a reset gate, and no separate memory cell. A brief Keras sketch comparing these gated layers follows this list.
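For reference, here is a minimal Keras sketch (assuming TensorFlow 2.x) in which the vanilla RNN, LSTM, and GRU layers are swapped into an otherwise identical model; the sequence length, feature count, and unit counts are arbitrary illustrative choices.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Illustrative shapes (assumptions): sequences of 20 steps with 8 features each.
def build_model(recurrent_layer):
    return tf.keras.Sequential([
        tf.keras.Input(shape=(20, 8)),
        recurrent_layer,                     # the only part that changes
        layers.Dense(1, activation="sigmoid"),
    ])

vanilla = build_model(layers.SimpleRNN(32))  # plain recurrence, as in the equations above
lstm = build_model(layers.LSTM(32))          # memory cell plus input/output/forget gates
gru = build_model(layers.GRU(32))            # update and reset gates; fewer parameters
print(vanilla.count_params(), lstm.count_params(), gru.count_params())
```

The parameter counts printed at the end make the trade-off visible: for the same number of units, the GRU has fewer parameters than the LSTM.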

There are many more RNN architecture types; here is a list of other notable variants:
Recursive Neural Network
Echo State Network
Neural History Compressor
Bi-Directional Recurrent Neural Network
Continuous-Time Recurrent Neural Network (CTRNN)
Hierarchical Recurrent Neural Network
Recurrent Multilayer Perceptron Network
Multiple Timescales Model
Neural Turing Machines (NTM)
Differentiable Neural Computer (DNC)
Neural Network Pushdown Automata (NNPDA)

Applications of Recurrent Neural Networks

RNNs have been widely used in natural language processing, among other complex tasks. Below is a list of tasks to which RNNs have been applied:

  1. Machine translation
  2. Speech recognition
  3. Language modeling
  4. Text generation
  5. Image tagging
  6. Text data analysis
  7. Time series prediction
  8. Robotic control

And many other applications.
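As one concrete illustration, the sketch below frames time series prediction (item 7 above) as next-value regression with a SimpleRNN in Keras; the synthetic sine-wave data and the window length of 10 are illustrative assumptions.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Synthetic data (assumption): predict the next point of a sine wave.
series = np.sin(np.linspace(0, 20 * np.pi, 2000)).astype("float32")
window = 10  # each input is a window of 10 consecutive values
X = np.stack([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]
X = X[..., None]  # shape (samples, timesteps, features=1)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(window, 1)),
    layers.SimpleRNN(16),
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=2, verbose=0)  # brief training, just to show the flow
print("prediction:", model.predict(X[:1], verbose=0)[0, 0])
```

The same sliding-window framing carries over to other sequence regression tasks; only the data preparation changes.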

Conclusion

The Recurrent Neural Network is advancing the state of the art in machine learning for sequence-to-sequence data processing. It has been widely applied to the most complex tasks in NLP and has outperformed many traditional algorithms. The fundamental principle of the RNN is to predict what comes next based on previous states. In this post we have listed various architectures of RNN.

What’s Next

In this post we have given a brief introduction to the Recurrent Neural Network model. In the next post we will focus on one of the popular variants of RNN called Long Short-Term Memory (LSTM) and how it is implemented using Keras.
