Deep learning algorithms can work with almost any type of data, but they require massive amounts of computing power and training data to solve complex problems. Let us now take a deep dive into one of the most well-known deep learning algorithms: the Recurrent Neural Network (RNN).
If you are looking for the Convolutional Neural Network (CNN) algorithm instead, read our article about it here:
What is Deep Learning?
Deep learning is a subfield of machine learning that solves complex problems using artificial neural networks. These neural networks are made up of interconnected nodes arranged in multiple layers that extract features from input data. Large datasets are used to train these models, allowing them to identify patterns and correlations that humans would find difficult or impossible to detect.
Deep learning has had a significant impact on artificial intelligence. It has facilitated the development of intelligent systems capable of learning, adapting, and making decisions on their own. Deep learning has enabled remarkable progress in a variety of fields, including image and speech recognition, natural language processing, machine translation, autonomous driving, and many others.
7 reasons why Python is the perfect choice for Deep Learning
Python has grown in popularity as a programming language due to its versatility and ease of use across a wide range of computer science domains, particularly deep learning. Thanks to its extensive range of libraries and frameworks tailored for deep learning, Python has emerged as a top choice among Machine Learning, Deep Learning, AI, and Data Science professionals.
Here are seven reasons why Python is an excellent deep learning language:
1. Easy to learn
Python is a simple and easy-to-learn language, making it an excellent choice for beginners who want to learn deep learning.
2. Abundant libraries and frameworks
Python has a vast number of libraries and frameworks for deep learning, including TensorFlow, PyTorch, and Keras, which provide a lot of functionality and make it easy to build deep learning models.
3. Great community support
Python has a large and active community that provides excellent support, documentation, and resources for deep learning developers.
4. Versatile language
Python is a versatile language that can be used for a variety of tasks, including data science, automation, and web development.
5. Rapid prototyping
Python’s simplicity and ease of use make it possible to prototype deep learning models quickly.
6. Good visualization libraries
Python has excellent visualization libraries such as Matplotlib, which makes it easier to visualize data and results.
7. Cross-platform support
Python is a cross-platform language, which means that it can be used on multiple operating systems, including Windows, Mac, and Linux.
What is a Recurrent Neural Network (RNN)?
Quoting Reference [10]: “Recurrent neural networks have been an important focus of research and development during the 1990s. They are designed to learn sequential or time-varying patterns. A recurrent net is a neural network with feedback (closed loop) connections [5]. Examples include BAM, Hopfield, Boltzmann machine, and recurrent back propagation nets [8].
Recurrent neural network techniques have been applied to a wide variety of problems. Simple partially recurrent neural networks were introduced in the late 1980s by several researchers including Rumelhart, Hinton, and Williams [12] to learn strings of characters. Many other applications have addressed problems involving dynamical systems with time sequences of events.”
Below is what an unfolded RNN looks like:
The image below compares the schematics of two advanced RNN architectures, the LSTM and the GRU:
Illustration of (a) LSTM and (b) gated recurrent units. (a) i, f, and o are the input, forget, and output gates, respectively; c and c̃ denote the memory cell and the new memory cell content. (b) r and z are the reset and update gates; h and h̃ are the activation and the candidate activation. Image source: Reference [4].
Are there any advanced RNN architectures?
Long Short-Term Memory (LSTM): What is it, and how does it work?
Long Short-Term Memory (LSTM) is a type of recurrent neural network (RNN) designed to address the vanishing gradient problem faced by traditional RNNs. LSTMs were first proposed by Hochreiter and Schmidhuber in 1997 and have since been used in a wide range of applications, including speech recognition, machine translation, and image captioning [9].
LSTM networks have a unique architecture that allows them to store and access information over long periods. They are built with memory cells that can retain information over time, and gates that control the flow of information into and out of the memory cells. The gates are made up of sigmoid activation functions that determine how much information is passed on to the next time step. The input gate controls how much new information is added to the memory cell, while the output gate determines how much information is output from the memory cell to the next time step. The forget gate is responsible for deciding which information to discard from the memory cell. LSTM networks are designed to learn which information to remember and which to forget, making them well-suited for tasks that require the retention of long-term dependencies.
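To make the gating concrete, here is a minimal NumPy sketch of a single LSTM time step. This is not the article's code, and the weight containers W, U, and b are hypothetical names for the learned parameters:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # Each gate combines the current input x_t with the previous hidden state h_prev.
    i = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])  # input gate: how much new content to add
    f = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])  # forget gate: how much old memory to keep
    o = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])  # output gate: how much memory to expose
    c_new = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])  # new memory cell content (c̃)
    c_t = f * c_prev + i * c_new  # memory cell: discard some old content, add some new
    h_t = o * np.tanh(c_t)        # hidden state passed to the next time step
    return h_t, c_t

In a trained network, frameworks such as Keras handle these computations internally; the sketch only shows how the three gates cooperate at each time step.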
Gated Recurrent Units (GRU): What are they, and how do they work?
Gated Recurrent Units (GRUs) are a type of recurrent neural network (RNN) architecture proposed by Cho et al. in 2014 [4]. Like LSTMs, GRUs are designed to address the vanishing gradient problem in traditional RNNs. However, GRUs have a simpler architecture than LSTMs, which makes them faster and easier to train.
GRUs have two gates, a reset gate and an update gate, that control the flow of information in the network. The update gate decides how much of the previous hidden state should be retained and how much of the new candidate activation should be blended in. The reset gate determines how much of the previous hidden state should be ignored when computing that candidate activation. Together, the two gates allow the network to selectively remember or forget information over time, enabling it to handle long-term dependencies. GRUs have been used in a variety of applications, including machine translation, speech recognition, and image captioning, and have shown competitive performance compared to LSTMs while requiring fewer computational resources.
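As with the LSTM above, a minimal NumPy sketch of a single GRU time step can make the two gates concrete (again, W, U, and b are hypothetical names for the learned parameters, not the article's code):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def gru_step(x_t, h_prev, W, U, b):
    z = sigmoid(W["z"] @ x_t + U["z"] @ h_prev + b["z"])  # update gate: blend old state vs. candidate
    r = sigmoid(W["r"] @ x_t + U["r"] @ h_prev + b["r"])  # reset gate: how much old state to ignore
    h_cand = np.tanh(W["h"] @ x_t + U["h"] @ (r * h_prev) + b["h"])  # candidate activation (h̃)
    h_t = (1 - z) * h_prev + z * h_cand  # new hidden state: interpolate old state and candidate
    return h_t

Note there is no separate memory cell: the hidden state itself carries the memory, which is why the GRU needs fewer parameters than the LSTM.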
How do RNNs work to forecast stock prices?
The diagram above depicts a simplified representation of a recurrent neural network. If we use a simple sequence of past prices, [46, 55, 49, 45, 60, …], to forecast stock prices, each input from X0 to Xt will contain one past value. For example, X0 holds 46 and X1 holds 55, and these values are used to predict the next number in the sequence [1].
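A toy illustration of that idea (this is not the article's full pipeline, just the windowing concept): slide a fixed-size window over the sequence and pair each window with the value that follows it, so the network learns to predict the next price from the previous few.

# Hypothetical toy example: turn a price sequence into (window, next value) pairs.
prices = [46, 55, 49, 45, 60]
window = 3
pairs = [(prices[i:i + window], prices[i + window]) for i in range(len(prices) - window)]
print(pairs)  # [([46, 55, 49], 45), ([55, 49, 45], 60)]

The split_sequence function in the full code below does exactly this, with a window of 60 time steps.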
How do I build and train a Recurrent Neural Network from scratch?
Let’s get hands-on with some Python code to build and train your own RNN from scratch.
We will train LSTM and GRU models to forecast the stock price using Kaggle’s MasterCard stock dataset, which spans May 25th, 2006 to October 11th, 2021 (to download the dataset, see Reference [11]). This is a simple project-based tutorial in which we analyze the data, preprocess it for training on advanced RNN models, and finally evaluate the results (source: Reference [1]).
Prerequisites for building and training RNNs with Python
The following are some of the prerequisites for performing RNNs with Python in this project:
1. pandas for data manipulation
2. NumPy for numerical computation
3. matplotlib.pyplot for data visualization with Python
4. scikit-learn for scaling and evaluation
5. TensorFlow for modeling
6. Keras for modeling
Keras is one of the de facto standard high-level APIs for deep learning in Python.
7. Set seed for reproducibility
We will also set seeds for reproducibility.
Hands-on and selected outputs
The following is the complete code example for the RNN models:
# Import libraries.
## Data manipulation.
import numpy as np
import pandas as pd

## Data visualization.
import matplotlib.pyplot as plt

## Scaling & evaluation.
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error

## Modeling.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Dropout, GRU, Bidirectional
from tensorflow.keras.optimizers import SGD
from tensorflow.random import set_seed

## Set seeds for reproducibility.
set_seed(455)
np.random.seed(455)

# EDA (exploratory data analysis).
## Load data, parse dates, & drop unnecessary columns.
dataset = pd.read_csv(
    "data/Mastercard_stock_history.csv", index_col="Date", parse_dates=["Date"]
).drop(["Dividends", "Stock Splits"], axis=1)
print(dataset.head())

## Descriptive statistics.
print(dataset.describe())

## Identify missing values.
print(dataset.isna().sum())

## Define the train-test boundary (in years) and plot both sections.
tstart = 2016
tend = 2020

def train_test_plot(dataset, tstart, tend):
    dataset.loc[f"{tstart}":f"{tend}", "High"].plot(figsize=(16, 4), legend=True)
    dataset.loc[f"{tend+1}":, "High"].plot(figsize=(16, 4), legend=True)
    plt.legend([f"Train (Before {tend+1})", f"Test ({tend+1} and beyond)"])
    plt.title("MasterCard stock price")
    plt.show()

train_test_plot(dataset, tstart, tend)

# Data preprocessing.
## Split the dataset into train and test sets.
def train_test_split(dataset, tstart, tend):
    train = dataset.loc[f"{tstart}":f"{tend}", "High"].values
    test = dataset.loc[f"{tend+1}":, "High"].values
    return train, test

training_set, test_set = train_test_split(dataset, tstart, tend)

## Scale the training data to the [0, 1] range with MinMaxScaler.
sc = MinMaxScaler(feature_range=(0, 1))
training_set = training_set.reshape(-1, 1)
training_set_scaled = sc.fit_transform(training_set)

## Turn the series into supervised samples of n_steps inputs and one target
## (you can reduce or increase n_steps to tune model performance).
def split_sequence(sequence, n_steps):
    X, y = list(), list()
    for i in range(len(sequence)):
        end_ix = i + n_steps
        if end_ix > len(sequence) - 1:
            break
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return np.array(X), np.array(y)

n_steps = 60
features = 1

## Split into samples.
X_train, y_train = split_sequence(training_set_scaled, n_steps)

## Reshape X_train to (samples, timesteps, features) for the LSTM model.
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], features)

# LSTM model.
## The LSTM architecture.
model_lstm = Sequential()
model_lstm.add(LSTM(units=125, activation="tanh", input_shape=(n_steps, features)))
model_lstm.add(Dense(units=1))

## Compile the model.
model_lstm.compile(optimizer="RMSprop", loss="mse")
model_lstm.summary()

## Train the model.
model_lstm.fit(X_train, y_train, epochs=50, batch_size=32)

# LSTM results.
## Apply the same preprocessing to the test data (scale, split into samples,
## reshape, predict, then inverse-transform the predictions back to prices).
dataset_total = dataset.loc[:, "High"]
inputs = dataset_total[len(dataset_total) - len(test_set) - n_steps :].values
inputs = inputs.reshape(-1, 1)

### Scaling.
inputs = sc.transform(inputs)

### Split into samples.
X_test, y_test = split_sequence(inputs, n_steps)

### Reshape.
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], features)

### Prediction.
predicted_stock_price = model_lstm.predict(X_test)

### Inverse-transform the values.
predicted_stock_price = sc.inverse_transform(predicted_stock_price)

## Plot real vs. predicted values to visualize the difference.
def plot_predictions(test, predicted):
    plt.plot(test, color="gray", label="Real")
    plt.plot(predicted, color="red", label="Predicted")
    plt.title("MasterCard Stock Price Prediction")
    plt.xlabel("Time")
    plt.ylabel("MasterCard Stock Price")
    plt.legend()
    plt.show()

def return_rmse(test, predicted):
    rmse = np.sqrt(mean_squared_error(test, predicted))
    print("The root mean squared error is {:.2f}.".format(rmse))

plot_predictions(test_set, predicted_stock_price)

## Print out the RMSE.
return_rmse(test_set, predicted_stock_price)

# GRU model.
## The GRU architecture.
model_gru = Sequential()
model_gru.add(GRU(units=125, activation="tanh", input_shape=(n_steps, features)))
model_gru.add(Dense(units=1))

## Compile the model.
model_gru.compile(optimizer="RMSprop", loss="mse")
model_gru.summary()

## Train the model.
model_gru.fit(X_train, y_train, epochs=50, batch_size=32)

# GRU results.
GRU_predicted_stock_price = model_gru.predict(X_test)
GRU_predicted_stock_price = sc.inverse_transform(GRU_predicted_stock_price)
plot_predictions(test_set, GRU_predicted_stock_price)

## Print out the RMSE.
return_rmse(test_set, GRU_predicted_stock_price)
Let’s run the above code using the PyScripter IDE. The following is a selection of the outputs:
1. Descriptive statistics
The .describe() function allows us to thoroughly examine the data. Let’s concentrate on the High column, because we’ll be using it to train the model. We could also choose the Close or Open columns as the model feature, but High makes more sense because it tells us how high the share price rose on a given day.
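For instance, assuming the dataset variable from the code above, a quick way to focus the summary statistics on just that column is:

# Summary statistics for the single column we will train on.
print(dataset["High"].describe())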
2. Train-test split and visualize it
The train_test_plot function plots a simple line graph and takes three arguments: dataset, tstart, and tend. The tstart and tend arguments are time limits expressed in years; we can modify them to examine specific time periods. The line plot is split into two sections, train and test, which lets us see how the test dataset is distributed.
MasterCard’s stock price has been rising since 2016. It experienced a drop in the first quarter of 2020 but recovered to a stable position in the second half of the year. Our test dataset covers 2021, from January up to the dataset’s end in October, with the remaining data used for training.
3. LSTM model summary
The model consists of a single hidden LSTM layer and an output layer. You can experiment with the number of units; more units may produce better results at the cost of longer training. For this experiment, we set the LSTM units to 125, use tanh as the activation, and set the input shape to (n_steps, features).

We don’t have to create LSTM or GRU cells from scratch, because the TensorFlow Keras API is user-friendly: to construct the model, we simply use the LSTM or GRU layer classes.

Finally, we compile the model with the RMSprop optimizer and mean squared error (mse) as the loss function.
4. Train the LSTM model
The model will be trained for 50 epochs with a batch_size of 32. You can adjust these hyperparameters to shorten the training time or improve the results. In our run, training completed successfully and the loss converged to a low value.
5. LSTM results
The plot_predictions function generates a line chart comparing the Real and Predicted values, which lets us see the difference between the actual and predicted prices. The return_rmse function takes the test and predicted arrays as arguments and prints out the root mean squared error (RMSE) metric.
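As a reminder of what that metric computes, here is a minimal stand-alone sketch equivalent to return_rmse above (the values are hypothetical):

import numpy as np

actual = np.array([46.0, 55.0, 49.0])     # hypothetical true prices
predicted = np.array([47.5, 53.0, 50.0])  # hypothetical model outputs
rmse = np.sqrt(np.mean((actual - predicted) ** 2))  # square root of the mean squared error
print(f"The root mean squared error is {rmse:.2f}.")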
The single-layered LSTM model performed well, as shown by the line plot above.
6. LSTM RMSE
The results appear promising, with the model achieving an RMSE of 6.70 on the test dataset.
7. GRU model summary
To properly compare the results, we keep everything else the same and simply replace the LSTM layer with a GRU layer. The model structure consists of a single GRU layer with 125 units and an output layer.
8. Train the GRU model
The model was trained successfully for 50 epochs with a batch_size of 32.
9. GRU results
As we can see, the Real and Predicted values are relatively close; the predicted line chart almost perfectly matches the actual values.
10. GRU RMSE
The GRU model achieved an RMSE of 5.50 on the test dataset, outperforming the LSTM model.
Congratulations! You have now learned how to build and train Recurrent Neural Network (RNN) models from scratch and run them inside the PyScripter IDE with high speed and performance.
Visit our other AI-related articles here:
Click here to get started with PyScripter, a free, feature-rich, and lightweight Python IDE.
Download RAD Studio to create more powerful Python GUI Windows Apps in 5x less time.
Check out Python4Delphi, which makes it simple to create Python GUIs for Windows using Delphi.
Also, look into DelphiVCL, which makes it simple to create Windows GUIs with Python.
References & further readings
[1] Awan, A. A. (2022). Recurrent Neural Network Tutorial (RNN). DataCamp Blog. datacamp.com/tutorial/tutorial-for-recurrent-neural-network
[2] Biswal, A. (2023). Top 10 Deep Learning Algorithms You Should Know in 2023. Simplilearn. simplilearn.com/tutorials/deep-learning-tutorial/deep-learning-algorithm
[3] ChatGPT, personal communication, May 14, 2023.
[4] Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.
[5] Fausett, L. (1994). Fundamentals of Neural Networks. Prentice Hall, Englewood Cliffs, NJ.
[6] fdeloche. (2017). A diagram for a one-unit Gated Recurrent Unit (GRU). Wikimedia. commons.wikimedia.org/wiki/File:Gated_Recurrent_Unit.svg
[7] fdeloche. (2017). A schema for LSTM neural network architecture. Wikimedia. commons.wikimedia.org/wiki/File:Long_Short-Term_Memory.svg
[8] Hecht-Nielsen, R. (1990). Neurocomputing. Addison-Wesley, Reading, MA.
[9] Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780.
[10] Medsker, L. R., & Jain, L. C. (2001). Recurrent Neural Networks: Design and Applications, 5, 64-67.
[11] Rahman, K. (2023). MasterCard Stock Data – Latest and Updated: Downloaded using a Python Script and Yahoo! Finance API. Kaggle. kaggle.com/datasets/kalilurrahman/mastercard-stock-data-latest-and-updated
[12] Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning internal representations by error propagation. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Rumelhart, D. E., & McClelland, J. L., Eds., MIT Press, Cambridge, MA, 45.