Deep learning algorithms can work with almost any type of data and require massive amounts of computing power and data to solve complex problems. Let us now take a deep dive into one of the most well-known deep learning algorithms: the **Recurrent Neural Network (RNN)**.

**What is Deep Learning?**

Deep learning is a subfield of machine learning that solves complex problems using artificial neural networks. These neural networks are made up of interconnected nodes arranged in multiple layers that extract features from input data. Large datasets are used to train these models, allowing them to detect patterns and correlations that humans would find difficult or impossible to detect.

Deep learning has had a significant impact on artificial intelligence. It has facilitated the development of intelligent systems capable of learning, adapting, and making decisions on their own. Deep learning has enabled remarkable progress in a variety of fields, including image and speech recognition, natural language processing, machine translation, autonomous driving, and many others.

**7 reasons why Python is the perfect choice for Deep Learning**

Python has grown in popularity as a programming language due to its versatility and ease of use in a wide range of computer science domains, particularly deep learning. Python has emerged as a top choice among many Machine Learning, Deep Learning, AI, and Data Science professionals due to its extensive range of libraries and frameworks specifically tailored for deep learning.

Here are seven reasons why Python is an excellent deep learning language:

**1. Easy to learn**

Python is a simple and easy-to-learn language, making it an excellent choice for beginners who want to learn deep learning.

**2. Abundant libraries and frameworks**

Python has a vast number of libraries and frameworks for deep learning, including TensorFlow, PyTorch, and Keras, which provide a lot of functionality and make it easy to build deep learning models.

**3. Great community support**

Python has a large and active community that provides excellent support, documentation, and resources for deep learning developers.

**4. Versatile language**

Python is a versatile language that can be used for a variety of tasks, including data science, automation, and web development.

**5. Rapid prototyping**

Python’s ease of use and simplicity make it easy to prototype deep learning models quickly.

**6. Good visualization libraries**

Python has excellent visualization libraries such as Matplotlib, which makes it easier to visualize data and results.

**7. Cross-platform support**

Python is a cross-platform language, which means that it can be used on multiple operating systems, including Windows, Mac, and Linux.

**What is a Recurrent Neural Network (RNN)?**

Quoting Reference [10]: “Recurrent neural networks have been an important focus of research and development during the 1990s. They are designed to learn sequential or time-varying patterns. A recurrent net is a neural network with feedback (closed loop) connections [5]. Examples include BAM, Hopfield, Boltzmann machine, and recurrent back propagation nets [8].

Recurrent neural network techniques have been applied to a wide variety of problems. Simple partially recurrent neural networks were introduced in the late 1980s by several researchers including Rumelhart, Hinton, and Williams [12] to learn strings of characters. Many other applications have addressed problems involving dynamical systems with time sequences of events.”

Below is what an unfolded RNN looks like:

The image below compares the schemes of two advanced RNN architectures (LSTM and GRU):

Illustration of (a) LSTM and (b) gated recurrent units. (a) i, f and o are the input, forget and output gates, respectively. c and c̃ denote the memory cell and the new memory cell content. (b) r and z are the reset and update gates, and h and h̃ are the activation and the candidate activation. Image source: Reference [4].

**Are there any advanced RNN architectures?**

**Long Short-Term Memory (LSTM): What is it, and how does it work?**

Long Short-Term Memory (LSTM) is a type of recurrent neural network (RNN) designed to mitigate the vanishing gradient problem faced by traditional RNNs. LSTMs were first proposed by Hochreiter and Schmidhuber in 1997 [9] and have since been used in a wide range of applications, including speech recognition, machine translation, and image captioning.

LSTM networks have a unique architecture that allows them to store and access information over long periods. They are built with memory cells that can retain information over time, and gates that control the flow of information into and out of the memory cells. The gates are made up of sigmoid activation functions that determine how much information is passed on to the next time step. The input gate controls how much new information is added to the memory cell, while the output gate determines how much information is output from the memory cell to the next time step. The forget gate is responsible for deciding which information to discard from the memory cell. LSTM networks are designed to learn which information to remember and which to forget, making them well-suited for tasks that require the retention of long-term dependencies.
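The gate mechanics described above can be sketched in a few lines of NumPy. This is a minimal illustration of a single LSTM time step, not the code Keras runs internally: the function name `lstm_step`, the stacked weight layout (input gate, forget gate, output gate, candidate content), and the random weights are all assumptions made for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W, U and b hold the stacked weights for the input gate, forget gate,
    output gate and candidate cell content (an ordering chosen for this sketch).
    """
    z = W @ x_t + U @ h_prev + b                  # shape: (4 * units,)
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # gates take values in (0, 1)
    g = np.tanh(g)                                # new candidate cell content
    c_t = f * c_prev + i * g                      # forget part of the old cell, add new content
    h_t = o * np.tanh(c_t)                        # expose part of the cell as the hidden state
    return h_t, c_t

units, features = 3, 1
rng = np.random.default_rng(42)
W = rng.normal(size=(4 * units, features))        # untrained, random weights
U = rng.normal(size=(4 * units, units))
b = np.zeros(4 * units)

h, c = np.zeros(units), np.zeros(units)
for x in [0.1, 0.5, 0.9]:                         # a short, already scaled input sequence
    h, c = lstm_step(np.array([x]), h, c, W, U, b)

print("hidden state:", h)
print("cell state:  ", c)
```

Because the weights are random, the printed values carry no meaning. The point is the data flow: the forget gate scales the old cell state, the input gate scales the new content, and the output gate decides how much of the cell is passed on.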

**Gated-Recurrent units (GRU): What are they, and how do they work?**

Gated Recurrent Units (GRUs) are a type of recurrent neural network (RNN) architecture that was proposed by Cho et al. in 2014 [4]. Like LSTMs, GRUs are designed to address the vanishing gradient problem in traditional RNNs. However, GRUs have a simpler architecture than LSTMs, with fewer gates and no separate memory cell, which often makes them faster to train.

GRUs have two gates, a reset gate and an update gate, that control the flow of information in the network. The update gate decides how much of the previous hidden state should be retained, and how much new information should be added to the current hidden state. The reset gate determines how much of the previous hidden state is used when computing that new information (the candidate activation), so a reset gate close to zero lets the unit ignore its past state. The reset and update gates work together to allow the network to selectively remember or forget information over time, enabling it to handle long-term dependencies. GRUs have been used in a variety of applications, including machine translation, speech recognition, and image captioning, and have shown competitive performance compared to LSTMs while requiring fewer computational resources.
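As a rough illustration of how the two gates interact, here is a single GRU time step in NumPy. It follows the formulation given in Reference [4], where the new state is `(1 - z) * h_prev + z * h_cand`; some libraries swap the roles of `z` and `1 - z`. The function name `gru_step`, the omission of bias terms, and the random weights are assumptions made to keep the sketch short.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, params):
    """One GRU time step (biases omitted for brevity)."""
    W_z, U_z, W_r, U_r, W_h, U_h = params
    z = sigmoid(W_z @ x_t + U_z @ h_prev)              # update gate
    r = sigmoid(W_r @ x_t + U_r @ h_prev)              # reset gate
    h_cand = np.tanh(W_h @ x_t + U_h @ (r * h_prev))   # candidate activation
    return (1.0 - z) * h_prev + z * h_cand             # blend old state and candidate

units, features = 3, 1
rng = np.random.default_rng(7)
params = tuple(
    rng.normal(size=shape)                             # untrained, random weights
    for shape in [(units, features), (units, units)] * 3
)

h = np.zeros(units)
for x in [0.1, 0.5, 0.9]:                              # a short, already scaled input sequence
    h = gru_step(np.array([x]), h, params)

print("hidden state:", h)
```

Compared with the LSTM, there is no separate cell state and one gate fewer, which is where the smaller parameter count comes from.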

**How do RNNs work to forecast stock prices?**

The diagram above depicts a simplified representation of recurrent neural networks. If we use simple data to forecast stock prices [46,55,49,45,60,…], each input from **X0** to **Xt** will contain a past value. For example, **X0** has 46 and **X1** has 55, and these values are used to predict the next number in a sequence [1].
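To make the idea of feeding **X0** to **Xt** through the network concrete, the following sketch unrolls a plain (vanilla) RNN over the five example prices. It is only an illustration: the function name `rnn_forward`, the hidden size of 4, and the random, untrained weights are assumptions, so the printed number is not a meaningful forecast.

```python
import numpy as np

def rnn_forward(inputs, W_x, W_h, b, W_out, b_out):
    """Unroll a one-layer vanilla RNN over a sequence and return one prediction."""
    h = np.zeros(W_h.shape[0])                    # initial hidden state
    for x_t in inputs:                            # X0, X1, ..., Xt
        # The same weights are reused at every time step.
        h = np.tanh(W_x @ np.atleast_1d(x_t) + W_h @ h + b)
    return float(W_out @ h + b_out)               # prediction in scaled units

rng = np.random.default_rng(0)
hidden = 4
W_x = rng.normal(scale=0.5, size=(hidden, 1))
W_h = rng.normal(scale=0.5, size=(hidden, hidden))
b = np.zeros(hidden)
W_out = rng.normal(scale=0.5, size=hidden)
b_out = 0.0

prices = np.array([46, 55, 49, 45, 60], dtype=float)
# Scale to [0, 1] so that tanh does not saturate on raw prices.
scaled = (prices - prices.min()) / (prices.max() - prices.min())

prediction = rnn_forward(scaled, W_x, W_h, b, W_out, b_out)
print("untrained prediction (scaled units):", prediction)
```

Training would adjust the weights so that this output moves toward the next value in the sequence. The hidden state `h` is what carries information from earlier prices forward.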

**How do I build and train a Recurrent Neural Network from scratch?**

Let’s get hands-on with some Python code to build and train your own RNN from scratch.

We will train the LSTM and GRU models to forecast the stock price using Kaggle’s MasterCard stock dataset from May 25th, 2006 to October 11th, 2021 (to download the dataset, see Reference [11]). This is a simple project-based tutorial where we will analyze data, preprocess the data to train it on advanced RNN models, and finally evaluate the results (source: Reference [1]).

**Prerequisites for building and training RNNs with Python**

The following libraries are the prerequisites for building the RNNs in this project:

**1. pandas for data manipulation**

pandas loads the CSV file, parses the `Date` column into a datetime index, and lets us slice the data by year.

**2. NumPy for numerical arrays**

NumPy holds the price windows as arrays and reshapes them into the form the models expect.

**3. matplotlib.pyplot for data visualization**

Matplotlib draws the train-test plot and the real-versus-predicted line charts.

**4. scikit-learn for scaling and evaluation**

scikit-learn provides `MinMaxScaler` for scaling the prices and `mean_squared_error` for evaluating the predictions.

**5. TensorFlow for modeling**

TensorFlow is a very popular library for AI with Python, and it is the backend that trains our models.

**6. Keras for modeling**

Keras is one of the de facto standards for Python and AI. We use its `Sequential` model with the `LSTM`, `GRU`, and `Dense` layers.

**7. Set seed for reproducibility**

We will also set the TensorFlow and NumPy seeds for reproducibility.

**Hands-on and selected outputs**

The following is the code example for RNN:

```python
# Import libraries.
## Data manipulation.
import numpy as np
import pandas as pd

## Data visualization.
import matplotlib.pyplot as plt

## Scaling & evaluation.
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error

## Modeling.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, GRU
from tensorflow.random import set_seed

## Set seed.
set_seed(455)
np.random.seed(455)

# EDA (exploratory data analysis).
## Load data, handle datetime, & drop unnecessary columns.
dataset = pd.read_csv(
    "data/Mastercard_stock_history.csv", index_col="Date", parse_dates=["Date"]
).drop(["Dividends", "Stock Splits"], axis=1)
print(dataset.head())

## Descriptive statistics.
print(dataset.describe())

## Identify the missing values.
print(dataset.isna().sum())

## Plot the planned train-test split.
tstart = 2016
tend = 2020

def train_test_plot(dataset, tstart, tend):
    dataset.loc[f"{tstart}":f"{tend}", "High"].plot(figsize=(16, 4), legend=True)
    dataset.loc[f"{tend+1}":, "High"].plot(figsize=(16, 4), legend=True)
    plt.legend([f"Train (Before {tend+1})", f"Test ({tend+1} and beyond)"])
    plt.title("MasterCard stock price")
    plt.show()

train_test_plot(dataset, tstart, tend)

# Data preprocessing.
## Actually split the dataset into a train-test set this time.
def train_test_split(dataset, tstart, tend):
    train = dataset.loc[f"{tstart}":f"{tend}", "High"].values
    test = dataset.loc[f"{tend+1}":, "High"].values
    return train, test

training_set, test_set = train_test_split(dataset, tstart, tend)

## Scale the data to the [0, 1] range using MinMaxScaler (fit on the training set only).
sc = MinMaxScaler(feature_range=(0, 1))
training_set = training_set.reshape(-1, 1)
training_set_scaled = sc.fit_transform(training_set)

## Set up training steps (you can reduce or increase the number of steps to optimize model performance).
def split_sequence(sequence, n_steps):
    X, y = list(), list()
    for i in range(len(sequence)):
        end_ix = i + n_steps
        if end_ix > len(sequence) - 1:
            break
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return np.array(X), np.array(y)

n_steps = 60
features = 1

## Split into samples.
X_train, y_train = split_sequence(training_set_scaled, n_steps)

## Reshape X_train to (samples, time steps, features) to fit the LSTM model.
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], features)

# LSTM model.
## The LSTM architecture.
model_lstm = Sequential()
model_lstm.add(LSTM(units=125, activation="tanh", input_shape=(n_steps, features)))
model_lstm.add(Dense(units=1))

## Compile the model.
model_lstm.compile(optimizer="RMSprop", loss="mse")
model_lstm.summary()

## Train the model.
model_lstm.fit(X_train, y_train, epochs=50, batch_size=32)

# LSTM results.
## Apply the model to the test dataset (repeat preprocessing: scale, split into samples,
## reshape, predict, and inverse transform the predictions back to prices).
dataset_total = dataset.loc[:, "High"]
inputs = dataset_total[len(dataset_total) - len(test_set) - n_steps :].values
inputs = inputs.reshape(-1, 1)

### Scaling.
inputs = sc.transform(inputs)

### Split into samples.
X_test, y_test = split_sequence(inputs, n_steps)

### Reshape.
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], features)

### Prediction.
predicted_stock_price = model_lstm.predict(X_test)

### Inverse transform the values.
predicted_stock_price = sc.inverse_transform(predicted_stock_price)

## Plot real vs predicted line chart (visualize the difference between actual & predicted values).
def plot_predictions(test, predicted):
    plt.plot(test, color="gray", label="Real")
    plt.plot(predicted, color="red", label="Predicted")
    plt.title("MasterCard Stock Price Prediction")
    plt.xlabel("Time")
    plt.ylabel("MasterCard Stock Price")
    plt.legend()
    plt.show()

def return_rmse(test, predicted):
    rmse = np.sqrt(mean_squared_error(test, predicted))
    print("The root mean squared error is {:.2f}.".format(rmse))

plot_predictions(test_set, predicted_stock_price)

## Print out RMSE.
return_rmse(test_set, predicted_stock_price)

# GRU model.
## The GRU architecture.
model_gru = Sequential()
model_gru.add(GRU(units=125, activation="tanh", input_shape=(n_steps, features)))
model_gru.add(Dense(units=1))

## Compile the model.
model_gru.compile(optimizer="RMSprop", loss="mse")
model_gru.summary()

## Train the model.
model_gru.fit(X_train, y_train, epochs=50, batch_size=32)

# GRU results.
GRU_predicted_stock_price = model_gru.predict(X_test)
GRU_predicted_stock_price = sc.inverse_transform(GRU_predicted_stock_price)
plot_predictions(test_set, GRU_predicted_stock_price)

## Print out RMSE.
return_rmse(test_set, GRU_predicted_stock_price)
```
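To see what the windowing step produces, the sketch below reruns the tutorial's `split_sequence` logic on the five toy prices used earlier. The window of `n_steps = 3` is a hypothetical value chosen so the result fits on screen; the project itself uses 60.

```python
import numpy as np

def split_sequence(sequence, n_steps):
    """Turn a 1-D series into (window, next value) training pairs."""
    X, y = [], []
    for i in range(len(sequence)):
        end_ix = i + n_steps
        if end_ix > len(sequence) - 1:   # no "next value" left to predict
            break
        X.append(sequence[i:end_ix])     # n_steps past values
        y.append(sequence[end_ix])       # the value that follows them
    return np.array(X), np.array(y)

prices = np.array([46, 55, 49, 45, 60])
X, y = split_sequence(prices, n_steps=3)

print(X.tolist())   # [[46, 55, 49], [55, 49, 45]]
print(y.tolist())   # [45, 60]

# Recurrent layers expect input shaped (samples, time steps, features).
X = X.reshape(X.shape[0], X.shape[1], 1)
print(X.shape)      # (2, 3, 1)
```

Each row of `X` is one training sample, and the matching entry of `y` is the price the model should learn to predict from it.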

Let’s run the above code using PyScripter IDE. The following is a selection of the outputs:

**1. Descriptive statistics**

The `.describe()` function allows us to thoroughly examine the data. Let’s concentrate on the `High` column because we’ll be using it to train the model. We can also choose `Close` or `Open` columns for a model feature, but `High` makes more sense because it tells us how high the share prices were on the given day.

**2. Train-test split and visualize it**

The `train_test_plot` function plots a simple line graph and takes three arguments: `dataset`, `tstart`, and `tend`. `tstart` and `tend` are time limits expressed in years. We can modify these arguments to examine specific time periods. The line plot is split into two sections: `train` and `test`. This will allow us to decide how the test dataset will be distributed.

MasterCard’s stock price has been rising since 2016. It experienced a drop in the first quarter of 2020, but recovered to a stable position in the second half of the year. Our test dataset covers 2021, from January up to the last record on October 11th, 2021, while the data from 2016 through 2020 is used for training.
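The year-based split relies on pandas label slicing over a datetime index, where a bare year such as `"2020"` covers the whole year. Here is a minimal sketch on synthetic data; the daily date range and the dummy `High` values are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic daily data standing in for the stock history.
dates = pd.date_range("2018-01-01", "2021-12-31", freq="D")
df = pd.DataFrame({"High": np.arange(len(dates), dtype=float)}, index=dates)

tstart, tend = 2019, 2020

# Slicing by year strings is inclusive on both ends.
train = df.loc[f"{tstart}":f"{tend}", "High"]
test = df.loc[f"{tend + 1}":, "High"]

print(train.index.min().date(), "to", train.index.max().date())  # 2019-01-01 to 2020-12-31
print(test.index.min().date(), "to", test.index.max().date())    # 2021-01-01 to 2021-12-31
print(len(train), len(test))                                     # 731 365
```

Rows before `tstart` (here, all of 2018) end up in neither set, which mirrors how the tutorial leaves the pre-2016 data unused.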

**3. LSTM model summary**

The model consists of a single hidden LSTM layer and an output layer. You can play around with the number of `units`: more units can improve the results, but they also increase training time and the risk of overfitting. For this experiment, we will set the LSTM units to `125`, use `tanh` as the `activation`, and set the input size.

We don’t have to create LSTM or GRU models from scratch because TensorFlow’s Keras API provides them as ready-made layers. To construct the model, we will simply use the `LSTM` or `GRU` layers.

Finally, we will compile the model with the `RMSprop` optimizer and mean squared error (`mse`) as the `loss` function.

**4. Train the LSTM model**

The model will be trained for `50` epochs with a `batch_size` of `32`. You can adjust the hyperparameters to shorten the training time or improve the results. The model training was completed successfully with a low `loss`.

**5. LSTM results**

The `plot_predictions` function will generate a line chart comparing `Real` and `Predicted` values. This will enable us to see the difference between the actual and predicted values. The `return_rmse` function takes in test and predicted arguments and prints out the root mean square error (`rmse`) metric.
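For reference, the RMSE can also be computed by hand in NumPy, which shows exactly what the metric measures. The helper name `rmse` and the three sample prices below are made up for this sketch. The `ravel()` calls matter because the test prices are a 1-D array while model predictions come back as a column of shape `(n, 1)`, and subtracting those directly in NumPy would broadcast to an `(n, n)` matrix.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equally long series."""
    actual = np.asarray(actual, dtype=float).ravel()
    predicted = np.asarray(predicted, dtype=float).ravel()
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

actual = [100.0, 102.0, 104.0]            # 1-D, like test_set
predicted = [[101.0], [100.0], [107.0]]   # column vector, like model.predict output

score = rmse(actual, predicted)
print(f"The root mean squared error is {score:.2f}.")  # The root mean squared error is 2.16.
```

The errors here are -1, 2, and -3, so the mean squared error is 14/3 and its square root is about 2.16. The result is in the same units as the stock price.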

The single-layered LSTM model performed well, as shown by the line plot above.

**6. LSTM RMSE**

The results appear promising, with the model achieving an RMSE of `6.70` on the test dataset.

**7. GRU model summary**

To properly compare the results, we’ll keep everything the same and simply replace the `LSTM` layer with the `GRU` layer. The model structure consists of a single GRU layer with `125` units and an output layer.
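The size difference between the two models can be estimated by hand. The sketch below counts trainable parameters for a 125-unit layer with one input feature. The helper names are hypothetical, and the GRU formula assumes the Keras default `reset_after=True` in TensorFlow 2.x, which keeps two bias vectors per gate. Compare the totals against your own `model.summary()` output rather than taking them as given.

```python
def lstm_params(units, features):
    # 4 weight blocks (3 gates + candidate), each with input weights,
    # recurrent weights and one bias vector.
    return 4 * (units * (units + features) + units)

def gru_params(units, features, reset_after=True):
    # 3 weight blocks (2 gates + candidate); with reset_after=True there are
    # separate input and recurrent biases, so 2 bias vectors per block.
    biases = 2 * units if reset_after else units
    return 3 * (units * (units + features) + biases)

def dense_params(inputs, outputs):
    return inputs * outputs + outputs

units, features = 125, 1
lstm_total = lstm_params(units, features) + dense_params(units, 1)
gru_total = gru_params(units, features) + dense_params(units, 1)

print("LSTM model parameters:", lstm_total)  # LSTM model parameters: 63626
print("GRU model parameters: ", gru_total)   # GRU model parameters:  48126
```

Under these assumptions the GRU model has roughly a quarter fewer parameters than the LSTM model, which is consistent with GRUs being lighter to train.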

**8. Train the GRU model**

The model was trained successfully with `50` epochs and a `batch_size` of `32`.

**9. GRU results**

As we can see, the `Real` and `Predicted` values are relatively close. The predicted line chart almost perfectly matches the actual values.

**10. GRU RMSE**

The `GRU` model achieved an RMSE of `5.50` on the test dataset, outperforming the `LSTM` model.

Congratulations, now you have learned how to build and train a **Recurrent Neural Network (RNN)** from scratch, and successfully run it inside **PyScripter IDE** with high speed & performance.

**Click here to get started with PyScripter, a free, feature-rich, and lightweight Python IDE.**

**Download RAD Studio to create more powerful Python GUI Windows Apps in 5x less time.**

**Check out Python4Delphi, which makes it simple to create Python GUIs for Windows using Delphi.**

**Also, look into DelphiVCL, which makes it simple to create Windows GUIs with Python.**

**References & further readings**

[1] Awan, A. A. (2022). *Recurrent Neural Network Tutorial (RNN).* DataCamp Blog. datacamp.com/tutorial/tutorial-for-recurrent-neural-network

[2] Biswal, A. (2023). *Top 10 Deep Learning Algorithms You Should Know in 2023.* Simplilearn. simplilearn.com/tutorials/deep-learning-tutorial/deep-learning-algorithm

[3] ChatGPT, personal communication, May 14, 2023.

[4] Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2014). *Empirical evaluation of gated recurrent neural networks on sequence modeling.* arXiv preprint arXiv:1412.3555.

[5] Fausett, L. (1994). *Fundamentals of neural networks.* Prentice Hall, Englewood Cliffs, NJ.

[6] fdeloche. (2017). *A diagram for a one-unit Gated Recurrent Unit (GRU).* Wikimedia. commons.wikimedia.org/wiki/File:Gated_Recurrent_Unit.svg

[7] fdeloche. (2017). *A schema for LSTM neural network architecture.* Wikimedia. commons.wikimedia.org/wiki/File:Long_Short-Term_Memory.svg

[8] Hecht-Nielsen, R. (1990). *Neurocomputing.* Addison-Wesley, Reading, MA.

[9] Hochreiter, S., & Schmidhuber, J. (1997). *Long short-term memory.* Neural computation, 9(8), 1735-1780.

[10] Medsker, L. R., & Jain, L. C. (2001). *Recurrent neural networks.* Design and Applications, 5, 64-67.

[11] Rahman, K. (2023). *MasterCard Stock Data – Latest and Updated: MasterCard Stock Data – Downloaded using a Python Script and Yahoo! Finance API.* Kaggle. kaggle.com/datasets/kalilurrahman/mastercard-stock-data-latest-and-updated

[12] Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). *Learning internal representations by error propagation, in Parallel Distributed Processing: Explorations in the Microstructure of Cognition.* Rumelhart, D. E. and McClelland, J. L., Eds., MIT Press, Cambridge, 45.