Deep learning algorithms can work with almost any type of data, but they require massive amounts of computing power and training data to solve complex problems. Let us now take a deep dive into one of the most well-known deep learning algorithms: the Recurrent Neural Network (RNN).
If you are looking for the Convolutional Neural Network (CNN) algorithm instead, read our article about it here:
What is Deep Learning?
Deep learning is a subfield of machine learning that solves complex problems using artificial neural networks. These neural networks are made up of interconnected nodes arranged in multiple layers that extract features from input data. Large datasets are used to train these models, allowing them to identify patterns and correlations that humans would find difficult or impossible to detect.
Deep learning has had a significant impact on artificial intelligence. It has facilitated the development of intelligent systems capable of learning, adapting, and making decisions on their own. Deep learning has enabled remarkable progress in a variety of fields, including image and speech recognition, natural language processing, machine translation, autonomous driving, and many others.
7 reasons why Python is the perfect choice for Deep Learning
Python has grown in popularity as a programming language due to its versatility and ease of use across a wide range of computer science domains, particularly deep learning. Thanks to its extensive range of libraries and frameworks tailored for deep learning, Python has emerged as a top choice among Machine Learning, Deep Learning, AI, and Data Science professionals.
Here are seven reasons why Python is an excellent deep learning language:
1. Easy to learn
Python is a simple and easy-to-learn language, making it an excellent choice for beginners who want to learn deep learning.
2. Abundant libraries and frameworks
Python has a vast number of libraries and frameworks for deep learning, including TensorFlow, PyTorch, and Keras, which provide a lot of functionality and make it easy to build deep learning models.
3. Great community support
Python has a large and active community that provides excellent support, documentation, and resources for deep learning developers.
4. Versatile language
Python is a versatile language that can be used for a variety of tasks, including data science, automation, and web development.
5. Rapid prototyping
Python’s simplicity and ease of use make it possible to prototype deep learning models quickly.
6. Good visualization libraries
Python has excellent visualization libraries such as Matplotlib, which makes it easier to visualize data and results.
7. Cross-platform support
Python is a cross-platform language, which means that it can be used on multiple operating systems, including Windows, Mac, and Linux.
What is a Recurrent Neural Network (RNN)?
Quoting Reference [10]: “Recurrent neural networks have been an important focus of research and development during the 1990s. They are designed to learn sequential or time-varying patterns. A recurrent net is a neural network with feedback (closed loop) connections [5]. Examples include BAM, Hopfield, Boltzmann machine, and recurrent back propagation nets [8].
Recurrent neural network techniques have been applied to a wide variety of problems. Simple partially recurrent neural networks were introduced in the late 1980s by several researchers including Rumelhart, Hinton, and Williams [12] to learn strings of characters. Many other applications have addressed problems involving dynamical systems with time sequences of events.”
Below is what an unfolded RNN looks like:
The image below compares the schematics of two advanced RNN architectures, the LSTM and the GRU:
Illustration of (a) LSTM and (b) gated recurrent units. (a) i, f, and o are the input, forget, and output gates, respectively; c and c̃ denote the memory cell and the new memory cell content. (b) r and z are the reset and update gates; h and h̃ are the activation and the candidate activation. Image source: Reference [4].
Are there any advanced RNN architectures?
Long Short-Term Memory (LSTM): What is it, and how does it work?
Long Short-Term Memory (LSTM) is a type of recurrent neural network (RNN) designed to address the vanishing gradient problem faced by traditional RNNs. LSTMs were first proposed by Hochreiter and Schmidhuber in 1997 and have since been used in a wide range of applications, including speech recognition, machine translation, and image captioning [9].
LSTM networks have a unique architecture that allows them to store and access information over long periods. They are built with memory cells that can retain information over time, and gates that control the flow of information into and out of the memory cells. The gates are made up of sigmoid activation functions that determine how much information is passed on to the next time step. The input gate controls how much new information is added to the memory cell, while the output gate determines how much information is output from the memory cell to the next time step. The forget gate is responsible for deciding which information to discard from the memory cell. LSTM networks are designed to learn which information to remember and which to forget, making them well-suited for tasks that require the retention of long-term dependencies.
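To make the gating concrete, here is a minimal NumPy sketch of a single LSTM time step. This is not the article's code, and the weight containers W, U, and b are hypothetical names for the learned parameters:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # Each gate combines the current input x_t with the previous hidden state h_prev.
    i = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])  # input gate: how much new content to add
    f = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])  # forget gate: how much old memory to keep
    o = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])  # output gate: how much memory to expose
    c_new = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])  # new memory cell content (c̃)
    c_t = f * c_prev + i * c_new  # memory cell: discard some old content, add some new
    h_t = o * np.tanh(c_t)        # hidden state passed to the next time step
    return h_t, c_t

In a trained network, frameworks such as Keras handle these computations internally; the sketch only shows how the three gates cooperate at each time step.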
Gated Recurrent Units (GRU): What are they, and how do they work?
Gated Recurrent Units (GRUs) are a type of recurrent neural network (RNN) architecture proposed by Cho et al. in 2014 [4]. Like LSTMs, GRUs are designed to address the vanishing gradient problem in traditional RNNs. However, GRUs have a simpler architecture than LSTMs, which makes them faster and easier to train.
GRUs have two gates, a reset gate and an update gate, that control the flow of information in the network. The update gate decides how much of the previous hidden state should be retained and how much of the new candidate activation should be blended in. The reset gate determines how much of the previous hidden state should be ignored when computing that candidate activation. Together, the two gates allow the network to selectively remember or forget information over time, enabling it to handle long-term dependencies. GRUs have been used in a variety of applications, including machine translation, speech recognition, and image captioning, and have shown competitive performance compared to LSTMs while requiring fewer computational resources.
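As with the LSTM above, a minimal NumPy sketch of a single GRU time step can make the two gates concrete (again, W, U, and b are hypothetical names for the learned parameters, not the article's code):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def gru_step(x_t, h_prev, W, U, b):
    z = sigmoid(W["z"] @ x_t + U["z"] @ h_prev + b["z"])  # update gate: blend old state vs. candidate
    r = sigmoid(W["r"] @ x_t + U["r"] @ h_prev + b["r"])  # reset gate: how much old state to ignore
    h_cand = np.tanh(W["h"] @ x_t + U["h"] @ (r * h_prev) + b["h"])  # candidate activation (h̃)
    h_t = (1 - z) * h_prev + z * h_cand  # new hidden state: interpolate old state and candidate
    return h_t

Note there is no separate memory cell: the hidden state itself carries the memory, which is why the GRU needs fewer parameters than the LSTM.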
How do RNNs work to forecast stock prices?
The diagram above depicts a simplified representation of a recurrent neural network. If we use a simple sequence of past prices, [46, 55, 49, 45, 60, …], to forecast stock prices, each input from X0 to Xt will contain one past value. For example, X0 holds 46 and X1 holds 55, and these values are used to predict the next number in the sequence [1].
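A toy illustration of that idea (this is not the article's full pipeline, just the windowing concept): slide a fixed-size window over the sequence and pair each window with the value that follows it, so the network learns to predict the next price from the previous few.

# Hypothetical toy example: turn a price sequence into (window, next value) pairs.
prices = [46, 55, 49, 45, 60]
window = 3
pairs = [(prices[i:i + window], prices[i + window]) for i in range(len(prices) - window)]
print(pairs)  # [([46, 55, 49], 45), ([55, 49, 45], 60)]

The split_sequence function in the full code below does exactly this, with a window of 60 time steps.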
How do I build and train a Recurrent Neural Network from scratch?
Let’s get hands-on with some Python code to build and train your own RNN from scratch.
We will train LSTM and GRU models to forecast the stock price using Kaggle’s MasterCard stock dataset, which spans May 25th, 2006 to October 11th, 2021 (to download the dataset, see Reference [11]). This is a simple project-based tutorial in which we analyze the data, preprocess it for training on advanced RNN models, and finally evaluate the results (source: Reference [1]).
Prerequisites for building and training RNNs with Python
The following are some of the prerequisites for performing RNNs with Python in this project:
1. pandas for data manipulation
2. NumPy for numerical computation
3. matplotlib.pyplot for data visualization with Python
4. scikit-learn for scaling and evaluation
5. TensorFlow for modeling
6. Keras for modeling
Keras is one of the de facto standard high-level APIs for deep learning in Python.
7. Set seed for reproducibility
We will also set seeds for reproducibility.
Hands-on and selected outputs
The following is the complete code example for the RNN models:
# Import libraries.
## Data manipulation.
import numpy as np
import pandas as pd

## Data visualization.
import matplotlib.pyplot as plt

## Scaling & evaluation.
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error

## Modeling.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Dropout, GRU, Bidirectional
from tensorflow.keras.optimizers import SGD
from tensorflow.random import set_seed

## Set seeds for reproducibility.
set_seed(455)
np.random.seed(455)

# EDA (exploratory data analysis).
## Load data, parse dates, & drop unnecessary columns.
dataset = pd.read_csv(
    "data/Mastercard_stock_history.csv", index_col="Date", parse_dates=["Date"]
).drop(["Dividends", "Stock Splits"], axis=1)
print(dataset.head())

## Descriptive statistics.
print(dataset.describe())

## Identify missing values.
print(dataset.isna().sum())

## Define the train-test boundary (in years) and plot both sections.
tstart = 2016
tend = 2020

def train_test_plot(dataset, tstart, tend):
    dataset.loc[f"{tstart}":f"{tend}", "High"].plot(figsize=(16, 4), legend=True)
    dataset.loc[f"{tend+1}":, "High"].plot(figsize=(16, 4), legend=True)
    plt.legend([f"Train (Before {tend+1})", f"Test ({tend+1} and beyond)"])
    plt.title("MasterCard stock price")
    plt.show()

train_test_plot(dataset, tstart, tend)

# Data preprocessing.
## Split the dataset into train and test sets.
def train_test_split(dataset, tstart, tend):
    train = dataset.loc[f"{tstart}":f"{tend}", "High"].values
    test = dataset.loc[f"{tend+1}":, "High"].values
    return train, test

training_set, test_set = train_test_split(dataset, tstart, tend)

## Scale the training data to the [0, 1] range with MinMaxScaler.
sc = MinMaxScaler(feature_range=(0, 1))
training_set = training_set.reshape(-1, 1)
training_set_scaled = sc.fit_transform(training_set)

## Turn the series into supervised samples of n_steps inputs and one target
## (you can reduce or increase n_steps to tune model performance).
def split_sequence(sequence, n_steps):
    X, y = list(), list()
    for i in range(len(sequence)):
        end_ix = i + n_steps
        if end_ix > len(sequence) - 1:
            break
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return np.array(X), np.array(y)

n_steps = 60
features = 1

## Split into samples.
X_train, y_train = split_sequence(training_set_scaled, n_steps)

## Reshape X_train to (samples, timesteps, features) for the LSTM model.
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], features)

# LSTM model.
## The LSTM architecture.
model_lstm = Sequential()
model_lstm.add(LSTM(units=125, activation="tanh", input_shape=(n_steps, features)))
model_lstm.add(Dense(units=1))

## Compile the model.
model_lstm.compile(optimizer="RMSprop", loss="mse")
model_lstm.summary()

## Train the model.
model_lstm.fit(X_train, y_train, epochs=50, batch_size=32)

# LSTM results.
## Apply the same preprocessing to the test data (scale, split into samples,
## reshape, predict, then inverse-transform the predictions back to prices).
dataset_total = dataset.loc[:, "High"]
inputs = dataset_total[len(dataset_total) - len(test_set) - n_steps :].values
inputs = inputs.reshape(-1, 1)

### Scaling.
inputs = sc.transform(inputs)

### Split into samples.
X_test, y_test = split_sequence(inputs, n_steps)

### Reshape.
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], features)

### Prediction.
predicted_stock_price = model_lstm.predict(X_test)

### Inverse-transform the values.
predicted_stock_price = sc.inverse_transform(predicted_stock_price)

## Plot real vs. predicted values to visualize the difference.
def plot_predictions(test, predicted):
    plt.plot(test, color="gray", label="Real")
    plt.plot(predicted, color="red", label="Predicted")
    plt.title("MasterCard Stock Price Prediction")
    plt.xlabel("Time")
    plt.ylabel("MasterCard Stock Price")
    plt.legend()
    plt.show()

def return_rmse(test, predicted):
    rmse = np.sqrt(mean_squared_error(test, predicted))
    print("The root mean squared error is {:.2f}.".format(rmse))

plot_predictions(test_set, predicted_stock_price)

## Print out the RMSE.
return_rmse(test_set, predicted_stock_price)

# GRU model.
## The GRU architecture.
model_gru = Sequential()
model_gru.add(GRU(units=125, activation="tanh", input_shape=(n_steps, features)))
model_gru.add(Dense(units=1))

## Compile the model.
model_gru.compile(optimizer="RMSprop", loss="mse")
model_gru.summary()

## Train the model.
model_gru.fit(X_train, y_train, epochs=50, batch_size=32)

# GRU results.
GRU_predicted_stock_price = model_gru.predict(X_test)
GRU_predicted_stock_price = sc.inverse_transform(GRU_predicted_stock_price)
plot_predictions(test_set, GRU_predicted_stock_price)

## Print out the RMSE.
return_rmse(test_set, GRU_predicted_stock_price)
Let’s run the above code using the PyScripter IDE. The following is a selection of the outputs:
1. Descriptive statistics
The .describe() function allows us to thoroughly examine the data. Let’s concentrate on the High column, because we’ll be using it to train the model. We could also choose the Close or Open columns as the model feature, but High makes more sense because it tells us how high the share price rose on a given day.
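For instance, assuming the dataset variable from the code above, a quick way to focus the summary statistics on just that column is:

# Summary statistics for the single column we will train on.
print(dataset["High"].describe())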
2. Train-test split and visualize it
The train_test_plot function plots a simple line graph and takes three arguments: dataset, tstart, and tend. The tstart and tend arguments are time limits expressed in years; we can modify them to examine specific time periods. The line plot is split into two sections, train and test, which lets us see how the test dataset is distributed.
MasterCard’s stock price has been rising since 2016. It experienced a drop in the first quarter of 2020 but recovered to a stable position in the second half of the year. Our test dataset covers 2021, from January up to the dataset’s end in October, with the remaining data used for training.
3. LSTM model summary
The model consists of a single hidden LSTM layer and an output layer. You can experiment with the number of units; more units may produce better results at the cost of longer training. For this experiment, we set the LSTM units to 125, use tanh as the activation, and set the input shape to (n_steps, features).

We don’t have to create LSTM or GRU cells from scratch, because the TensorFlow Keras API is user-friendly: to construct the model, we simply use the LSTM or GRU layer classes.

Finally, we compile the model with the RMSprop optimizer and mean squared error (mse) as the loss function.
4. Train the LSTM model
The model will be trained for 50 epochs with a batch_size of 32. You can adjust these hyperparameters to shorten the training time or improve the results. In our run, training completed successfully and the loss converged to a low value.
5. LSTM results
The plot_predictions function generates a line chart comparing the Real and Predicted values, which lets us see the difference between the actual and predicted prices. The return_rmse function takes the test and predicted arrays as arguments and prints out the root mean squared error (RMSE) metric.
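As a reminder of what that metric computes, here is a minimal stand-alone sketch equivalent to return_rmse above (the values are hypothetical):

import numpy as np

actual = np.array([46.0, 55.0, 49.0])     # hypothetical true prices
predicted = np.array([47.5, 53.0, 50.0])  # hypothetical model outputs
rmse = np.sqrt(np.mean((actual - predicted) ** 2))  # square root of the mean squared error
print(f"The root mean squared error is {rmse:.2f}.")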
The single-layered LSTM model performed well, as shown by the line plot above.
6. LSTM RMSE
The results appear promising, with the model achieving an RMSE of 6.70 on the test dataset.
7. GRU model summary
To properly compare the results, we keep everything else the same and simply replace the LSTM layer with a GRU layer. The model structure consists of a single GRU layer with 125 units and an output layer.
8. Train the GRU model
The model was trained successfully for 50 epochs with a batch_size of 32.
9. GRU results
As we can see, the Real and Predicted values are relatively close; the predicted line chart almost perfectly matches the actual values.
10. GRU RMSE
The GRU model achieved an RMSE of 5.50 on the test dataset, outperforming the LSTM model.
Congratulations! You have now learned how to build and train Recurrent Neural Network (RNN) models from scratch and run them inside the PyScripter IDE with high speed and performance.
Visit our other AI-related articles here:
Click here to get started with PyScripter, a free, feature-rich, and lightweight Python IDE.
Download RAD Studio to create more powerful Python GUI Windows Apps in 5x less time.
Check out Python4Delphi, which makes it simple to create Python GUIs for Windows using Delphi.
Also, look into DelphiVCL, which makes it simple to create Windows GUIs with Python.
References & further readings
[1] Awan, A. A. (2022). Recurrent Neural Network Tutorial (RNN). DataCamp Blog. datacamp.com/tutorial/tutorial-for-recurrent-neural-network
[2] Biswal, A. (2023). Top 10 Deep Learning Algorithms You Should Know in 2023. Simplilearn. simplilearn.com/tutorials/deep-learning-tutorial/deep-learning-algorithm
[3] ChatGPT, personal communication, May 14, 2023.
[4] Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.
[5] Fausett, L. (1994). Fundamentals of Neural Networks. Prentice Hall, Englewood Cliffs, NJ.
[6] fdeloche. (2017). A diagram for a one-unit Gated Recurrent Unit (GRU). Wikimedia. commons.wikimedia.org/wiki/File:Gated_Recurrent_Unit.svg
[7] fdeloche. (2017). A schema for LSTM neural network architecture. Wikimedia. commons.wikimedia.org/wiki/File:Long_Short-Term_Memory.svg
[8] Hecht-Nielsen, R. (1990). Neurocomputing. Addison-Wesley, Reading, MA.
[9] Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780.
[10] Medsker, L. R., & Jain, L. C. (2001). Recurrent Neural Networks: Design and Applications, 5, 64-67.
[11] Rahman, K. (2023). MasterCard Stock Data – Latest and Updated: Downloaded using a Python Script and Yahoo! Finance API. Kaggle. kaggle.com/datasets/kalilurrahman/mastercard-stock-data-latest-and-updated
[12] Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning internal representations by error propagation. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Rumelhart, D. E., & McClelland, J. L., Eds., MIT Press, Cambridge, MA, 45.