
Unlock the Power of Python for Deep Learning with Long-Short-Term Memory Networks


Deep learning algorithms need large volumes of data and substantial computational power to solve complicated problems, and they can work with nearly any form of data. This article takes an in-depth look at one of the best-known deep learning techniques: Long Short-Term Memory (LSTM) networks.

What is Deep Learning?

Deep learning, a branch of machine learning, addresses intricate problems through the utilization of artificial neural networks. These networks consist of interconnected nodes organized in multiple layers, extracting features from input data. Extensive datasets are employed to train these models, enabling them to identify patterns and correlations that might be challenging or impossible for humans to perceive.

The impact of deep learning on artificial intelligence has been substantial. It has paved the way for the development of intelligent systems capable of independent learning, adaptation, and decision-making. Deep learning has led to remarkable advancements in various domains, encompassing image and speech recognition, natural language processing, machine translation, autonomous driving, and numerous others.

[Image: an example of an advanced LSTM architecture, GNMT (Google’s Neural Machine Translation system). Read more in reference [10].]

Why Use Python for Deep Learning, Machine Learning, and Artificial Intelligence?

Python has gained widespread popularity as a programming language due to its versatility and ease of use in diverse domains of computer science, especially in the field of deep learning, machine learning, and AI. 

We have discussed several times in earlier articles why Python is great for deep learning, machine learning, and artificial intelligence, along with all the prerequisites.

What is a Long-Short-Term Memory Network (LSTM)?

Long Short-Term Memory (LSTM) networks are a modified version of recurrent neural networks (RNNs) that makes it easier to retain information from earlier in a sequence. LSTM was introduced to address the performance degradation of RNNs on long sequences, which is caused by the vanishing gradient problem. For details, see the paper by Hochreiter and Schmidhuber (reference [4]).

LSTM is well suited to classifying, processing, and predicting time series in which important events are separated by time lags of unknown duration. The model is trained using backpropagation. An LSTM cell has three gates: the input gate, the forget gate, and the output gate.

[Image: LSTM architecture. Image source: reference [2].]
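To make the role of the three gates concrete, here is a minimal sketch of one time step of an LSTM cell in plain Python. It follows the standard LSTM update equations, but it is simplified to a single hidden unit with a scalar input, and the weight values are made up for illustration. Real implementations such as the Keras LSTM layer work on vectors and learn the weights during training.

```python
import math


def sigmoid(z):
    """Squash a value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, w):
    """Run one time step of a single-unit LSTM cell.

    w maps each gate name to (input weight, recurrent weight, bias).
    """
    def pre(name):
        wx, wh, b = w[name]
        return wx * x + wh * h_prev + b

    f = sigmoid(pre("forget"))        # how much of the old cell state to keep
    i = sigmoid(pre("input"))         # how much of the candidate to write
    o = sigmoid(pre("output"))        # how much of the cell state to expose
    g = math.tanh(pre("candidate"))   # candidate value for the cell state

    c = f * c_prev + i * g            # new cell state (long-term memory)
    h = o * math.tanh(c)              # new hidden state (the cell's output)
    return h, c, {"forget": f, "input": i, "output": o}


# Hypothetical weights, identical for every gate, purely for illustration.
weights = {
    name: (0.5, 0.1, 0.0)
    for name in ("forget", "input", "output", "candidate")
}

h, c = 0.0, 0.0
for x in [1.0, -0.5, 0.25]:
    h, c, gates = lstm_step(x, h, c, weights)
    print(f"x={x:+.2f}  h={h:+.4f}  c={c:+.4f}")
```

The cell state `c` is what lets the network carry information across many time steps: the forget gate scales the old value rather than overwriting it.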

I covered LSTM and GRU as advanced RNN architectures in the previous deep learning article.

How does LSTM work for Machine Translation?

Sequence-to-sequence modeling is one of the many intriguing techniques in natural language processing. Both language translation systems and question-answering systems make extensive use of it.

The goal of sequence-to-sequence (Seq2Seq) modeling is to develop models that can convert sequences from one domain to another, such as translating German to English. An LSTM encoder and an LSTM decoder carry out this conversion [7].


Here’s how it works:

  • Feed the embedding vectors for the source sequences (German) to the encoder network, one word at a time.
  • Encode the input sentences into fixed-dimension state vectors. At this step, we take the hidden and cell states from the encoder LSTM and feed them to the decoder LSTM.
  • The decoder uses these states as its initial states. It also receives the embedding vectors for the target words (English).
  • Decode and output the translated sentence, one word at a time. In this step, the output of the decoder is sent to a softmax layer over the entire target vocabulary.
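The last step above, a softmax over the target vocabulary followed by picking one word at a time, can be sketched in plain Python. This is only an illustration under stated assumptions: `toy_step` is a hypothetical stand-in for the decoder LSTM plus its dense layer, the vocabulary and the `<sos>`/`<eos>` markers are made up, and the word with the highest probability is chosen at each step (greedy decoding).

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_decode(step_fn, state, start_id, end_id, max_len):
    """Generate token ids one at a time until end_id or max_len is reached.

    step_fn(token_id, state) -> (logits, new_state) stands in for one
    decoder step; state plays the role of the hidden and cell states.
    """
    token, output = start_id, []
    for _ in range(max_len):
        logits, state = step_fn(token, state)
        probs = softmax(logits)
        token = max(range(len(probs)), key=probs.__getitem__)  # argmax
        if token == end_id:
            break
        output.append(token)
    return output


vocab = ["<sos>", "<eos>", "i", "am", "here"]


def toy_step(token_id, state):
    # Toy rule: after <sos> emit "i", then walk through the vocabulary,
    # and emit <eos> once the vocabulary is exhausted.
    next_id = 2 if token_id == 0 else token_id + 1
    if next_id >= len(vocab):
        next_id = 1
    logits = [0.0] * len(vocab)
    logits[next_id] = 5.0
    return logits, state


ids = greedy_decode(toy_step, state=None, start_id=0, end_id=1, max_len=10)
print(" ".join(vocab[i] for i in ids))  # i am here
```

In a real model, the initial `state` would be the hidden and cell states produced by the encoder.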

A typical seq2seq model has two major components: an encoder and a decoder. Each is a separate recurrent neural network (RNN) model, and the two are combined into one larger network.

How do I build and train an LSTM for Machine Translation from scratch?

Let’s get hands-on with some Python code to build and train your own LSTMs from scratch.

In this article, we will create a language translation model using the seq2seq architecture and LSTM networks. Neural machine translation is one of the best-known applications of this architecture, with Google Translate as a prominent example (see reference [10]). Brace yourself: this article is a little more intense than my previous tutorials.

We will work with the Kaggle Bilingual Sentence Pairs dataset (reference [3]) to train the LSTM so that it can translate sentences it has not seen during training. The original source of the dataset is the Tatoeba Project (to download the dataset, see references [1] and [3]).

The full dataset contains over 150,000 sentence pairs. However, to reduce training time, we will use only the first 50,000 sentence pairs in the 1st demo and the first 20,000 sentence pairs in the 2nd demo. Of course, this leads to not-really-satisfying results, but the article still serves its purpose as a proof of concept. You can increase these numbers if you have a powerful computer.
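Taking only the first N sentence pairs can be sketched as follows. The sketch assumes that each line of the dataset file is tab-separated, with the English sentence first, the German sentence second, and an optional attribution column after that. This layout is an assumption for illustration, so check it against the file you download; the sample lines below are made up.

```python
def load_pairs(lines, limit):
    """Return up to `limit` (english, german) pairs from tab-separated lines."""
    pairs = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        pairs.append((fields[0], fields[1]))
        if len(pairs) == limit:
            break  # stop reading once we have enough pairs
    return pairs


# Made-up sample lines in the assumed format.
sample = [
    "Hi.\tHallo!\tattribution text\n",
    "Run!\tLauf!\tattribution text\n",
    "\n",
    "Wow!\tPotzdonner!\tattribution text\n",
]

pairs = load_pairs(sample, limit=2)
print(pairs)
```

Because `load_pairs` accepts any iterable of lines, an open file object can be passed in directly, for example with `limit=50000` for the 1st demo.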

Hands-on and selected outputs (1st example)

This example is modified from reference [5]. The original reference already shows excellent results when predicting the unseen data in English. However, I modified the code slightly, so we can test it to predict the unseen data in German.

The full code for this example is available in the repository linked at the end of this article.

To execute the code, we can use the PyScripter IDE. Here are a few selected outputs:

1. Visualizing the distribution of sentence lengths (eng vs deu)

We will generate a plot to illustrate the distribution of sentence lengths. For this purpose, we will store the lengths of all English sentences in one list and the lengths of all German sentences in another.

[Image: distribution of sentence lengths, English vs. German]
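As a minimal sketch of this step, the snippet below collects the sentence lengths into two lists and prints a small text histogram. It assumes that length is measured in whitespace-separated words, and the example sentences are made up; in practice you would pass the lists to a plotting library instead of printing them.

```python
from collections import Counter


def sentence_lengths(sentences):
    """Length of each sentence, counted in whitespace-separated words."""
    return [len(s.split()) for s in sentences]


# Made-up example sentences, not taken from the dataset.
eng = ["hi", "tom is here", "i am home"]
deu = ["hallo", "tom ist hier", "ich bin zu hause"]

eng_len = sentence_lengths(eng)
deu_len = sentence_lengths(deu)

# Count how many sentences have each length.
eng_hist = Counter(eng_len)
deu_hist = Counter(deu_len)

for n in sorted(set(eng_hist) | set(deu_hist)):
    print(f"{n:2d} words  eng {'#' * eng_hist[n]:<5} deu {'#' * deu_hist[n]}")

print("max eng length:", max(eng_len))
print("max deu length:", max(deu_len))
```

The maximum of each list is the value needed later when padding the sequences to a fixed length.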

2. Maximum sentence length and vocabulary size (eng vs deu)

The maximum length of the German sentences is 15, whereas for the English sentences it is 7.

[Image: printed maximum sentence lengths and vocabulary sizes]

To facilitate the utilization of a Seq2Seq model, it is necessary to convert both input and output sentences into fixed-length integer sequences. To achieve this, we employ the Tokenizer() class from Keras, which transforms our sentences into sequences of integers. These sequences are then padded with zeros to ensure uniform length across all sequences.

To prepare for this process, we create tokenizers for both the German and the English sentences. At the same time, we count the vocabulary size for both languages and print them out, as can be seen above.
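Conceptually, tokenizing and padding do the following. This is a plain-Python sketch, not the Keras implementation: it gives the most frequent word the index 1, reserves 0 for padding, and pads at the end of each sequence. Padding at the end is an assumption made here for readability; Keras `pad_sequences` pads at the front unless told otherwise.

```python
from collections import Counter


def fit_word_index(sentences):
    """Map each word to an integer, most frequent word first, starting at 1."""
    counts = Counter(w for s in sentences for w in s.lower().split())
    # most_common() sorts by frequency; ties keep first-seen order.
    return {w: i for i, (w, _) in enumerate(counts.most_common(), start=1)}


def encode(sentences, word_index, length):
    """Convert sentences to fixed-length integer sequences, zero-padded."""
    encoded = []
    for s in sentences:
        ids = [word_index[w] for w in s.lower().split() if w in word_index]
        ids = ids[:length]                                # truncate long ones
        encoded.append(ids + [0] * (length - len(ids)))   # pad short ones
    return encoded


sentences = ["tom is here", "tom left"]
word_index = fit_word_index(sentences)
padded = encode(sentences, word_index, length=3)
vocab_size = len(word_index) + 1  # +1 because index 0 is reserved for padding

print(word_index)
print(padded)
print(vocab_size)
```

The `+ 1` in the vocabulary size matters later, because the embedding and output layers need one slot per index, including the padding index.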

3. Model training and saving the best result

For the training process, we will run the model for 30 epochs, utilizing a batch_size of 512 and a validation_split of 20%. This means that 80% of the data will be allocated for training the model, while the remaining 20% will be used for evaluation. Feel free to experiment and modify these hyperparameters to suit your needs.

To ensure that we capture the model’s best performance, we will employ the ModelCheckpoint() function, which saves the model with the lowest validation loss (val_loss). The resulting model with the best performance will be automatically stored in the “model.h1.28_may_23” folder.
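The two ideas in this step, holding out part of the data for validation and keeping only the best model, can be sketched without any framework. The loss values below are made-up numbers, not results from this article, and the helper names are hypothetical; the sketch only mirrors the "save when val_loss improves" behaviour described above.

```python
def split_for_validation(samples, validation_split):
    """Hold out the last fraction of the samples for validation."""
    cut = len(samples) - int(len(samples) * validation_split)
    return samples[:cut], samples[cut:]


def epochs_that_save(val_losses):
    """Return the epochs at which a 'best so far' checkpoint would be written."""
    best = float("inf")
    saved = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:  # validation loss improved: overwrite the checkpoint
            best = loss
            saved.append(epoch)
    return saved


train, val = split_for_validation(list(range(10)), validation_split=0.2)
print(len(train), len(val))

# Made-up validation losses for five epochs.
val_losses = [2.9, 2.4, 2.5, 2.1, 2.2]
saved = epochs_that_save(val_losses)
print("checkpoint written at epochs:", saved)
print("best epoch:", saved[-1])
```

The file on disk always holds the model from the last epoch in that list, which is why the saved model can be better than the model at the end of training.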

Here is the validation loss score for each epoch:

[Images: training output for epochs 1 through 30]

The model training ran in the PyScripter IDE without errors, delays, or disruptions.

4. Plot train loss vs validation loss

Let’s plot and compare the training loss and the validation loss.

[Image: training loss vs. validation loss]

As you can see in the plot above, the validation loss plateaus after the 20th epoch, indicating that the model has likely converged and is unlikely to improve much with additional training.

5. Generating predictions for unseen data

The generated predictions consist of sequences represented by integers. To make these predictions more understandable, we must convert these integers back into their respective words.
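A minimal sketch of this conversion: invert the tokenizer's word-to-index mapping, take the most probable index at each position, and skip index 0, which is assumed here to be padding. The vocabulary and the probabilities are made up for illustration.

```python
def build_index_word(word_index):
    """Invert a word -> index mapping into index -> word."""
    return {i: w for w, i in word_index.items()}


def argmax(row):
    """Position of the largest value in a list."""
    return max(range(len(row)), key=row.__getitem__)


def decode_prediction(prob_rows, index_word):
    """Turn per-position probabilities into a sentence."""
    ids = [argmax(row) for row in prob_rows]
    # Index 0 (padding) has no entry in index_word, so it is dropped.
    return " ".join(index_word[i] for i in ids if i in index_word)


word_index = {"tom": 1, "is": 2, "here": 3}
index_word = build_index_word(word_index)

# One row of made-up probabilities per output position.
prediction = [
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.6, 0.2],
    [0.0, 0.1, 0.2, 0.7],
    [0.9, 0.0, 0.1, 0.0],  # padding position
]

print(decode_prediction(prediction, index_word))  # tom is here
```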

Once the conversion is complete, we place the original sentences from the test dataset alongside the predicted sentences in a data frame and randomly display some of the pairs. This allows us to assess the performance of our model.

Prediction results (eng):

[Image: prediction results for English]

For the English results, the model is doing a pretty decent job. 

Let’s do the same for German. Here are the prediction results (deu):

[Image: prediction results for German]

Unfortunately, the predictions for German are still unsatisfactory and, in some cases, completely incorrect. For instance, the model failed to identify that “Boston” refers to a city and that “Tom” is a person’s name. To identify these errors, you can cross-check the predictions with Google Translate.

With more training epochs, more training data, and a better (or more complex) model, the results should gradually improve, provided you have enough computational power. These are the challenges we will face regularly in NLP. But these aren’t immovable obstacles.

This is how you would use LSTM to solve a sequence prediction task. Let’s try another implementation scenario in the next subsection.

Hands-on and selected outputs (2nd example)

This second implementation scenario is based on reference [7]. That excellent blog post implements neural machine translation from English to French; in this article, we will try English to German instead.

In this 2nd example of LSTM implementation for machine translation, we use the same dataset, but for the word embeddings we use GloVe (Global Vectors for Word Representation) (see reference [8]).

The full code for this second example is available in the repository linked at the end of this article.

Let’s run the code using the PyScripter IDE. The following are some selected outputs:

1. Word embeddings

This part shows the implementation of word embeddings for neural machine translation. Here is the printed result of the word embedding for the word “go” from the GloVe word embedding dictionary.

[Image: GloVe word embedding for the word “go”]

Checking the 20th index of the word embedding matrix (the word “go”) shows a consistent result:

[Image: embedding matrix row for the word “go”]
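The consistency check above can be sketched in plain Python. GloVe files are plain text with one word per line followed by its vector components; the tiny three-dimensional vectors and the word indices below are made up for illustration. Words that have no GloVe vector keep an all-zero row, which is a choice made in this sketch.

```python
def parse_glove(lines):
    """Parse GloVe-style text lines ('word v1 v2 ...') into a dictionary."""
    vectors = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(v) for v in parts[1:]]
    return vectors


def build_embedding_matrix(word_index, vectors, dim):
    """Row i holds the vector of the word whose tokenizer index is i."""
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, idx in word_index.items():
        if word in vectors:
            matrix[idx] = vectors[word]
    return matrix


# Made-up miniature "GloVe file" and tokenizer indices.
glove_lines = ["go 0.1 -0.2 0.3", "tom 0.5 0.5 0.5"]
word_index = {"go": 1, "tom": 2, "boston": 3}

vectors = parse_glove(glove_lines)
matrix = build_embedding_matrix(word_index, vectors, dim=3)

# The row at the index of "go" matches the dictionary entry for "go".
print(vectors["go"])
print(matrix[word_index["go"]])
```

A matrix built this way is what would be passed as the initial weights of the embedding layer.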

2. Plot our LSTM model for machine translation

Another interesting part of this second approach is that, after we compile the model, we can plot it using tf.keras.utils.plot_model. This makes it easier to keep track of and communicate all the inputs, outputs, steps, and layers clearly.

[Image: plot of the Keras model]

3. Plot the modified model for prediction

Before making any predictions, we first need to modify our model. During training, the decoder is fed the actual target sentence, which is not available at prediction time, so the decoder has to generate the translation one word at a time from its own previous output.

The following is the plot of our model after the modifications made for prediction:

[Image: plot of the modified model for prediction]

4. Train the model

I trained the model for 30 epochs; you can modify the number of epochs to see if you can get better results. The model is trained on 18,000 sentences and tested on the remaining 2,000 sentences. You can also increase the number of records if you want better results and have sufficient computational resources.

[Images: training output]

5. Test the model

To test the code, we randomly choose a sentence from the input_sentences list, retrieve the corresponding padded sequence for the sentence, and pass it to the translate_sentence() function. The function returns the translated sentence, as shown below.

1st attempt of the test:

[Image: test output, 1st attempt]

2nd attempt:

[Image: test output, 2nd attempt]

3rd attempt:

[Image: test output, 3rd attempt]

Again, the results in German are still far from satisfying. To identify the errors, you can cross-check the translations with Google Translate.

As before, more training epochs, more training data, and a better (or more complex) model should give better results, provided you have enough computational power. These are the challenges we will face regularly in NLP.

Endnotes

LSTMs are a very promising solution to sequence-related problems. They have many useful applications, including time series prediction, weather forecasting, machine translation, and speech recognition. According to Google Scholar, no other computer science paper of the 20th century is receiving as many citations per year as the original 1997 journal publication on Long Short-Term Memory (LSTM) artificial neural networks (NNs) [9].

However, the one disadvantage I find is that they are difficult to train. Even a simple model needs a lot of data, training time, and system resources. That is largely a hardware constraint, though, and the PyScripter IDE itself stayed lightweight throughout, with no errors or lag.

I hope this article was successful in giving you a basic understanding and workflow of how these networks work.

Check out the full repository here:

github.com/Embarcadero/DL_Python03_LSTM


Click here to get started with PyScripter, a free, feature-rich, and lightweight Python IDE.

Download RAD Studio to create more powerful Python GUI Windows Apps in 5x less time.

Check out Python4Delphi, which makes it simple to create Python GUIs for Windows using Delphi.

Also, look into DelphiVCL, which makes it simple to create Windows GUIs with Python.


References & further readings

[1] All sentences and translations are from Tatoeba’s (tatoeba.org) massive and awesome dataset, released under a CC-BY License.

[2] Biswal, A. (2023). Top 10 Deep Learning Algorithms You Should Know in 2023. Simplilearn. simplilearn.com/tutorials/deep-learning-tutorial/deep-learning-algorithm

[3] Cijov, A. (2021). Bilingual Sentence Pair: Dataset for Translator Projects. Kaggle. kaggle.com/datasets/alincijov/bilingual-sentence-pairs

[4] Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8), 1735-1780.

[5] Jain, H. (2021). Machine Translation | Seq2Seq | LSTMs. Kaggle. kaggle.com/code/harshjain123/machine-translation-seq2seq-lstms

[6] Kumar, V. (2020). Sequence-to-Sequence Modeling using LSTM for Language Translation. Analytics India Magazine. analyticsindiamag.com/sequence-to-sequence-modeling-using-lstm-for-language-translation

[7] Malik, U. (2022). Python for NLP: Neural Machine Translation with Seq2Seq in Keras. StackAbuse. stackabuse.com/python-for-nlp-neural-machine-translation-with-seq2seq-in-keras

[8] Pennington, J., Socher, R., & Manning, C. D. (2014, October). Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP) (pp. 1532-1543). nlp.stanford.edu/projects/glove

[9] Schmidhuber, J. (2022). 2022: 25th anniversary of 1997 papers: Long Short-Term Memory. All computable metaverses. Hierarchical reinforcement learning (RL). Meta-RL. Abstractions in generative adversarial RL. Soccer learning. Low-complexity neural nets. Low-complexity art. Others. AI Blog. IDSIA, Lugano, Switzerland.

[10] Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., … & Dean, J. (2016). Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144.
