Unlock the Power of Python for Deep Learning with Long-Short-Term Memory Networks

Deep learning algorithms need enormous volumes of data and computational power to solve complicated problems, and they can operate on nearly any form of data. This article takes an in-depth look at one of the best-known deep learning techniques: Long Short-Term Memory (LSTM) networks.

What is Deep Learning?

Deep learning, a branch of machine learning, addresses intricate problems through the utilization of artificial neural networks. These networks consist of interconnected nodes organized in multiple layers, extracting features from input data. Extensive datasets are employed to train these models, enabling them to identify patterns and correlations that might be challenging or impossible for humans to perceive.

The impact of deep learning on artificial intelligence has been substantial. It has paved the way for the development of intelligent systems capable of independent learning, adaptation, and decision-making. Deep learning has led to remarkable advancements in various domains, encompassing image and speech recognition, natural language processing, machine translation, autonomous driving, and numerous others.

Figure: An example of an advanced LSTM architecture: the GNMT (Google’s Neural Machine Translation) system. See reference [10].

Why Use Python for Deep Learning, Machine Learning, and Artificial Intelligence?

Python has gained widespread popularity as a programming language due to its versatility and ease of use in diverse domains of computer science, especially in the field of deep learning, machine learning, and AI.

We have discussed several times why Python is great for Deep Learning, Machine Learning, and Artificial Intelligence (along with all the prerequisites) in the following articles:

Unlock the Power of Python for Deep Learning with Convolutional Neural Networks

Machine Learning: 5 Ways To Use ML in your Windows Apps

Learn To Build A GUI For These 10 Ultimate Python AI Libraries

What is a Long-Short-Term Memory Network (LSTM)?

Long Short-Term Memory (LSTM) networks are a modified version of recurrent neural networks (RNNs) that makes it easier to retain past data in memory. LSTM was introduced to address the performance degradation of RNNs on long sequences (the vanishing gradient problem). Read more in the paper by Hochreiter and Schmidhuber (reference [4]).

LSTM is well-suited to classifying, processing, and predicting time series with time lags of unknown duration. The model is trained with backpropagation. An LSTM cell has three gates: the input gate, the forget gate, and the output gate.
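To make the role of the three gates concrete, here is a minimal sketch of a single LSTM time step in plain Python. It uses scalar values instead of vectors, and the weights in `params` are hypothetical numbers chosen only for illustration; it is not how Keras implements the layer internally.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step for a scalar input, hidden state, and cell state."""
    # Every gate looks at the current input and the previous hidden state.
    def pre(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))        # how much of the old cell state to keep
    i = sigmoid(pre("input"))         # how much of the candidate to write
    o = sigmoid(pre("output"))        # how much of the cell state to expose
    g = math.tanh(pre("candidate"))   # candidate value for the cell state

    c = f * c_prev + i * g            # updated cell state (long-term memory)
    h = o * math.tanh(c)              # updated hidden state (step output)
    return h, c

# Hypothetical (w_x, w_h, bias) triples, for illustration only.
params = {
    "forget": (0.5, 0.1, 0.0),
    "input": (0.4, -0.2, 0.0),
    "output": (0.3, 0.2, 0.0),
    "candidate": (0.9, 0.5, 0.0),
}

h, c = 0.0, 0.0
for x in [1.0, -0.5, 0.25]:
    h, c = lstm_step(x, h, c, params)
    print(f"x={x:+.2f}  h={h:+.4f}  c={c:+.4f}")
```

The forget gate scales the previous cell state, the input gate scales the new candidate, and the output gate decides how much of the updated cell state becomes the hidden state passed to the next step.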

Figure: LSTM architecture. Image source: reference [2].

I talked a little about LSTM and GRU as advanced RNN architectures in the previous deep learning article:

Unlock the Power of Python for Deep Learning with Recurrent Neural Networks

How does LSTM work for Machine Translation?

Sequence to Sequence modeling is one of the many intriguing uses of natural language processing. Both language translation systems and question-answering systems make extensive use of it.

The goal of sequence-to-sequence (Seq2Seq) modeling is to develop models that can convert sequences from one domain to another, such as translating English to German. The LSTM encoder and decoder execute this Seq2Seq modeling [7].

Here’s how it works:

1. Feed the embedding vectors for the source sequences (German) to the encoder network, one word at a time.
2. Encode the input sentences into fixed-dimension state vectors. At this step, we take the hidden and cell states from the encoder LSTM and feed them to the decoder LSTM.
3. The decoder uses these states as its initial states. It also receives the embedding vectors for the target words (English).
4. Decode and output the translated sentence, one word at a time. In this step, the output of the decoder is sent to a softmax layer over the entire target vocabulary.
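During training, the decoder reads the target sentence shifted by one position and is asked to predict the next word at every step. The sketch below shows how such decoder input and target sequences can be built. The `<sos>` and `<eos>` markers match the ones used in the second listing of this article, while the helper name `make_decoder_pairs` is made up for this illustration.

```python
def make_decoder_pairs(target_sentence, sos="<sos>", eos="<eos>"):
    """Return (decoder_input, decoder_target) token lists for one sentence."""
    tokens = target_sentence.split()
    decoder_input = [sos] + tokens     # what the decoder reads at each step
    decoder_target = tokens + [eos]    # what it should predict at each step
    return decoder_input, decoder_target

dec_in, dec_out = make_decoder_pairs("go away")
for step, (seen, expected) in enumerate(zip(dec_in, dec_out)):
    print(step, seen, "->", expected)
```

Both lists have the same length, and position t of the target is always the word that follows position t of the input.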

A typical seq2seq model has two major components: an encoder and a decoder. These are essentially two separate recurrent neural network (RNN) models combined into one large network.

How do I build and train an LSTM for Machine Translation from scratch?

Let’s get hands-on with some Python code to build and train your own LSTMs from scratch.

In this article, we will create a language translation model using the seq2seq architecture and LSTM networks. Neural machine translation is one of the best-known applications of this approach, and it is used by systems such as Google Translate. Brace yourself: this article is a little more intense than my previous tutorials.

We will work with the Kaggle Bilingual Sentence Pairs dataset (reference [3]) to train the LSTM so that it can translate unseen sentences. The original source of the dataset is the Tatoeba Project (to download the dataset, see references [1] and [3]).

The full dataset contains over 150,000 sentence pairs. To reduce training time, however, we will use only the first 50,000 sentence pairs in the 1st demo and the first 20,000 in the 2nd demo. This leads to less-than-satisfying results, but the article still serves its purpose as a proof of concept. You can increase these numbers if you have a powerful computer.

Hands-on and selected outputs (1st example)

This example is modified from reference [5]. The original reference already shows excellent results when predicting the unseen data in English. However, I modified the code slightly, so we can test it to predict the unseen data in German.

The following is the code example of implementing LSTM for machine translation:

# Import libraries
import string
import re
from numpy import array, argmax, random, take
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense, LSTM, Embedding, RepeatVector
from keras.preprocessing.text import Tokenizer
from keras.callbacks import ModelCheckpoint
from keras.preprocessing.sequence import pad_sequences
from keras.models import load_model
from keras import optimizers
import matplotlib.pyplot as plt

pd.set_option('display.max_colwidth', 200)

# Function to read raw text file
def read_text(filename):
    # Open the file, read all text, and close the file automatically
    with open(filename, mode='rt', encoding='utf-8') as file:
        text = file.read()
    return text

# Split a text into sentences
def to_lines(text):
    # One sentence pair per line; the columns are separated by tabs
    sents = text.strip().split('\n')
    sents = [i.split('\t') for i in sents]
    return sents

# Load dataset
data = read_text("data/bilingual-sentence-pairs/deu.txt")
deu_eng = to_lines(data)
deu_eng = array(deu_eng)

## We will use only the first 50,000 sentence pairs to reduce the training time of the model
deu_eng = deu_eng[:50000,:]

# Remove punctuation (column 0 is English, column 1 is German)
deu_eng[:,0] = [s.translate(str.maketrans('', '', string.punctuation)) for s in deu_eng[:,0]]
deu_eng[:,1] = [s.translate(str.maketrans('', '', string.punctuation)) for s in deu_eng[:,1]]

# Convert text to lowercase
for i in range(len(deu_eng)):
    deu_eng[i,0] = deu_eng[i,0].lower()
    deu_eng[i,1] = deu_eng[i,1].lower()

# Empty lists
eng_l = []
deu_l = []

# Populate the lists with sentence lengths (number of words)
for i in deu_eng[:,0]:
    eng_l.append(len(i.split()))

for i in deu_eng[:,1]:
    deu_l.append(len(i.split()))

## Plot the distributions
length_df = pd.DataFrame({'eng':eng_l, 'deu':deu_l})
length_df.hist(bins = 30)
plt.suptitle("Distributions of sentence lengths (eng vs deu)")
plt.show()

## Find the max sentence length for each language
max_eng_sentence_length = max(length_df['eng'])
max_deu_sentence_length = max(length_df['deu'])
print('Max sentence length for eng: %d' % max_eng_sentence_length)
print('Max sentence length for deu: %d' % max_deu_sentence_length)

# Function to build a tokenizer
def tokenization(lines):
    tokenizer = Tokenizer()
    tokenizer.fit_on_texts(lines)
    return tokenizer

# Prepare English tokenizer
eng_tokenizer = tokenization(deu_eng[:, 0])
eng_vocab_size = len(eng_tokenizer.word_index) + 1
## Choose "7" as the max sentence length
eng_length = 7
print('English Vocabulary Size: %d' % eng_vocab_size)

# Prepare German tokenizer
deu_tokenizer = tokenization(deu_eng[:, 1])
deu_vocab_size = len(deu_tokenizer.word_index) + 1
## Choose "7" as the max sentence length
deu_length = 7
print('German Vocabulary Size: %d' % deu_vocab_size)

# Encode and pad sequences
def encode_sequences(tokenizer, length, lines):
    # Convert each sentence to a sequence of integers
    seq = tokenizer.texts_to_sequences(lines)
    ## Pad sequences with 0 values (longer sequences are truncated to `length`)
    seq = pad_sequences(seq, maxlen=length, padding='post')
    return seq

# Model building
from sklearn.model_selection import train_test_split

## Split data into train and test set
train, test = train_test_split(deu_eng, test_size=0.2, random_state = 12)

# Prepare training data (German in, English out)
trainX = encode_sequences(deu_tokenizer, deu_length, train[:, 1])
trainY = encode_sequences(eng_tokenizer, eng_length, train[:, 0])

# Prepare validation data
testX = encode_sequences(deu_tokenizer, deu_length, test[:, 1])
testY = encode_sequences(eng_tokenizer, eng_length, test[:, 0])

# Define the model
## Build NMT model
def define_model(in_vocab, out_vocab, in_timesteps, out_timesteps, units):
    model = Sequential()
    # Encoder: embed the source tokens and compress them into one state vector
    model.add(Embedding(in_vocab, units, input_length=in_timesteps, mask_zero=True))
    model.add(LSTM(units))
    # Decoder: repeat the state for every output step and predict one word per step
    model.add(RepeatVector(out_timesteps))
    model.add(LSTM(units, return_sequences=True))
    model.add(Dense(out_vocab, activation='softmax'))
    return model

## Model compilation
model = define_model(deu_vocab_size, eng_vocab_size, deu_length, eng_length, 512)
## Older Keras releases name this argument `lr` instead of `learning_rate`
rms = optimizers.RMSprop(learning_rate=0.001)
model.compile(optimizer=rms, loss='sparse_categorical_crossentropy')

# Fit the model
## Save the model with the lowest validation loss
filename = 'model.h1.28_may_23'
checkpoint = ModelCheckpoint(filename, monitor='val_loss', verbose=1, save_best_only=True, mode='min')

# Train model
history = model.fit(trainX, trainY.reshape(trainY.shape[0], trainY.shape[1], 1),
                    epochs=30, batch_size=512, validation_split = 0.2, callbacks=[checkpoint],
                    verbose=1)

## Plot training loss and validation loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.legend(['train','validation'])
plt.show()

# Prediction on unseen data
from keras.models import load_model

model = load_model('model.h1.28_may_23')

## predict_classes() was removed from recent Keras releases;
## the argmax over the vocabulary axis gives the same class indices
preds = argmax(model.predict(testX), axis=-1)

# Present the results in dataframe (eng)
def get_word(n, tokenizer):
    # Look up the word that belongs to integer index n
    for word, index in tokenizer.word_index.items():
        if index == n:
            return word
    return None

preds_text_eng = []
for i in preds:
    temp = []
    for j in range(len(i)):
        t = get_word(i[j], eng_tokenizer)
        if j > 0:
            # Skip padding and words that repeat the previous position
            if (t == get_word(i[j-1], eng_tokenizer)) or (t is None):
                temp.append('')
            else:
                temp.append(t)
        else:
            if t is None:
                temp.append('')
            else:
                temp.append(t)
    preds_text_eng.append(' '.join(temp))

pred_df_eng = pd.DataFrame({'actual' : test[:,0], 'predicted' : preds_text_eng})

## Print the first 15 rows
print(pred_df_eng.head(15))

# Present the results in dataframe (deu)
## get_word() is reused from the previous step.
## Note: `preds` holds indices into the English vocabulary, because the model
## was trained with German inputs and English targets. Looking these indices up
## in deu_tokenizer therefore cannot yield a valid German translation; a model
## trained with inputs and targets swapped would be needed for that.
preds_text_deu = []
for i in preds:
    temp = []
    for j in range(len(i)):
        t = get_word(i[j], deu_tokenizer)
        if j > 0:
            if (t == get_word(i[j-1], deu_tokenizer)) or (t is None):
                temp.append('')
            else:
                temp.append(t)
        else:
            if t is None:
                temp.append('')
            else:
                temp.append(t)
    preds_text_deu.append(' '.join(temp))

pred_df_deu = pd.DataFrame({'actual' : test[:,1], 'predicted' : preds_text_deu})

## Print the first 15 rows
print(pred_df_deu.head(15))

To execute the code provided, we can utilize the PyScripter IDE. Here are a few selected outputs:

1. Visualizing the distribution of sentence lengths (eng vs deu)

We will generate a plot to illustrate the distribution of sentence lengths. For this purpose, we will store the lengths of all English sentences in one list and the lengths of all German sentences in another.

Figure: Distributions of sentence lengths (eng vs deu).

2. Maximum sentence length and vocabulary size (eng vs deu)

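The printed output shows that the longest German sentence has 15 words, whereas the longest English sentence has 7. Note that the code sets both eng_length and deu_length to 7, so longer German sentences are truncated when the sequences are padded.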

To facilitate the utilization of a Seq2Seq model, it is necessary to convert both input and output sentences into fixed-length integer sequences. To achieve this, we employ the Tokenizer() class from Keras, which transforms our sentences into sequences of integers. These sequences are then padded with zeros to ensure uniform length across all sequences.
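The idea behind tokenizing and padding can be sketched in a few lines of plain Python. This is a simplified stand-in for what `Tokenizer` and `pad_sequences` do, not their actual implementation: the function names are made up for this example, and the sketch cuts overlong sentences at the end, whereas `pad_sequences` by default drops tokens from the front.

```python
def build_word_index(sentences):
    """Map each word to an integer, most frequent first; 0 is reserved for padding."""
    counts = {}
    for sentence in sentences:
        for word in sentence.lower().split():
            counts[word] = counts.get(word, 0) + 1
    # Sort by descending frequency; ties keep first-seen order (sorted is stable).
    ordered = sorted(counts, key=lambda w: -counts[w])
    return {word: i + 1 for i, word in enumerate(ordered)}

def encode_and_pad(sentences, word_index, length):
    """Turn sentences into fixed-length integer lists, padding with 0 at the end."""
    encoded = []
    for sentence in sentences:
        seq = [word_index[w] for w in sentence.lower().split() if w in word_index]
        seq = seq[:length]                      # cut sentences that are too long
        seq = seq + [0] * (length - len(seq))   # pad short ones with zeros
        encoded.append(seq)
    return encoded

sentences = ["tom is here", "tom left", "is tom in boston"]
word_index = build_word_index(sentences)
print(word_index)
print(encode_and_pad(sentences, word_index, length=5))
```

Because index 0 never belongs to a real word, the model can treat it as "no word here", which is what `mask_zero=True` in the embedding layer relies on.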

To prepare for this, we create tokenizers for both the German and the English sentences. We also compute the vocabulary size of each language and print it, as the code output shows.

3. Model training and saving the best result

For the training process, we will run the model for 30 epochs, utilizing a batch_size of 512 and a validation_split of 20%. This means that 80% of the data will be allocated for training the model, while the remaining 20% will be used for evaluation. Feel free to experiment and modify these hyperparameters to suit your needs.
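It helps to work out what these settings mean in absolute numbers. The sketch below only does the arithmetic for the values used in the listing (50,000 pairs, a 20% test split, a 20% validation split, and a batch size of 512). The 80/20 validation split applies to the training portion that remains after `train_test_split`, not to the full 50,000 pairs. The exact rounding inside scikit-learn and Keras may differ slightly for sizes that do not divide evenly.

```python
import math

total_pairs = 50_000
test_percent = 20          # train_test_split(..., test_size=0.2)
validation_percent = 20    # model.fit(..., validation_split=0.2)
batch_size = 512

# Pairs left for training after the test set is held out
train_pairs = total_pairs * (100 - test_percent) // 100
# Part of the training pairs that Keras sets aside for validation
val_pairs = train_pairs * validation_percent // 100
# Pairs that are actually used to update the weights
fit_pairs = train_pairs - val_pairs
# Number of weight updates (batches) per epoch
steps_per_epoch = math.ceil(fit_pairs / batch_size)

print("held-out test pairs:", total_pairs - train_pairs)
print("validation pairs:   ", val_pairs)
print("pairs used to fit:  ", fit_pairs)
print("batches per epoch:  ", steps_per_epoch)
```

So each of the 30 epochs updates the weights on 32,000 pairs and then measures val_loss on 8,000 pairs the optimizer never sees.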

To ensure that we capture the model’s best performance, we use the ModelCheckpoint() callback, which saves the model whenever the validation loss (val_loss) reaches a new minimum. The best-performing model is automatically stored in the “model.h1.28_may_23” folder.
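The "save only the best model" rule is easy to express on its own. The following sketch shows the decision logic with a list of made-up validation losses; the numbers are illustrative and are not results from this article, and the function is a simplified stand-in rather than the Keras callback itself.

```python
def best_checkpoints(val_losses):
    """Return the 1-based epochs at which a 'save best only' rule would save."""
    best = float("inf")
    saved_at = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:          # strictly better than anything seen so far
            best = loss
            saved_at.append(epoch)
    return saved_at, best

# Made-up validation losses, for illustration only.
losses = [3.2, 2.9, 2.7, 2.8, 2.6, 2.65, 2.6]
saved_at, best = best_checkpoints(losses)
print("saved after epochs:", saved_at)
print("best val_loss:", best)
```

An epoch that only ties the best loss, or is worse, leaves the stored model untouched, so the file on disk always corresponds to the lowest validation loss seen so far.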

Figure: Validation loss score for each epoch (Epoch-01 to Epoch-30).

Training ran to completion within the PyScripter IDE without errors or noticeable delays.

4. Plot train loss vs validation loss

Let’s plot and compare the training loss and the validation loss.

Figure: Training loss vs. validation loss.

As you can see in the plot above, the validation loss plateaus after the 20th epoch, indicating that the model has likely converged and will not improve further with additional training.

5. Generating predictions for unseen data

The generated predictions consist of sequences represented by integers. To make these predictions more understandable, we must convert these integers back into their respective words.
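One way to do this conversion is to invert the word index once and then look every integer up in the resulting dictionary, instead of scanning the whole vocabulary for each word as `get_word()` in the listing does. The sketch below assumes a small hypothetical vocabulary; the `decode` helper is made up for this example and mirrors the listing's rule of skipping padding and immediate repeats.

```python
def decode(indices, index_to_word):
    """Convert predicted integer indices to a sentence.

    Index 0 (padding) and unknown indices are skipped, and a word that
    repeats the word at the previous position is dropped.
    """
    words = []
    previous = None
    for idx in indices:
        word = index_to_word.get(idx)   # None for padding or unknown indices
        if word is not None and word != previous:
            words.append(word)
        previous = word
    return " ".join(words)

word_index = {"tom": 1, "is": 2, "here": 3}             # hypothetical vocabulary
index_to_word = {i: w for w, i in word_index.items()}   # inverted once, reused for every lookup
print(decode([1, 2, 2, 3, 0, 0, 0], index_to_word))
```

With a real Keras tokenizer, the same inverted mapping can be built from `tokenizer.word_index` in exactly this way.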

Once the conversion is complete, we place the original sentences from the test dataset and the predicted sentences side by side in a data frame and display the first 15 rows. This allows us to assess the performance of our model:

Figure: Prediction results (eng).

For the English results, the model is doing a pretty decent job.

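Let’s do the same for German. Here are the prediction results (deu):

Figure: Prediction results (deu).

Unfortunately, the predictions for German are unsatisfactory, and in some cases completely incorrect. For instance, the output fails to reflect that “Boston” refers to a city and “Tom” is a person’s name. You can cross-check the results with Google Translate. Keep in mind that the model in this example was trained with German sentences as input and English sentences as target, so its predictions are indices into the English vocabulary; looking them up in the German tokenizer, as the listing does, cannot produce a valid German translation. Proper German output requires a second model trained with the inputs and targets swapped.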

Eventually, with enough training epochs, more training data, and a better (or more complex) model (if you have enough computational power), the results will gradually improve. These are challenges we face regularly in NLP, but they aren’t immovable obstacles.

This is how you would use LSTM to solve a sequence prediction task. Let’s try another scenario of implementation, in the next subsection.

Hands-on and selected outputs (2nd example)

This second implementation scenario is based on reference [7]. That excellent blog post implements neural machine translation from English to French; in this article, we will try English to German instead.

In this 2nd example of LSTM implementation for machine translation, we still use the same dataset, but, for the word embedding, we will utilize the GloVe (Global Vectors for Word Representation) word embeddings (see reference [8]).

The following is the second example of implementing LSTM for machine translation:

# Import libraries
import os
import sys
from keras.models import Model
from keras.layers import Input, LSTM, GRU, Dense, Embedding
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.utils import to_categorical
import numpy as np
import matplotlib.pyplot as plt
from numpy import array # For loading GloVe data
from numpy import asarray # For loading GloVe data
from numpy import zeros # For loading GloVe data
from keras.utils import plot_model # For plotting DL models

# Set values for different parameters
BATCH_SIZE = 64
EPOCHS = 30
LSTM_NODES = 256
NUM_SENTENCES = 20000
MAX_SENTENCE_LENGTH = 50
MAX_NUM_WORDS = 20000
EMBEDDING_SIZE = 100

# Data preprocessing
input_sentences = []
output_sentences = []
output_sentences_inputs = []

count = 0
for line in open("data/bilingual-sentence-pairs/deu.txt", encoding="utf-8"):
    count += 1
    if count > NUM_SENTENCES:
        break
    # Skip lines that do not contain a tab-separated sentence pair
    if '\t' not in line:
        continue
    # Column 0 is English, column 1 is German; any further columns are ignored
    parts = line.rstrip().split('\t')
    input_sentence, output = parts[0], parts[1]
    # Decoder target ends with <eos>; decoder input starts with <sos>
    output_sentence = output + ' <eos>'
    output_sentence_input = '<sos> ' + output
    input_sentences.append(input_sentence)
    output_sentences.append(output_sentence)
    output_sentences_inputs.append(output_sentence_input)

print("num samples input:", len(input_sentences))
print("num samples output:", len(output_sentences))
print("num samples output input:", len(output_sentences_inputs))

# Print one sample sentence (index 172)
print(input_sentences[172])
print(output_sentences[172])
print(output_sentences_inputs[172])

# Tokenization (for inputs)
input_tokenizer = Tokenizer(num_words=MAX_NUM_WORDS)
input_tokenizer.fit_on_texts(input_sentences)
input_integer_seq = input_tokenizer.texts_to_sequences(input_sentences)

word2idx_inputs = input_tokenizer.word_index
print('Total unique words in the input: %s' % len(word2idx_inputs))

max_input_len = max(len(sen) for sen in input_integer_seq)
print("Length of longest sentence in input: %g" % max_input_len)

# Tokenization (for outputs)
## filters='' keeps punctuation, so that "<sos>" and "<eos>" stay intact
output_tokenizer = Tokenizer(num_words=MAX_NUM_WORDS, filters='')
output_tokenizer.fit_on_texts(output_sentences + output_sentences_inputs)
output_integer_seq = output_tokenizer.texts_to_sequences(output_sentences)
output_input_integer_seq = output_tokenizer.texts_to_sequences(output_sentences_inputs)

word2idx_outputs = output_tokenizer.word_index
print('Total unique words in the output: %s' % len(word2idx_outputs))

num_words_output = len(word2idx_outputs) + 1
max_out_len = max(len(sen) for sen in output_integer_seq)
print("Length of longest sentence in the output: %g" % max_out_len)

# Padding
encoder_input_sequences = pad_sequences(input_integer_seq, maxlen=max_input_len)
print("encoder_input_sequences.shape:", encoder_input_sequences.shape)
print("encoder_input_sequences[172]:", encoder_input_sequences[172])

## Verify the integer values for "go" and "away" (sentence index 172)
print(word2idx_inputs["go"])
print(word2idx_inputs["away"])

## In the same way, pad the decoder inputs (deu):
decoder_input_sequences = pad_sequences(output_input_integer_seq, maxlen=max_out_len, padding='post')
print("decoder_input_sequences.shape:", decoder_input_sequences.shape)
print("decoder_input_sequences[172]:", decoder_input_sequences[172])

### Print the corresponding integers from the word2idx_outputs (sentence index 172)
print(word2idx_outputs["<sos>"])
print(word2idx_outputs["mach"])
print(word2idx_outputs["’ne"])
print(word2idx_outputs["fliege!"])

# Create word embeddings for the inputs by loading the GloVe word vectors into memory
embeddings_dictionary = dict()
with open("data/glove/glove.6B.100d.txt", encoding="utf-8") as glove_file:
    for line in glove_file:
        # Each line holds a word followed by its vector components
        records = line.split()
        word = records[0]
        vector_dimensions = asarray(records[1:], dtype='float32')
        embeddings_dictionary[word] = vector_dimensions

## Create a matrix where the row number represents the integer value of the word and the columns correspond to the dimensions of its vector
num_words = min(MAX_NUM_WORDS, len(word2idx_inputs) + 1)
embedding_matrix = zeros((num_words, EMBEDDING_SIZE))
for word, index in word2idx_inputs.items():
    embedding_vector = embeddings_dictionary.get(word)
    # Words without a GloVe vector keep an all-zero row
    if embedding_vector is not None and index < num_words:
        embedding_matrix[index] = embedding_vector

## Print the word embeddings for the word "go" using the GloVe word embedding dictionary.
print(embeddings_dictionary["go"])
print(embedding_matrix[20])

## Create the embedding layer for the input
embedding_layer = Embedding(num_words, EMBEDDING_SIZE, weights=[embedding_matrix], input_length=max_input_len)

# Create the model
## The final shape of the output: (number of inputs, length of the output sentence, the number of words in the output)

## The decoder targets are the output sentences that end with "<eos>".
## They contain no "<sos>" token, so nothing is stripped from them here;
## dropping the first element would remove the first real word and misalign the targets.
decoder_output_sequences = output_integer_seq

## Create the empty output array:
decoder_targets_one_hot = np.zeros((
    len(input_sentences),
    max_out_len,
    num_words_output
), dtype='float32')

## Print the shape of the decoder targets:
print(decoder_targets_one_hot.shape)

## To make predictions, the final layer of the model will be a dense layer, therefore we need the outputs in the form of one-hot encoded vectors.
for i, d in enumerate(decoder_output_sequences):
    for t, word in enumerate(d):
        decoder_targets_one_hot[i, t, word] = 1

## Create the encoder for LSTM:
encoder_inputs_placeholder = Input(shape=(max_input_len,))
x = embedding_layer(encoder_inputs_placeholder)
encoder = LSTM(LSTM_NODES, return_state=True)
encoder_outputs, h, c = encoder(x)
encoder_states = [h, c]

## Create the decoder for LSTM:
decoder_inputs_placeholder = Input(shape=(max_out_len,))
decoder_embedding = Embedding(num_words_output, LSTM_NODES)
decoder_inputs_x = decoder_embedding(decoder_inputs_placeholder)
decoder_lstm = LSTM(LSTM_NODES, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs_x, initial_state=encoder_states)

## Pass the output from the decoder LSTM through a dense layer, to predict decoder outputs
decoder_dense = Dense(num_words_output, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

# Compile the model
model = Model([encoder_inputs_placeholder,
               decoder_inputs_placeholder], decoder_outputs)
model.compile(
    optimizer='rmsprop',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

## Plot our model
plot_model(model, to_file='plot_LSTMModelForMachineTranslation.png', show_shapes=True, show_layer_names=True)

# Train the model using the fit() method:
r = model.fit(
    [encoder_input_sequences, decoder_input_sequences],
    decoder_targets_one_hot,
    batch_size=BATCH_SIZE,
    epochs=EPOCHS,
    validation_split=0.1,
)

# Modifying the model for predictions
## The encoder model remains the same:
encoder_model = Model(encoder_inputs_placeholder, encoder_states)

## Modify our model to accept the hidden and cell states
decoder_state_input_h = Input(shape=(LSTM_NODES,))
decoder_state_input_c = Input(shape=(LSTM_NODES,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]

## At each time step there is only a single word in the decoder input, so we modify the decoder embedding layer as follows:
decoder_inputs_single = Input(shape=(1,))
decoder_inputs_single_x = decoder_embedding(decoder_inputs_single)

## Create the placeholder for decoder outputs:
decoder_outputs, h, c = decoder_lstm(decoder_inputs_single_x, initial_state=decoder_states_inputs)

## To make predictions, the decoder output is passed through the dense layer:
decoder_states = [h, c]
decoder_outputs = decoder_dense(decoder_outputs)

## The final step is to define the updated decoder model, as shown here:
decoder_model = Model(
    [decoder_inputs_single] + decoder_states_inputs,
    [decoder_outputs] + decoder_states
)

## Plot our modified decoder LSTM that makes predictions:
plot_model(decoder_model, to_file='plot_modifiedLSTMModelForMachineTranslation.png', show_shapes=True, show_layer_names=True)

# Making predictions
## Create new dictionaries for both inputs and outputs where the keys are the integers and the corresponding values are the words:
idx2word_input = {v:k for k, v in word2idx_inputs.items()}
idx2word_target = {v:k for k, v in word2idx_outputs.items()}

## Create the translate_sentence() function, which accepts a padded English input sequence (in integer form) and returns the translated German sentence.
def translate_sentence(input_seq):
    # Encode the input sentence into the initial decoder states
    states_value = encoder_model.predict(input_seq)
    # Start decoding with the <sos> token
    target_seq = np.zeros((1, 1))
    target_seq[0, 0] = word2idx_outputs['<sos>']
    eos = word2idx_outputs['<eos>']
    output_sentence = []

    for _ in range(max_out_len):
        output_tokens, h, c = decoder_model.predict([target_seq] + states_value)
        # Pick the most probable word at this step
        idx = np.argmax(output_tokens[0, 0, :])

        if eos == idx:
            break

        if idx > 0:
            word = idx2word_target[idx]
            output_sentence.append(word)

        # Feed the predicted word and the new states into the next step
        target_seq[0, 0] = idx
        states_value = [h, c]

    return ' '.join(output_sentence)

# Testing the model
i = np.random.choice(len(input_sentences))
input_seq = encoder_input_sequences[i:i+1]
translation = translate_sentence(input_seq)
print('-')
print('Input:', input_sentences[i])
print('Response:', translation)

Let’s run the above code using PyScripter IDE. And the following are some selected outputs:

1. Word embeddings

This part shows the implementation of word embeddings for neural machine translation. Here is the printed result of the word embedding for the word “go” from the GloVe word embedding dictionary:

Figure: GloVe embedding vector for the word “go”.

Checking index 20 of the word embedding matrix (the word “go”) shows a consistent result:

Figure: Row 20 of the embedding matrix.
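As a minimal sketch of how such a matrix is assembled, the following plain-Python example builds an embedding matrix from a tiny made-up "GloVe" file with three dimensions. The vectors, the vocabulary, and the helper names are assumptions for illustration only; the real file has one line per word with 100 components.

```python
def parse_glove_line(line):
    """Split one 'word v1 v2 ...' text line into (word, vector)."""
    parts = line.split()
    return parts[0], [float(v) for v in parts[1:]]

def build_embedding_matrix(word_index, vectors, dim):
    """Row i holds the vector of the word whose integer index is i.

    Row 0 (padding) and words missing from `vectors` stay all-zero.
    """
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, index in word_index.items():
        if word in vectors:
            matrix[index] = list(vectors[word])
    return matrix

# Tiny made-up 3-dimensional "GloVe" file, for illustration only.
glove_lines = ["go 0.1 0.2 0.3", "away -0.4 0.5 0.0"]
vectors = dict(parse_glove_line(line) for line in glove_lines)

word_index = {"go": 1, "away": 2, "xyzzy": 3}   # hypothetical tokenizer output
matrix = build_embedding_matrix(word_index, vectors, dim=3)

# The row at the word's index equals its vector in the dictionary.
print(matrix[word_index["go"]] == vectors["go"])
```

This is the same consistency check the article performs for “go”: the dictionary entry and the matrix row at the word's integer index must match.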

2. Plot our LSTM model for machine translation

Another interesting part of this second approach is that, after compiling the model, we can plot it with plot_model from keras.utils. The plot makes it easy to track and communicate the model’s inputs, outputs, and layers.

Figure: Plot of the LSTM model for machine translation.

3. Plot the modified model for prediction

Before making any predictions, first, we need to modify our model.

The following is the plot of our model after the modifications needed to make predictions:

Figure: Plot of the modified model for prediction.
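The reason for the modification is that, at prediction time, the decoder has to generate one word per step and feed each predicted word back in as the next input. The sketch below shows this greedy decoding loop in isolation. The `toy_step` function and its probability table are made-up stand-ins for the decoder model, used only so that the loop can run without a trained network.

```python
def greedy_decode(step_fn, initial_state, sos, eos, max_len):
    """Generate tokens one at a time, always picking the most probable one."""
    token, state = sos, initial_state
    output = []
    for _ in range(max_len):
        probs, state = step_fn(token, state)                    # one decoder time step
        token = max(range(len(probs)), key=probs.__getitem__)   # argmax over the vocabulary
        if token == eos:
            break
        output.append(token)
    return output

# Toy stand-in for the decoder: it follows a fixed table of
# "next token" distributions. Purely illustrative.
SOS, EOS = 1, 2
TABLE = {
    1: [0.0, 0.0, 0.1, 0.8, 0.1],     # after <sos>, token 3 is most likely
    3: [0.0, 0.0, 0.2, 0.1, 0.7],     # after token 3, token 4
    4: [0.0, 0.0, 0.9, 0.05, 0.05],   # after token 4, <eos>
}

def toy_step(token, state):
    return TABLE[token], state + 1    # the state only counts steps here

print(greedy_decode(toy_step, initial_state=0, sos=SOS, eos=EOS, max_len=10))
```

The loop stops either when the end-of-sentence token is predicted or when the maximum output length is reached, which is the same pair of stopping conditions used by translate_sentence() in the listing.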

4. Train the model

I trained the model for 30 epochs; you can modify the number of epochs to see whether you get better results. The model is trained on 18,000 sentences and validated on the remaining 2,000 sentences. You can also increase the number of sentence pairs if you want better results and have sufficient computational resources.

5. Test the model

To test the code, we will randomly choose a sentence from the input_sentences list, retrieve the corresponding padded sequence for the sentence, and will pass it to the translate_sentence() method. The method will return the translated sentence as shown below.

Figures: Outputs of the 1st, 2nd, and 3rd test attempts.

Again, the German results are still far from satisfying. You can cross-check them with Google Translate.

Eventually, with enough training epochs, more training data, and a better (or more complex) model (if you have enough computational power), the results will improve. These are challenges we face regularly in NLP.

Endnotes

LSTMs are a very promising solution to sequence-related problems. They have many useful applications, including time series prediction, weather forecasting, machine translation, and speech recognition. According to Google Scholar, no other computer science paper of the 20th century is receiving as many citations per year as the original 1997 journal publication on Long Short-Term Memory (LSTM) artificial neural networks (NNs) [9].

However, the main disadvantage I find is that they are difficult to train: even a simple model needs a lot of data, training time, and system resources. That is mostly a hardware constraint, though, and the PyScripter IDE itself is lightweight and ran the examples without errors or lag.

I hope this article was successful in giving you a basic understanding and workflow of how these networks work.

Check out the full repository here:

github.com/Embarcadero/DL_Python03_LSTM

Click here to get started with PyScripter, a free, feature-rich, and lightweight Python IDE.

Download RAD Studio to create more powerful Python GUI Windows Apps in 5x less time.

Check out Python4Delphi, which makes it simple to create Python GUIs for Windows using Delphi.

Also, look into DelphiVCL, which makes it simple to create Windows GUIs with Python.

References & further readings

[1] All sentences and translations are from Tatoeba’s (tatoeba.org) massive and awesome dataset, released under a CC-BY License.

[2] Biswal, A. (2023). Top 10 Deep Learning Algorithms You Should Know in 2023. Simplilearn. simplilearn.com/tutorials/deep-learning-tutorial/deep-learning-algorithm

[3] Cijov, A. (2021). Bilingual Sentence Pair: Dataset for Translator Projects. Kaggle. kaggle.com/datasets/alincijov/bilingual-sentence-pairs

[4] Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8), 1735-1780.

[5] Jain, H. (2021). Machine Translation | Seq2Seq | LSTMs. Kaggle. kaggle.com/code/harshjain123/machine-translation-seq2seq-lstms

[6] Kumar, V. (2020). Sequence-to-Sequence Modeling using LSTM for Language Translation. Analytics India Magazine. analyticsindiamag.com/sequence-to-sequence-modeling-using-lstm-for-language-translation

[7] Malik, U. (2022). Python for NLP: Neural Machine Translation with Seq2Seq in Keras. StackAbuse. stackabuse.com/python-for-nlp-neural-machine-translation-with-seq2seq-in-keras

[8] Pennington, J., Socher, R., & Manning, C. D. (2014, October). Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP) (pp. 1532-1543). nlp.stanford.edu/projects/glove

[9] Schmidhuber, J. (2022). 2022: 25th anniversary of 1997 papers: Long Short-Term Memory. All computable metaverses. Hierarchical reinforcement learning (RL). Meta-RL. Abstractions in generative adversarial RL. Soccer learning. Low-complexity neural nets. Low-complexity art. Others. AI Blog. IDSIA, Lugano, Switzerland.

[10] Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., … & Dean, J. (2016). Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144.
