Do you want to perform Natural Language Processing (NLP) tasks such as predicting text, analyzing and visualizing sentence structure, sentiment analysis, or gender classification in your GUI app? This post shows how to use the NLTK Python library with Python4Delphi (P4D) in a Delphi/C++Builder application to perform some interesting NLP tasks.
NLTK (the Natural Language Toolkit) is a leading platform for building Python programs that work with human language data. Natural Language Processing, or NLP for short, is used here in a wide sense to cover any kind of computer manipulation of natural language. NLTK provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, wrappers for industrial-strength NLP libraries, and an active discussion forum.
Practical work in Natural Language Processing typically uses large bodies of linguistic data, known as corpora. Let’s install the popular NLTK datasets using this command:
python -m nltk.downloader popular
2. Hands On
This post will guide you through performing Natural Language Processing tasks with Python’s NLTK and then displaying the results in a Delphi Windows GUI app.
First, open and run the Python GUI project Demo1 from Python4Delphi in RAD Studio. Then paste the script into the lower Memo, click the Execute button, and read the result in the upper Memo. You can find the Demo1 source on GitHub. The behind-the-scenes details of how Delphi manages to run your Python code in this amazing Python GUI can be found at this link.
With the NLTK library, we will perform two interesting tasks: analyzing sentence structure and identifying gender by name. Here is an example of analyzing sentence structure, or grammar, using NLTK data from the Treebank corpus:
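Before pasting the NLTK scripts, it can help to confirm which interpreter the demo has loaded and whether NLTK is visible to it. The snippet below is a small sanity check that uses only the standard library; the helper name `module_available` is made up for this illustration.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if this interpreter can locate the module `name`."""
    return importlib.util.find_spec(name) is not None


# Show which Python version the host application is running.
print("Python", sys.version.split()[0])
# False here means NLTK is not installed for this interpreter.
print("nltk importable:", module_available("nltk"))
```

If the second line reports False, install NLTK for the same Python installation that the demo is configured to load before running the examples.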
from nltk.corpus import treebank
# Display a parse tree from the Treebank corpus.
# parsed_sents() returns a list of trees, so pick the first sentence.
t = treebank.parsed_sents('wsj_0009.mrg')[0]
t.draw()  # opens a new window.
The Treebank corpora provide a syntactic parse for each sentence. The NLTK data package includes a 10% sample of the Penn Treebank (in treebank), as well as the Sinica Treebank (in sinica_treebank). In this example, we use Wall Street Journal sample file number 9 (wsj_0009.mrg).
For the next example, we will create a classifier that predicts gender from a person’s name.
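To get a feel for what a syntactic parse looks like, note that Treebank files store each sentence as nested, labeled brackets. The following is a minimal standard-library sketch of reading that notation. It is not how NLTK implements it (NLTK's own Tree class handles this for you); the function names `parse_bracketed` and `leaves` and the toy sentence are assumptions made for illustration.

```python
def parse_bracketed(text):
    """Parse '(S (NP ...) (VP ...))' into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        pos += 1                      # skip the opening "("
        label = tokens[pos]           # e.g. S, NP, VP
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())   # nested constituent
            else:
                children.append(tokens[pos])   # a word (leaf)
                pos += 1
        pos += 1                      # skip the closing ")"
        return (label, children)

    return read_node()


def leaves(node):
    """Collect the words of the sentence from left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1]:
        words.extend(leaves(child))
    return words


# Hand-written toy sentence, not taken from the corpus.
tree = parse_bracketed(
    "(S (NP (DT the) (NN dog)) (VP (VBD chased) (NP (DT a) (NN cat))))"
)
print(tree[0])        # S
print(leaves(tree))   # ['the', 'dog', 'chased', 'a', 'cat']
```

The root label S marks the whole sentence, and each nested pair of brackets is one constituent, which is the structure the drawn tree visualizes.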
Classification is the task of choosing the correct class label for a given input. In basic classification tasks, each input is considered in isolation from all other inputs, and the set of labels is defined in advance. Some examples of classification tasks are:
- Deciding whether an email is spam or not.
- Deciding what the topic of a news article is, from a fixed list of topic areas such as “sports,” “technology,” and “politics.”
- Classifying the sentiment of news or social media posts.
Let’s build our own classifier using the following code. We use a Naive Bayes classifier as our classification algorithm:
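# Importing libraries
import random
import nltk
from nltk.corpus import names

# Feature extractor (assumed here: the last letter of the name).
def gender_features(word):
    return {'last_letter': word[-1]}

# Preparing a list of examples and corresponding class labels.
labeled_names = ([(name, 'male') for name in names.words('male.txt')] +
                 [(name, 'female') for name in names.words('female.txt')])
# Shuffle so the test slice is not drawn from the male names only.
random.shuffle(labeled_names)
# We use the feature extractor to process the names data.
featuresets = [(gender_features(n), gender)
               for (n, gender) in labeled_names]
# Divide the resulting list of feature sets into a training set and a test set.
train_set, test_set = featuresets[500:], featuresets[:500]
# The training set is used to train a new "naive Bayes" classifier.
classifier = nltk.NaiveBayesClassifier.train(train_set)
# Classify a sample name ('Neo' is an illustrative input).
print(classifier.classify(gender_features('Neo')))
# Output should be 'male'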
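To see roughly what a naive Bayes classifier does with such features, here is a hand-rolled sketch that uses only the standard library. It is not NLTK's implementation, which may differ in details such as smoothing. The eight training names are toy data written for this example, and the single last-letter feature and the helper names are assumptions.

```python
import math
from collections import Counter, defaultdict

# Toy, hand-written training data (NOT the NLTK names corpus).
train = [("Anna", "female"), ("Maria", "female"), ("Laura", "female"),
         ("Sophie", "female"), ("John", "male"), ("Mark", "male"),
         ("Leo", "male"), ("Peter", "male")]


def last_letter(name):
    return name[-1].lower()


# Count how often each label and each (label, last letter) pair occurs.
label_counts = Counter(label for _, label in train)
feature_counts = defaultdict(Counter)
for name, label in train:
    feature_counts[label][last_letter(name)] += 1
vocabulary = {last_letter(name) for name, _ in train}


def classify(name):
    """Pick the label maximizing log P(label) + log P(last letter | label)."""
    scores = {}
    for label, n in label_counts.items():
        prior = n / len(train)
        # Add-one smoothing so an unseen letter does not zero out a label.
        seen = feature_counts[label][last_letter(name)]
        likelihood = (seen + 1) / (n + len(vocabulary))
        scores[label] = math.log(prior) + math.log(likelihood)
    return max(scores, key=scores.get)


print(classify("Neo"))    # male
print(classify("Julia"))  # female
```

With a real corpus the same idea applies, only with many more names and usually more than one feature per name.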
Congratulations, you have now learned how to perform Natural Language Processing tasks with Python’s NLTK and display the results in a Delphi Windows GUI app.
Check out the NLTK library for Python and use it in your projects: https://pypi.org/project/nltk/
Check out Python4Delphi, which makes it easy to build Python GUIs for Windows using Delphi: https://github.com/pyscripter/python4delphi