Are you looking for an end-to-end open-source machine learning platform, plus a nice GUI to put on top of it? By combining Scikit-Learn with the Python4Delphi library inside Delphi and C++Builder, you can deliver enterprise-grade machine learning solutions with ease. And as a bonus, integrating Python with Delphi can speed up execution time (read our other articles about TensorFlow and PyTorch).
Scikit-Learn is an open-source Python library for machine learning. Scikit-Learn has simple and efficient tools for predictive data analysis that are built on top of SciPy, NumPy, and Matplotlib. Scikit-Learn features various classification, regression, and clustering algorithms, including support vector machines, random forests, gradient boosting, k-means, and DBSCAN.
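To give a quick taste of that API before we get to the demos, here is a minimal sketch of our own (not part of the demos below) showing the fit/predict/score pattern that every Scikit-Learn estimator follows, using the built-in iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Load a small built-in dataset and hold out a test split.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Every scikit-learn estimator follows the same fit/predict/score pattern.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print("Test accuracy: %.3f" % clf.score(X_test, y_test))
```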
The Scikit-Learn project (formerly scikits.learn, and also known as sklearn) was started in 2007 by David Cournapeau as a Google Summer of Code project, and many volunteers have contributed since then. Scikit-Learn is distributed under the 3-Clause BSD license.
This post will guide you on how to run the Scikit-Learn library and use Python for Delphi to display the results in a Delphi Windows GUI app.
First, open and run our Python GUI using the Demo1 project from Python4Delphi with RAD Studio. Then insert the script into the lower Memo, click the Execute script button, and get the result in the upper Memo. You can find the Demo1 source on GitHub. The behind-the-scenes details of how Delphi manages to run your Python code in this amazing Python GUI can be found at this link.
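Before running the demos, it is worth confirming that Python4Delphi is using the Python environment you expect (one where scikit-learn has already been installed with pip install scikit-learn). A minimal sanity-check script you might paste into the lower Memo could look like this:

```python
import sys
import sklearn

# Print the interpreter and library versions to the upper Memo
# to confirm Python4Delphi sees the environment you expect.
print("Python:", sys.version)
print("scikit-learn:", sklearn.__version__)
```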
Let’s try two interesting Scikit-Learn demos in the Python GUI: classifier comparison, and color quantization using K-Means clustering.
Demo 1: Comparison between Classification Algorithms
This demo shows a performance comparison of several classifiers in scikit-learn on synthetic datasets. The point of this example is to illustrate the nature of the decision boundaries of different classifiers.
The following code for comparing classifiers is credited to Gaël Varoquaux, Andreas Müller, and Jaques Grobler (visit the source in Reference [4]).
```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_moons, make_circles, make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

h = .02  # step size in the mesh

names = ["Nearest Neighbors", "Linear SVM", "RBF SVM", "Gaussian Process",
         "Decision Tree", "Random Forest", "Neural Net", "AdaBoost",
         "Naive Bayes", "QDA"]

classifiers = [
    KNeighborsClassifier(3),
    SVC(kernel="linear", C=0.025),
    SVC(gamma=2, C=1),
    GaussianProcessClassifier(1.0 * RBF(1.0)),
    DecisionTreeClassifier(max_depth=5),
    RandomForestClassifier(max_depth=5, n_estimators=10, max_features=1),
    MLPClassifier(alpha=1, max_iter=1000),
    AdaBoostClassifier(),
    GaussianNB(),
    QuadraticDiscriminantAnalysis()]

X, y = make_classification(n_features=2, n_redundant=0, n_informative=2,
                           random_state=1, n_clusters_per_class=1)
rng = np.random.RandomState(2)
X += 2 * rng.uniform(size=X.shape)
linearly_separable = (X, y)

datasets = [make_moons(noise=0.3, random_state=0),
            make_circles(noise=0.2, factor=0.5, random_state=1),
            linearly_separable
            ]

figure = plt.figure(figsize=(27, 9))
i = 1
# iterate over datasets
for ds_cnt, ds in enumerate(datasets):
    # preprocess dataset, split into training and test part
    X, y = ds
    X = StandardScaler().fit_transform(X)
    X_train, X_test, y_train, y_test = \
        train_test_split(X, y, test_size=.4, random_state=42)

    x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
    y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))

    # just plot the dataset first
    cm = plt.cm.RdBu
    cm_bright = ListedColormap(['#FF0000', '#0000FF'])
    ax = plt.subplot(len(datasets), len(classifiers) + 1, i)
    if ds_cnt == 0:
        ax.set_title("Input data")
    # Plot the training points
    ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=cm_bright,
               edgecolors='k')
    # Plot the testing points
    ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_bright, alpha=0.6,
               edgecolors='k')
    ax.set_xlim(xx.min(), xx.max())
    ax.set_ylim(yy.min(), yy.max())
    ax.set_xticks(())
    ax.set_yticks(())
    i += 1

    # iterate over classifiers
    for name, clf in zip(names, classifiers):
        ax = plt.subplot(len(datasets), len(classifiers) + 1, i)
        clf.fit(X_train, y_train)
        score = clf.score(X_test, y_test)

        # Plot the decision boundary. For that, we will assign a color to each
        # point in the mesh [x_min, x_max]x[y_min, y_max].
        if hasattr(clf, "decision_function"):
            Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
        else:
            Z = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1]

        # Put the result into a color plot
        Z = Z.reshape(xx.shape)
        ax.contourf(xx, yy, Z, cmap=cm, alpha=.8)

        # Plot the training points
        ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=cm_bright,
                   edgecolors='k')
        # Plot the testing points
        ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_bright,
                   edgecolors='k', alpha=0.6)

        ax.set_xlim(xx.min(), xx.max())
        ax.set_ylim(yy.min(), yy.max())
        ax.set_xticks(())
        ax.set_yticks(())
        if ds_cnt == 0:
            ax.set_title(name)
        ax.text(xx.max() - .3, yy.min() + .3, ('%.2f' % score).lstrip('0'),
                size=15, horizontalalignment='right')
        i += 1

plt.tight_layout()
plt.show()
```
The result in the Python GUI:
Notes about the result:
The intuition conveyed by these examples does not necessarily carry over to real datasets. Particularly in high-dimensional spaces, data can more easily be separated linearly, and the simplicity of classifiers such as Naive Bayes and Linear SVMs might lead to better generalization than is achieved by other classifiers.
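You can probe this intuition yourself. Here is a minimal sketch of our own (not part of the original demo, and with illustrative dataset parameters) that cross-validates a simple and a more flexible classifier on a high-dimensional synthetic problem:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC, LinearSVC

# A high-dimensional synthetic problem: many features, few informative ones.
X, y = make_classification(n_samples=300, n_features=500, n_informative=20,
                           random_state=0)

for name, clf in [("Naive Bayes", GaussianNB()),
                  ("Linear SVM", LinearSVC(C=0.025, max_iter=10000)),
                  ("RBF SVM", SVC(gamma=2, C=1))]:
    scores = cross_val_score(clf, X, y, cv=5)
    print("%-12s mean accuracy: %.3f" % (name, scores.mean()))
```

Depending on the random seed, the simpler models will often hold their own or win here, which is exactly the point of the note above.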
Demo 2: Quantize Colors using K-Means Clustering
In this second demo, we perform a pixel-wise Vector Quantization (VQ) of an image of the Summer Palace (China), reducing the number of colors required to show the image from 96,615 unique colors to 64, while preserving the overall appearance quality.
In this example, pixels are represented in a 3D space, and K-means clustering is used to find 64 color clusters. In the image processing literature, the codebook obtained from K-means (the cluster centers) is called the color palette. Using a single byte, up to 256 colors can be addressed, whereas an RGB encoding requires 3 bytes per pixel. The GIF file format, for example, uses such a palette.
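As a back-of-the-envelope check of that claim, here is a small sketch of our own, assuming a 427×640 image (the size of scikit-learn's china.jpg sample), comparing the storage cost of raw RGB versus a palette-indexed encoding:

```python
# Storage cost of a 427x640 image (the size of scikit-learn's china.jpg).
w, h = 427, 640
rgb_bytes = w * h * 3                # 3 bytes (R, G, B) per pixel
palette_bytes = w * h * 1 + 256 * 3  # 1 index byte per pixel + 256-entry RGB palette

print("Raw RGB:         %d bytes" % rgb_bytes)      # 819,840 bytes
print("Palette-indexed: %d bytes" % palette_bytes)  # 274,048 bytes
print("Ratio:           %.2fx smaller" % (rgb_bytes / palette_bytes))
```

The palette overhead is a fixed 768 bytes, so for any reasonably sized image the indexed encoding approaches a 3x saving.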
For comparison, we also show a quantized image using a random codebook (colors picked at random).
The following code for color quantization using K-Means is credited to Robert Layton, Olivier Grisel, and Mathieu Blondel (visit the original source in Reference [2]).
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin
from sklearn.datasets import load_sample_image
from sklearn.utils import shuffle
from time import time

n_colors = 64

# Load the Summer Palace photo
china = load_sample_image("china.jpg")

# Convert to floats instead of the default 8 bits integer coding. Dividing by
# 255 is important so that plt.imshow works well on float data (it needs to
# be in the range [0-1])
china = np.array(china, dtype=np.float64) / 255

# Load Image and transform to a 2D numpy array.
w, h, d = original_shape = tuple(china.shape)
assert d == 3
image_array = np.reshape(china, (w * h, d))

print("Fitting model on a small sub-sample of the data")
t0 = time()
image_array_sample = shuffle(image_array, random_state=0)[:1000]
kmeans = KMeans(n_clusters=n_colors, random_state=0).fit(image_array_sample)
print("done in %0.3fs." % (time() - t0))

# Get labels for all points
print("Predicting color indices on the full image (k-means)")
t0 = time()
labels = kmeans.predict(image_array)
print("done in %0.3fs." % (time() - t0))

codebook_random = shuffle(image_array, random_state=0)[:n_colors]
print("Predicting color indices on the full image (random)")
t0 = time()
labels_random = pairwise_distances_argmin(codebook_random,
                                          image_array,
                                          axis=0)
print("done in %0.3fs." % (time() - t0))


def recreate_image(codebook, labels, w, h):
    """Recreate the (compressed) image from the code book & labels"""
    d = codebook.shape[1]
    image = np.zeros((w, h, d))
    label_idx = 0
    for i in range(w):
        for j in range(h):
            image[i][j] = codebook[labels[label_idx]]
            label_idx += 1
    return image


# Display all results, alongside original image
plt.figure(1)
plt.clf()
plt.axis('off')
plt.title('Original image (96,615 colors)')
plt.imshow(china)

plt.figure(2)
plt.clf()
plt.axis('off')
plt.title('Quantized image (64 colors, K-Means)')
plt.imshow(recreate_image(kmeans.cluster_centers_, labels, w, h))

plt.figure(3)
plt.clf()
plt.axis('off')
plt.title('Quantized image (64 colors, Random)')
plt.imshow(recreate_image(codebook_random, labels_random, w, h))
plt.show()
```
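As an aside (our own note, not part of the original example), the per-pixel Python loop in recreate_image can be replaced by NumPy fancy indexing, which is shorter and considerably faster:

```python
def recreate_image_fast(codebook, labels, w, h):
    """Vectorized equivalent of recreate_image: look up every pixel's
    palette entry in one shot, then reshape back to (w, h, d)."""
    return codebook[labels].reshape(w, h, -1)
```

Since labels is a flat array of w * h palette indices, codebook[labels] produces a (w * h, 3) array of colors in a single operation.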
The result in the Python GUI built with Python4Delphi:
Congratulations, now you have learned how to run the Scikit-Learn library and use Python for Delphi to display the results in a Delphi Windows GUI app. From here, you can work through a comprehensive guide to machine learning in the documentation, using the Scikit-Learn library and Python4Delphi.
Check out the Scikit-Learn library for Python and use it in your projects: https://pypi.org/project/scikit-learn/
Check out Python4Delphi, which easily allows you to build Python GUIs for Windows using Delphi: https://github.com/pyscripter/python4delphi
References & further readings
[1] Buitinck, L., Louppe, G., Blondel, M., Pedregosa, F., Mueller, A., Grisel, O., … & Varoquaux, G. (2013). API design for machine learning software: experiences from the scikit-learn project. arXiv preprint arXiv:1309.0238.
[2] Layton, R., Grisel, O., & Blondel, M. (2007-2023). Color Quantization using K-Means. Scikit-learn developers. scikit-learn.org/stable/auto_examples/cluster/plot_color_quantization.html
[3] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., … & Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825-2830.
[4] Varoquaux, G., Müller, A., & Grobler, J. (2007-2023). Classifier comparison. Scikit-learn developers. scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html