Reducing code runtime is important for developers. Python profilers, such as cProfile, help us find which parts of a program take the most time to run. Whether you are using a Python GUI or the command line, profiling is a huge help in tracking down the code bottlenecks that impact performance.
This article will walk you through the process of using the cProfile module for extracting profiling data and the snakeviz module for visualizing it, and then applying those steps to test machine learning scripts.
What is code profiling?
Code profiling is a technique for figuring out how time is spent in a program. More precisely, a profile is a set of statistics that describes how often and for how long various parts of the program are executed.
With these statistics, we can find the “hot spot” of a program and think about ways of improvement. Sometimes, a hot spot in an unexpected location may give you hints about bugs in your program.
A program can run slowly for two general reasons: a part of it is slow, or a part of it runs so many times that the calls add up and take too much time. We call these performance hogs the hot spots.
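To make this concrete before we get to the machine learning examples, here is a minimal, self-contained sketch (the functions slow_once and fast_but_frequent are invented for illustration) showing how cProfile surfaces both kinds of hot spot:

import cProfile

def slow_once():
    # One expensive call: a single large computation
    return sum(i * i for i in range(2_000_000))

def fast_but_frequent():
    # A cheap call executed many times: the calls add up
    return sum(range(100))

def main():
    slow_once()
    for _ in range(10_000):
        fast_but_frequent()

# Sort by tottime; the ncalls column exposes the frequent caller
cProfile.run("main()", sort="tottime")

In the resulting table, slow_once stands out by tottime, while fast_but_frequent stands out by ncalls.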
How do I get the cProfile library?
As cProfile is a built-in Python library, no further installation is needed.
How do I get the snakeviz library to visualize the profiling results?
Here is how you can get a stable release of snakeviz using pip:
pip install snakeviz
How do I implement Python profiling tools into my machine learning code?
How can I use a profiler inside Python code?
The advantage of this method is that we can profile only a part of the code instead of the entire program. For example, if we load a large module that takes time to bootstrap, we may want to exclude it from the profile. In this case, we can invoke the profiler only for certain lines.
The following is an example of profiling an ordinary least squares (OLS) linear regression program, covering only the steps from the regression through the plotting:
# Import profiling tools
import cProfile as profile
import pstats

# Code source for Ordinary Linear Regression: Jaques Grobler
# License: BSD 3 clause
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score

# Load the diabetes dataset
diabetes_X, diabetes_y = datasets.load_diabetes(return_X_y=True)

# Use only one feature
diabetes_X = diabetes_X[:, np.newaxis, 2]

# Split the data into training/testing sets
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]

# Split the targets into training/testing sets
diabetes_y_train = diabetes_y[:-20]
diabetes_y_test = diabetes_y[-20:]

# Perform all the regression steps with profiling
prof = profile.Profile()
prof.enable()

# Create linear regression object
regr = linear_model.LinearRegression()

# Train the model using the training sets
regr.fit(diabetes_X_train, diabetes_y_train)

# Make predictions using the testing set
diabetes_y_pred = regr.predict(diabetes_X_test)

# The coefficients
print("Coefficients: \n", regr.coef_)

# The mean squared error
print("Mean squared error: %.2f" % mean_squared_error(diabetes_y_test, diabetes_y_pred))

# The coefficient of determination: 1 is perfect prediction
print("Coefficient of determination: %.2f" % r2_score(diabetes_y_test, diabetes_y_pred))

# Plot outputs
plt.scatter(diabetes_X_test, diabetes_y_test, color="black")
plt.plot(diabetes_X_test, diabetes_y_pred, color="blue", linewidth=3)
plt.xticks(())
plt.yticks(())

prof.disable()

# Print profiling output
stats = pstats.Stats(prof).strip_dirs().sort_stats("cumtime")
stats.print_stats(10)  # Print only top 10 rows

# Show plot
plt.show()
Here is the output in the PyScripter IDE:
For the second example, let's consider a program that uses a hill climbing algorithm to find hyperparameters for a Perceptron model. We want to profile only the hill climbing search part:
# Import profiling tools
import cProfile as profile
import pstats

# Manually search perceptron hyperparameters for binary classification
from numpy import mean
from numpy.random import randn
from numpy.random import rand
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.linear_model import Perceptron

# Objective function
def objective(X, y, cfg):
    # Unpack config
    eta, alpha = cfg
    # Define model
    model = Perceptron(penalty='elasticnet', alpha=alpha, eta0=eta)
    # Define evaluation procedure
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
    # Evaluate model
    scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
    # Calculate mean accuracy
    result = mean(scores)
    return result

# Take a step in the search space
def step(cfg, step_size):
    # Unpack the configuration
    eta, alpha = cfg
    # Step eta
    new_eta = eta + randn() * step_size
    # Check the bounds of eta
    if new_eta <= 0.0:
        new_eta = 1e-8
    if new_eta > 1.0:
        new_eta = 1.0
    # Step alpha
    new_alpha = alpha + randn() * step_size
    # Check the bounds of alpha
    if new_alpha < 0.0:
        new_alpha = 0.0
    # Return the new configuration
    return [new_eta, new_alpha]

# Hill climbing local search algorithm
def hillclimbing(X, y, objective, n_iter, step_size):
    # Starting point for the search
    solution = [rand(), rand()]
    # Evaluate the initial point
    solution_eval = objective(X, y, solution)
    # Run the hill climb
    for i in range(n_iter):
        # Take a step
        candidate = step(solution, step_size)
        # Evaluate candidate point
        candidate_eval = objective(X, y, candidate)
        # Check if we should keep the new point
        if candidate_eval >= solution_eval:
            # Store the new point
            solution, solution_eval = candidate, candidate_eval
            # Report progress
            print('>%d, cfg=%s %.5f' % (i, solution, solution_eval))
    return [solution, solution_eval]

# Define dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)
# Define the total iterations
n_iter = 100
# Step size in the search space
step_size = 0.1

# Perform the hill climbing search with profiling
prof = profile.Profile()
prof.enable()
cfg, score = hillclimbing(X, y, objective, n_iter, step_size)
prof.disable()

# Print program output
print('Done!')
print('cfg=%s: Mean Accuracy: %f' % (cfg, score))

# Print profiling output
stats = pstats.Stats(prof).strip_dirs().sort_stats("cumtime")
stats.print_stats(10)  # Print only top 10 rows
Here is the output in the PyScripter IDE:
How can I use a Python code profiler at runtime from a command prompt?
Another way to profile a machine learning script is by running cProfile at runtime. The advantage of this method is that you can profile the whole program with a single command, and you can export the profiling results to a file for further analysis.
Here is how to profile machine learning code at the command prompt. First, remove the profiler parts and save the code as ols.py. Next, run the profiler in the command line as follows:
python -m cProfile ols.py
The following is an excerpt of the profiling results:
Apply the same treatment to the hill climbing script: remove the profiler parts and save the code as hillclimb.py. Next, run the profiler in the command line as follows:
python -m cProfile hillclimb.py
The following is an excerpt of the profiling results:
It provides very rich and detailed code profiling data.
How to sort Python profiling results by call count?
The profiling output presented in the previous sections is very long, and it can be difficult to tell which function is the hot spot. We can sort the output by call count (ncalls) to find the parts that run too many times, using the following command:
python -m cProfile -s ncalls ols.py
The following is an excerpt of the profiling results for ols.py, sorted from the most-called function:
Run the following command to sort the profiling results of the hill climbing script by call count:
python -m cProfile -s ncalls hillclimb.py
The following is an excerpt of the profiling results for hillclimb.py, sorted from the most-called function:
How to sort Python profiling results by total time spent?
We can also sort the cProfile output by the total time spent in each function (tottime) to find the parts that run slowly, using the following command:
python -m cProfile -s tottime ols.py
And here is the output on the command prompt:
Run the following command to sort the profiling results of the hill climbing script by total time spent:
python -m cProfile -s tottime hillclimb.py
And here is the output on the command prompt:
How to save machine learning code profiling results for further analysis?
Instead of only printing the profiling results on the command line, we can make them more useful by exporting them to a file for further analysis.
Here is how you can do it:
python -m cProfile -o statsOls.dump ols.py
And use the following command to save the profiling results of the hill climbing script:
python -m cProfile -o statsHillclimb.dump hillclimb.py
The above commands export the profiling results into the statsOls.dump and statsHillclimb.dump files.
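The dump files can also be analyzed without snakeviz by loading them back with the standard pstats module. Here is a minimal sketch, assuming statsOls.dump sits in the current directory; the "sklearn" pattern is just an illustrative restriction:

import pstats

# Load the saved profiling results from disk
stats = pstats.Stats("statsOls.dump")

# Clean up file paths, sort by cumulative time, and print the top 10 rows
stats.strip_dirs().sort_stats("cumtime").print_stats(10)

# Print only entries whose name matches a pattern, limited to 5 rows
stats.print_stats("sklearn", 5)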
How to visualize Python profiling results using snakeviz?
To visualize your Python code profiling results, open the .dump file with snakeviz using this command:
snakeviz statsOls.dump
This starts a snakeviz web server and opens the visualization in your default browser. By default, the snakeviz web server listens on 127.0.0.1:8080.
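If that address is already in use, snakeviz accepts -H/--hostname and -p/--port options, and -s/--server starts the server without opening a browser (handy on remote machines). For example, with an arbitrarily chosen port:

snakeviz -H 127.0.0.1 -p 8900 statsOls.dump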
You can set the Style, Depth, and Cutoff of the visualization.
Visualize the profiling results for the ordinary least squares (OLS) linear regression program in Icicle style:
Visualize the profiling results for the ordinary least squares (OLS) linear regression program in Sunburst style:
An excerpt of the full profiling results for the ordinary least squares (OLS) linear regression program in tabular format:
Do the same for the hill climbing script using this command:
snakeviz statsHillclimb.dump
Visualize the profiling results for the hill climbing script in Icicle style:
Visualize the profiling results for the hill climbing script in Sunburst style:
An excerpt of the full profiling results for the hill climbing script in tabular format:
The following table explains each column:
ncalls | The number of calls.
tottime | The total time spent in the given function, excluding time spent in calls to sub-functions.
percall | The quotient of tottime divided by ncalls.
cumtime | The cumulative time spent in this function and all sub-functions (from invocation until exit). This figure is accurate even for recursive functions.
percall | The quotient of cumtime divided by primitive calls.
filename:lineno(function) | The file name, line number, and name of the respective function.
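These columns double as sort keys. As a small sketch reusing the statsOls.dump file saved earlier, the pstats.SortKey constants map onto the same columns:

import pstats
from pstats import SortKey

stats = pstats.Stats("statsOls.dump").strip_dirs()

# SortKey.CALLS corresponds to ncalls, SortKey.TIME to tottime,
# and SortKey.CUMULATIVE to cumtime
stats.sort_stats(SortKey.CALLS).print_stats(5)
stats.sort_stats(SortKey.TIME).print_stats(5)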
Amazing, isn't it? Now you can easily find the bottlenecks in your machine learning programs using cProfile and visualize them professionally with snakeviz. From now on, you can add code profiling as an optional but powerful step in your machine learning workflow.
Finally, note that Python's profiler gives you statistics on time only, not memory usage. You may need another library or tool for that purpose.
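One option that ships with the standard library is tracemalloc. Here is a minimal sketch of how it reports allocation hot spots; the list comprehension is a stand-in for whatever code you want to measure, and the top-10 cutoff is arbitrary:

import tracemalloc

tracemalloc.start()

# Stand-in workload: replace with the code you want to measure
data = [list(range(1000)) for _ in range(1000)]

# Take a snapshot and show the top 10 allocation sites by size
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:10]:
    print(stat)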
Click here to start using PyScripter, a free, feature-rich, and lightweight IDE for Python developers.
Download RAD Studio to build more powerful Python GUI Windows Apps 5x Faster with Less Code.
Check out Python4Delphi which easily allows you to build Python GUIs for Windows using Delphi.
Also, check out DelphiVCL which easily allows you to build GUIs for Windows using Python.
References & further readings
[1] Nguyen, D. (2021). How to profile code in Python. AnyMind Group. anymindgroup.com/news/tech-blog/15280
[2] Shrivarsheni. (2020). cProfile – How to profile your python code. Machine Learning Plus. machinelearningplus.com/python/cprofile-how-to-profile-your-python-code
[3] Stack Overflow. (2011). cProfile saving data to file causes jumbles of characters. stackoverflow.com/questions/8283112/cprofile-saving-data-to-file-causes-jumbles-of-characters
[4] Tam, A. (2022). Profiling Python Code. Machine Learning Mastery. machinelearningmastery.com/profiling-python-code