
How To Add Python Profiling Tools Into Machine Learning Code


Reducing code runtime is important for developers. Python profilers, like cProfile, help us find which parts of a program take the most time to run. Whether you are using a Python GUI or the command line, profiling is a huge help in tracking down the bottlenecks that hurt performance.

This article walks you through using the cProfile module to collect profiling data and the snakeviz module to visualize it, and then applies those steps to machine learning scripts.

What is code profiling?

Code profiling is a technique for figuring out how time is spent in a program. More precisely, a profile is a set of statistics that describes how often and for how long various parts of the program are executed.

With these statistics, we can find the “hot spots” of a program and think about ways to improve them. Sometimes, a hot spot in an unexpected location also hints at a bug in the program.

A program generally runs slowly for one of two reasons: a part of it is slow by itself, or a part of it runs so many times that the calls add up to a lot of time. We call these “performance hogs” hot spots.
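To make the two kinds of hot spot concrete, here is a minimal sketch. The functions slow_once and cheap are hypothetical and exist only for illustration. It assumes Python 3.9 or later, because it reads the results through pstats.Stats.get_stats_profile().

```python
import cProfile
import pstats


def slow_once():
    # One call that does a lot of work by itself.
    total = 0
    for i in range(200_000):
        total += i * i
    return total


def cheap():
    # Trivial work, but called many times.
    return 1


def run():
    total = slow_once()
    for _ in range(50_000):
        total += cheap()
    return total


# Profile only the call to run().
with cProfile.Profile() as profiler:
    run()

profile = pstats.Stats(profiler).get_stats_profile()
for name in ("slow_once", "cheap"):
    row = profile.func_profiles[name]
    print(f"{name}: ncalls={row.ncalls}, tottime={row.tottime:.3f}s")
```

Here slow_once shows up with a single call, while cheap shows up with a very large call count. The timings themselves depend on the machine.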

How do I get the cProfile library?

As cProfile is a built-in Python library, no further installation is needed.

How do I get the snakeviz library to visualize the profiling results?

You can install a stable release of snakeviz using pip:

    pip install snakeviz

How do I implement Python profiling tools into my machine learning code?

Using a profiler inside code

The advantage of this method is that we can profile only part of the program instead of all of it. For example, a large module takes time to load and bootstrap, and we may want to exclude that time from the profile. In this case, we can invoke the profiler only around the lines we care about.

The following is an example of profiling an ordinary least squares (OLS) linear regression program, where only the steps from the regression through the plotting are profiled:
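The original listing is not reproduced in this copy of the article. As a stand-in, the sketch below shows the same idea under stated assumptions: it uses a pure-Python closed-form OLS fit instead of a machine learning library, it skips plotting, and make_data and fit_ols are hypothetical helpers. Only the regression is wrapped in enable() and disable(), so data generation stays out of the profile.

```python
import cProfile
import io
import pstats
import random


def make_data(n=2000, slope=3.0, intercept=1.5, seed=0):
    # Synthetic data: y = slope * x + intercept + Gaussian noise.
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [slope * x + intercept + rng.gauss(0, 1) for x in xs]
    return xs, ys


def fit_ols(xs, ys):
    # Closed-form simple linear regression (one feature).
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


xs, ys = make_data()  # setup, deliberately left outside the profiled region

profiler = cProfile.Profile()
profiler.enable()       # start collecting statistics here
slope, intercept = fit_ols(xs, ys)
profiler.disable()      # stop collecting statistics here

# Print the ten most expensive entries, sorted by cumulative time.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumtime").print_stats(10)
report = buffer.getvalue()
print(report)
```

With a real model, the same enable() and disable() pair would go around the fit, predict, and plot calls.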

Here is the output of the original example in the PyScripter IDE:

[Screenshot: cProfile output for the OLS regression example in PyScripter]

For the second example, let’s consider a program that uses a hill-climbing algorithm to find hyperparameters for a perceptron model. We want to profile only the hill-climbing search itself:
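The original listing is not reproduced in this copy either. The sketch below is a simplified stand-in, not the author's program. The hypothetical objective function replaces "train a perceptron and score it", and the two parameters are only loosely named after a learning rate and a regularization term. It uses cProfile.Profile as a context manager (Python 3.8 or later) so that only the search is profiled.

```python
import cProfile
import pstats
import random


def objective(params):
    # Hypothetical stand-in for model training and scoring; best at (0.3, 0.7).
    eta, alpha = params
    return -((eta - 0.3) ** 2 + (alpha - 0.7) ** 2)


def step(params, step_size, rng):
    # Propose a nearby candidate, clipped to the range [0, 1].
    return [min(1.0, max(0.0, p + rng.gauss(0, step_size))) for p in params]


def hill_climb(start, n_iter, step_size, rng):
    best, best_score = start, objective(start)
    for _ in range(n_iter):
        candidate = step(best, step_size, rng)
        score = objective(candidate)
        if score >= best_score:  # keep the candidate if it is no worse
            best, best_score = candidate, score
    return best, best_score


rng = random.Random(1)
start = [rng.random(), rng.random()]  # setup, not profiled

# Everything inside the with block is profiled; nothing outside it is.
with cProfile.Profile() as profiler:
    best, best_score = hill_climb(start, n_iter=500, step_size=0.1, rng=rng)

pstats.Stats(profiler).sort_stats("ncalls").print_stats(5)
```

Sorting by ncalls puts the functions called on every iteration, such as objective and step, near the top of the report.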

Here is the output of the original example in the PyScripter IDE:

[Screenshot: cProfile output for the hill-climbing example in PyScripter]

How can I use a Python code profiler at runtime from a command prompt?

Another way to profile a machine learning script is to run it under cProfile from the command prompt. The advantage of this method is that you can profile the whole script with a single command, and you can export the profiling result to a file for further analysis.

To profile the machine learning code from the command prompt, first remove the in-code profiler calls and save the script as ols.py. Then run the profiler from the command line as follows:

    python -m cProfile ols.py

The following is the excerpt of the profiling results:

[Screenshot: excerpt of the cProfile output for ols.py]

Do the same for the hill-climbing script: remove the in-code profiler calls and save the script as hillclimb.py. Then run the profiler from the command line as follows:

    python -m cProfile hillclimb.py

The following is the excerpt of the profiling results:

[Screenshot: excerpt of the cProfile output for hillclimb.py]

The output contains rich, detailed profiling data for every function that was called.

 

How to sort Python profiling results by call count?

The profiling output shown in the previous sections is very long, which makes it hard to tell which function is the hot spot. To find the parts that run too many times, we can sort the output by call count (ncalls) using the following command:

    python -m cProfile -s ncalls ols.py

The following is an excerpt of the profiling results for ols.py, sorted from the most called function:

[Screenshot: cProfile output for ols.py, sorted by call count]

Run the following command to sort the profiling results of the hill-climbing script by call count:

    python -m cProfile -s ncalls hillclimb.py

The following is an excerpt of the profiling results for hillclimb.py, sorted from the most called function:

[Screenshot: cProfile output for hillclimb.py, sorted by call count]

 

How to sort Python profiling results by total time spent?

To find the parts that are slow by themselves, we can also sort the cProfile output by the total time spent in each function (tottime) using the following command:

    python -m cProfile -s tottime ols.py

And here is the output on the command prompt:

[Screenshot: cProfile output for ols.py, sorted by tottime]

Run the following command to sort the profiling results of the hill-climbing script by the total time spent in each function:

    python -m cProfile -s tottime hillclimb.py

And here is the output on the command prompt:

[Screenshot: cProfile output for hillclimb.py, sorted by tottime]
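The same two orderings are also available when the profiler runs inside the code, through the pstats module. The sketch below is illustrative only: tokenize and count_words are hypothetical functions, and the workload is a toy. It renders one report sorted by call count and one sorted by internal time (tottime) from the same profile.

```python
import cProfile
import io
import pstats
from pstats import SortKey


def tokenize(line):
    return line.split()


def count_words(lines):
    total = 0
    for line in lines:
        total += len(tokenize(line))
    return total


lines = ["the quick brown fox"] * 5000

with cProfile.Profile() as profiler:
    total = count_words(lines)


def report(sort_key, limit=5):
    # Render the top `limit` rows of the profile, ordered by `sort_key`.
    buffer = io.StringIO()
    pstats.Stats(profiler, stream=buffer).sort_stats(sort_key).print_stats(limit)
    return buffer.getvalue()


by_calls = report(SortKey.CALLS)  # same ordering as the ncalls sort key
by_time = report(SortKey.TIME)    # same ordering as the tottime sort key
print(by_calls)
print(by_time)
```

Each report names its ordering in an "Ordered by:" header, which makes it easy to confirm which sort was applied.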

 

How to save Machine Learning code profiling results for further analysis?

Instead of only printing the profiling result on the command line, we can make it more useful for further analysis by exporting it to a file.

Here is how you can do it for the OLS script:

    python -m cProfile -o statsOls.dump ols.py

And use the following command to save the profiling results of the hill-climbing script:

    python -m cProfile -o statsHillclimb.dump hillclimb.py

The above commands export the profiling results into the files statsOls.dump and statsHillclimb.dump.
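A saved profile can also be written and read back from Python. The sketch below is a minimal illustration: work is a hypothetical workload, and the file name stats.dump inside a temporary directory is an assumption made so that the example leaves no files behind. It saves a profile with Profile.dump_stats() and reloads it with pstats.Stats().

```python
import cProfile
import os
import pstats
import tempfile


def work():
    # Hypothetical workload standing in for a training script.
    return sorted(str(i) for i in range(20_000))


with cProfile.Profile() as profiler:
    work()

with tempfile.TemporaryDirectory() as tmp_dir:
    dump_path = os.path.join(tmp_dir, "stats.dump")  # hypothetical file name
    profiler.dump_stats(dump_path)                   # write the binary profile
    dump_exists = os.path.exists(dump_path)

    # Reload the saved profile, as a later analysis session would.
    loaded = pstats.Stats(dump_path)
    loaded.strip_dirs().sort_stats("cumtime").print_stats(5)
    loaded_names = set(loaded.get_stats_profile().func_profiles)
```

The file produced by dump_stats() has the same format as the one written by the -o option, so either can be loaded this way. Note that get_stats_profile() requires Python 3.9 or later.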

 

How to visualize Python profiling results using snakeviz?

To visualize your Python code profiling results, open the .dump file with snakeviz using this command:

    snakeviz statsOls.dump

This starts a snakeviz web server and opens the visualization in your default browser. By default, the server listens on 127.0.0.1:8080.

In the browser, you can adjust the Style, Depth, and Cutoff settings of the visualization.

Visualize with Icicle style:

[Screenshot: snakeviz Icicle view of the ols.py profile]

Visualize with Sunburst style:

[Screenshot: snakeviz Sunburst view of the ols.py profile]

Excerpt of all the profiling results in tabular format:

[Screenshot: snakeviz table of the ols.py profile]

Do the same for the hill-climbing script using this command:

    snakeviz statsHillclimb.dump

Visualization with Icicle style:

[Screenshot: snakeviz Icicle view of the hillclimb.py profile]

Visualization with Sunburst style:

[Screenshot: snakeviz Sunburst view of the hillclimb.py profile]

Excerpt of all the profiling results in tabular format:

[Screenshot: snakeviz table of the hillclimb.py profile]

The following list explains each column of the profiling output:

ncalls: The number of calls.
tottime: The total time spent in the given function, excluding time spent in calls to sub-functions.
percall: tottime divided by ncalls.
cumtime: The cumulative time spent in this function and all sub-functions, from invocation until exit. This figure is accurate even for recursive functions.
percall: cumtime divided by the number of primitive calls.
filename:lineno(function): The file name, line number, and name of each function.
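The difference between total and primitive calls is easiest to see with a recursive function. The sketch below uses a deliberately recursive, hypothetical factorial function and reads the columns for one row through get_stats_profile(), which assumes Python 3.9 or later. For a recursive function, ncalls is reported as total calls, then a slash, then primitive calls, where a primitive call is one that was not made through recursion.

```python
import cProfile
import pstats


def factorial(n):
    # Recursive on purpose, so total and primitive call counts differ.
    return 1 if n <= 1 else n * factorial(n - 1)


with cProfile.Profile() as profiler:
    result = factorial(5)

row = pstats.Stats(profiler).get_stats_profile().func_profiles["factorial"]

print("ncalls            :", row.ncalls)           # total calls / primitive calls
print("tottime           :", row.tottime)          # time in the function body only
print("percall (tottime) :", row.percall_tottime)  # tottime / total calls
print("cumtime           :", row.cumtime)          # includes time in sub-calls
print("percall (cumtime) :", row.percall_cumtime)  # cumtime / primitive calls
print("location          :", f"{row.file_name}:{row.line_number}")
```

For this call, factorial runs five times in total, but only the outermost call is primitive.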

Amazing, isn’t it? Now you can find the bottlenecks in your machine learning program using cProfile and visualize them with snakeviz. From now on, you can add code profiling as an optional but powerful step in your machine learning workflow.

Finally, note that Python’s profiler reports only timing statistics, not memory usage. You will need another library or tool to profile memory.

 

