Are you looking for powerful tools to analyze and manipulate structured data, with a nice GUI to go with them? You can build fast, expressive, and scalable data analysis tools by combining pandas with the Python4Delphi library inside Delphi and C++Builder.
pandas is a Python package that provides fast, flexible, and expressive data structures designed to work with structured (tabular, multidimensional, potentially heterogeneous) and time-series data easily and intuitively.
pandas aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open-source data analysis/manipulation tool available in any language. It is already on its way toward this goal.
10+ Amazing pandas Examples inside the Delphi Windows GUI App
This post shows how to run various data analysis and manipulation examples with the pandas library, and how to use Python for Delphi to display the results in a Delphi Windows GUI app.
First, open and run our Python GUI using project Demo01 from Python4Delphi with RAD Studio. Then insert the script into the lower Memo, click the Execute button, and get the result in the upper Memo. You can find the Demo01 source on GitHub. The behind-the-scenes details of how Delphi manages to run your Python code in this amazing Python GUI can be found at this link.
These examples cover many of the functions and methods you are most likely to use in a typical data analysis process. Let's run them all in our Python4Delphi Demo01 GUI:
1. Reading the CSV file into a pandas dataframe
```python
import numpy as np
import pandas as pd

# Read the dataset
df = pd.read_csv("/Churn_Modelling.csv")

# See the data
print(df)
```
2. Check the shape or dimension of the dataset
```python
# Print the data shape
print(df.shape)
```
3. See the column labels of the DataFrame
```python
# Print the data columns (print is needed so the result shows up in the output Memo)
print(df.columns)
```
4. Dropping columns
We want to remove 4 columns: `RowNumber`, `CustomerId`, `Surname`, and `CreditScore`. The `axis` parameter is set to `1` to drop columns (`0` would drop rows). The `inplace` parameter is set to `True` so the changes are saved in `df` itself:
```python
# Drop 4 columns
df.drop(['RowNumber', 'CustomerId', 'Surname', 'CreditScore'],
        axis=1,
        inplace=True)

print(df.shape)
```
We dropped 4 columns, so the number of columns went from 14 to 10.
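As a variation, the same kind of drop can be written without `inplace=True` by assigning the result to a new name. The sketch below uses a small made-up frame (the values are invented for illustration and only the column names mirror the dataset) so that it runs without the CSV file:

```python
import pandas as pd

# Tiny stand-in frame; the values are invented for illustration only
demo = pd.DataFrame({
    "RowNumber": [1, 2, 3],
    "CustomerId": [101, 102, 103],
    "Surname": ["A", "B", "C"],
    "CreditScore": [600, 650, 700],
    "Age": [42, 41, 39],
    "Balance": [0.0, 83807.86, 159660.80],
})

# columns= names the axis explicitly, so axis=1 is not needed
trimmed = demo.drop(columns=["RowNumber", "CustomerId", "Surname", "CreditScore"])

print(demo.shape)     # the original frame is unchanged
print(trimmed.shape)  # four fewer columns
```

Keeping the original frame untouched can make it easier to rerun a script step by step in the Memo.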
5. Select particular columns while reading
We want to read only specific columns: `Gender`, `Age`, `Tenure`, and `Balance`:
```python
# Select particular columns
df_spec = pd.read_csv("/Churn_Modelling.csv",
                      usecols=['Gender', 'Age', 'Tenure', 'Balance'])
print(df_spec.head())
```
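To try `usecols` without the dataset on disk, here is a minimal sketch that reads from an in-memory string instead of a file. The `csv_text` content is invented for illustration and is only a stand-in for the real CSV:

```python
import io
import pandas as pd

# In-memory stand-in for the CSV file; the rows are invented for illustration
csv_text = """RowNumber,Surname,Gender,Age,Tenure,Balance
1,Smith,Female,42,2,0.0
2,Jones,Male,41,1,83807.86
3,Brown,Female,39,8,159660.8
"""

# Only the listed columns are kept; the others are left out of the result
subset = pd.read_csv(io.StringIO(csv_text),
                     usecols=["Gender", "Age", "Tenure", "Balance"])

print(subset.columns.tolist())
print(subset.shape)
```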
6. Reading part of the dataframe (the first n rows)
We want to read the first `5000` rows of the CSV file:
```python
# Reading a part of the dataframe
df_partial = pd.read_csv("/Churn_Modelling.csv", nrows=5000)
print(df_partial.shape)
```
7. Select rows from the end of the file
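We can also select rows from the end of the file by using the `skiprows` parameter. An integer such as `skiprows=5000` skips the first `5000` lines of the CSV file, and that count includes the header line, so the column names would be lost. Passing `skiprows=range(1, 5001)` instead skips the first `5000` data rows and keeps the header:
```python
# Select rows from the end of the file (keep the header, skip data rows 1..5000)
df_partialEnd = pd.read_csv("/Churn_Modelling.csv",
                            skiprows=range(1, 5001))
print(df_partialEnd.shape)
```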
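The difference between an integer and a list-like `skiprows` is easy to check on a tiny in-memory file. The sketch below is only an illustration: `csv_text` and its values are made up, and it uses six data rows instead of thousands:

```python
import io
import pandas as pd

# In-memory stand-in: one header line plus six data rows (invented values)
csv_text = "id,Age\n" + "\n".join(f"{i},{20 + i}" for i in range(1, 7)) + "\n"

# Integer form: skips the first 3 lines, header included,
# so the next line ("3,23") is parsed as the header
tail_int = pd.read_csv(io.StringIO(csv_text), skiprows=3)

# Range form: skips lines 1..3 and keeps line 0 as the header
tail_range = pd.read_csv(io.StringIO(csv_text), skiprows=range(1, 4))

print(tail_int.columns.tolist())
print(tail_range.columns.tolist())
print(tail_range.shape)
```

Both calls return three rows here, so comparing shapes alone would not reveal the lost header; checking the column names does.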
8. Draw a small sample to work with
We can use either the `n` parameter or the `frac` parameter to determine the sample size.
`n`: the number of rows in the sample
```python
# The number of rows in the sample
df_sample = df.sample(n=1000)
print(df_sample.shape)
```
`frac`: the ratio of the sample size to the whole dataframe size
```python
# The ratio of the sample size to the whole dataframe size
df_sample2 = df.sample(frac=0.2)
print(df_sample2.shape)
```
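Because `sample` draws rows at random, two runs usually return different rows. If repeatable results are needed, one option is the `random_state` parameter. The following sketch uses an invented 100-row frame and an arbitrary seed of 42:

```python
import pandas as pd

# Stand-in frame with 100 rows (values invented for illustration)
demo = pd.DataFrame({"Age": range(100)})

# Same seed, same rows: the two samples are identical
first = demo.sample(n=10, random_state=42)
second = demo.sample(n=10, random_state=42)

# frac=0.2 keeps 20% of the rows
fifth = demo.sample(frac=0.2, random_state=42)

print(first.equals(second))
print(fifth.shape)
```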
9. Checking the missing values
Using `isna` together with the `sum` function, we can see the number of missing values in each column:
```python
# Check the missing values
print(df.isna().sum())
```
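The dataset has no missing values at this point, so the counts above are all zero. To see what non-zero counts look like, here is a small sketch on an invented frame. It also shows `isna().mean()`, which gives the share of missing values per column:

```python
import numpy as np
import pandas as pd

# Stand-in frame with a few gaps (values invented for illustration)
demo = pd.DataFrame({
    "Geography": ["France", None, "Spain", "France"],
    "Balance": [0.0, 83807.86, np.nan, np.nan],
})

# isna() marks each missing cell as True; sum() counts them per column
missing_per_column = demo.isna().sum()

# mean() of the same boolean mask gives the fraction of missing cells
missing_share = demo.isna().mean()

print(missing_per_column)
print(missing_share)
```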
10. Adding missing values using loc and iloc
`loc` and `iloc` select rows and columns, either by label or by integer position.
`loc`: selects by label
`iloc`: selects by integer position
```python
# Adding missing values using loc and iloc
## Create 20 random indices to select (randint may repeat a value)
missing_index = np.random.randint(10000, size=20)

## We will use these indices to set some values to np.nan (missing value)
df.loc[missing_index, ['Balance', 'Geography']] = np.nan

## Let's try another example using positions instead of labels (select the last column)
df.iloc[missing_index, -1] = np.nan
print(df.isna().sum())
```
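In the dataset the row labels happen to be 0 to 9999, so labels and positions coincide and the two accessors look interchangeable. The sketch below uses an invented frame whose index labels are deliberately different from the positions, which makes the distinction visible:

```python
import pandas as pd

# Index labels are deliberately not 0..n-1, so labels and positions differ
demo = pd.DataFrame(
    {"Age": [42, 41, 39], "Balance": [0.0, 83807.86, 159660.8]},
    index=[10, 20, 30],
)

by_label = demo.loc[20, "Balance"]  # row labelled 20, column named "Balance"
by_position = demo.iloc[1, 1]       # second row, second column

print(by_label, by_position)
```

Here `demo.loc[1, "Balance"]` would raise a `KeyError`, because no row carries the label 1.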
11. Fill the missing values
- Fill NA using the most common value (mode)
```python
# Filling missing values
## See the "Geography" column
print(df["Geography"].value_counts())

## Fill NA using the most common value (mode)
mode = df['Geography'].value_counts().index[0]
## Assign the result back; in recent pandas versions,
## fillna(inplace=True) on a selected column may not update df
df['Geography'] = df['Geography'].fillna(value=mode)
```
- Fill NA using the mean value
```python
## Fill NA using the mean value
avg = df['Balance'].mean()
df['Balance'] = df['Balance'].fillna(value=avg)

print(df.isna().sum())
```
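Both fills can be tried end to end on a small frame. This is a minimal sketch with invented values. It uses `Series.mode()` as an alternative way to get the most common value and assigns each filled column back to the frame:

```python
import numpy as np
import pandas as pd

# Stand-in frame with one gap per column (values invented for illustration)
demo = pd.DataFrame({
    "Geography": ["France", np.nan, "Spain", "France"],
    "Balance": [100.0, np.nan, 300.0, 200.0],
})

# Series.mode() returns the most frequent value(s); take the first one
most_common = demo["Geography"].mode()[0]
demo["Geography"] = demo["Geography"].fillna(most_common)

# mean() ignores NaN by default, so the average is taken over 100, 300, 200
demo["Balance"] = demo["Balance"].fillna(demo["Balance"].mean())

print(demo)
```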
Congratulations, you have now learned how to run various data analysis and manipulation examples with the pandas library and how to use Python for Delphi to display the results in a Delphi Windows GUI app!
Check out the pandas library for Python and use it in your projects: https://pypi.org/project/pandas/ and
check out Python4Delphi, which easily allows you to build Python GUIs for Windows using Delphi: https://github.com/pyscripter/python4delphi