Are you looking for Python development tools for bioinformatics that also let you build a graphical user interface (GUI)?
You can build scalable Bioinformatics systems by combining these 6 powerful Python libraries with Python4Delphi for the GUI. Python4Delphi (P4D) is a free tool that lets you execute Python scripts and create new Python modules and types in Delphi.
What is Bioinformatics?
According to the National Human Genome Research Institute, Bioinformatics is a subdiscipline of biology and computer science concerned with the acquisition, storage, analysis, and dissemination of biological data, most often DNA and amino acid sequences. Bioinformatics uses computer programs for a variety of applications, including determining gene and protein functions, establishing evolutionary relationships, and predicting the three-dimensional shapes of proteins.
Why use Python for Bioinformatics?
According to Bitesize Bio, Python is particularly well suited to researchers because several biology programmers have already contributed many libraries to make Python science-friendly. Python documentation also has a section dedicated to its scientific audience. Here are some more reasons why Python could be your best choice of programming language for biology research:
- Widely used in the scientific community.
- Well-built libraries for complex scientific problems.
- Compatible with other existing tools.
- Easy manipulation of sequences like DNA, RNA, amino acids.
- Easy data manipulation and visualization.
Delphi adds Powerful GUI Features and Functionalities to Python
In this tutorial, we’ll build Windows Apps with extensive Bioinformatics capabilities by integrating Python’s Bioinformatics libraries with Embarcadero’s Delphi, using Python4Delphi (P4D).
P4D gives Python users access to Delphi’s award-winning VCL functionality for Windows, which enables us to build native Windows apps 5x faster. This integration lets us create a modern GUI with a Windows 10 look and responsive controls for our Python for Bioinformatics applications. Python4Delphi also comes with an extensive range of demos, use cases, and tutorials.
We’re going to cover the following…
How to use Biopython, DEAP, Nilearn, PsychoPy, scikit-bio, and scikit-image Python libraries for Bioinformatics
All of them will be integrated with Python4Delphi to create Windows apps with Bioinformatics capabilities.
Prerequisites
Before we begin to work, download and install the latest Python for your platform. Follow the Python4Delphi installation instructions mentioned here. Alternatively, you can check out the easy instructions found in the Getting Started With Python4Delphi video by Jim McKeeth.
A practical demo app
First, open and run the Python GUI project Demo01 from Python4Delphi in RAD Studio. Then paste a script into the lower Memo, click the Execute button, and read the result in the upper Memo. You can find the Demo01 source on GitHub. The behind-the-scenes details of how Delphi runs your Python code in this Python GUI can be found at this link.
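As a first script to paste into the lower Memo, something like the following can confirm which interpreter the demo is actually running. This is only a sanity-check sketch using the standard library; the helper name `interpreter_summary` is ours, not part of Python4Delphi. The bitness is worth checking because the Delphi application and the Python installation it loads generally need to match (32-bit with 32-bit, 64-bit with 64-bit).

```python
import platform
import sys


def interpreter_summary():
    """Return a few facts about the Python interpreter running this script."""
    return {
        "version": platform.python_version(),
        "implementation": platform.python_implementation(),
        # sys.maxsize exceeds 2**32 only on a 64-bit interpreter.
        "bits": 64 if sys.maxsize > 2**32 else 32,
    }


summary = interpreter_summary()
for key, value in summary.items():
    print(f"{key}: {value}")
```

If the upper Memo shows the version you installed, the later examples should run against the same interpreter.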
1. How do you perform Bioinformatics tasks with Biopython?
The Biopython Project is an international association of developers of freely available Python tools for computational molecular biology.
What can I find in the Biopython package?
The main Biopython releases have lots of functionality, including:
- The ability to parse bioinformatics files into Python utilizable data structures, including support for the following formats:
  - Blast output – both from standalone and WWW Blast
  - Clustalw
  - FASTA
  - GenBank
  - PubMed and Medline
  - ExPASy files, like Enzyme and Prosite
  - SCOP, including ‘dom’ and ‘lin’ files
  - UniGene
  - SwissProt
- Files in the supported formats can be iterated over record by record or indexed and accessed via a Dictionary interface.
- Code to deal with popular online bioinformatics destinations such as:
  - NCBI – Blast, Entrez, and PubMed services
  - ExPASy – Swiss-Prot and Prosite entries, as well as Prosite searches
- Interfaces to common bioinformatics programs such as:
  - Standalone Blast from NCBI
  - Clustalw alignment program
  - EMBOSS command-line tools
- A standard sequence class that deals with sequences, ids on sequences, and sequence features.
- Tools for performing common operations on sequences, such as translation, transcription, and weight calculations.
- Code to perform classification of data using k Nearest Neighbors, Naive Bayes, or Support Vector Machines.
- Code for dealing with alignments, including a standard way to create and deal with substitution matrices.
- Code making it easy to split up parallelizable tasks into separate processes.
- GUI-based programs to do basic sequence manipulations, translations, BLASTing, etc.
- Extensive documentation and help with using the modules, including online wiki documentation, the website, and the mailing list.
- Integration with BioSQL, a sequence database schema also supported by the BioPerl and BioJava projects.
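To make the "record by record" and "Dictionary interface" ideas from the list above concrete, here is a minimal plain-Python sketch of both access patterns on FASTA text. It is not Biopython's implementation (Biopython offers `SeqIO.parse`, `SeqIO.to_dict`, and `SeqIO.index` for this); the function `iter_fasta` and the two sample records are made up for illustration.

```python
import io

# Hypothetical two-record FASTA text, invented for this illustration.
FASTA_TEXT = """>seq1 first example record
ACGTACGT
ACGT
>seq2 second example record
GGGCCC
"""


def iter_fasta(handle):
    """Yield (identifier, sequence) pairs from FASTA text, one record at a time."""
    identifier, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if identifier is not None:
                yield identifier, "".join(chunks)
            # The identifier is the first word after '>'.
            identifier, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if identifier is not None:
        yield identifier, "".join(chunks)


# Record-by-record iteration: only one record is held at a time.
for identifier, sequence in iter_fasta(io.StringIO(FASTA_TEXT)):
    print(identifier, len(sequence))

# Dictionary-style access: look a record up by its identifier.
records = dict(iter_fasta(io.StringIO(FASTA_TEXT)))
print(records["seq2"])
```

Iteration suits large files that do not fit in memory, while the dictionary form trades memory for direct lookups by identifier.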
After installing Python4Delphi properly, you can install Biopython by running pip from your command prompt:

    pip install biopython
Don’t forget to add the folders where Python and your libraries are installed to the System Environment Variables:
System Environment Variable Examples

    C:/Users/YOUR_USERNAME/AppData/Local/Programs/Python/Python38/Lib/site-packages
    C:/Users/YOUR_USERNAME/AppData/Local/Programs/Python/Python38/Scripts
    C:/Users/YOUR_USERNAME/AppData/Local/Programs/Python/Python38
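The exact folders differ from machine to machine, so rather than guessing you can ask the interpreter where it lives. The sketch below uses only the standard-library `sys` and `sysconfig` modules; the dictionary labels are ours, chosen to mirror the three example entries.

```python
import sys
import sysconfig

paths = sysconfig.get_paths()
install_dirs = {
    "interpreter folder": sys.prefix,       # root of the Python installation
    "site-packages": paths["purelib"],      # where pip puts pure-Python packages
    "Scripts": paths["scripts"],            # where pip puts command-line launchers
}

for label, folder in install_dirs.items():
    print(f"{label}: {folder}")
```

Note that inside a virtual environment these values point at the environment, not at the base installation.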
The following Biopython example works with a sequence and parses a FASTA-formatted text file (run it inside the lower Memo of the Python4Delphi Demo01 GUI; adjust the file path to where your copy of ls_orchid.fasta is stored):

    from Bio.Seq import Seq
    from Bio import SeqIO

    # Create a sequence and print it with its complement and reverse complement
    my_seq = Seq("AGTACACTGGT")
    print(my_seq)
    print(my_seq.complement())
    print(my_seq.reverse_complement())

    # Parse a FASTA file record by record
    for seq_record in SeqIO.parse("C:/Users/ASUS/Bio/examples/ls_orchid.fasta", "fasta"):
        print(seq_record.id)
        print(repr(seq_record.seq))
        print(len(seq_record))
Here is the final Biopython result in Python GUI
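If you want to check by hand what `complement()` and `reverse_complement()` print for the sequence above, the underlying rule is simple base pairing. The following plain-Python sketch reproduces it for unambiguous DNA only (an assumption on our part; Biopython also handles ambiguity codes, and this is not its implementation).

```python
# Watson-Crick pairing for the four unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(dna):
    """Replace every base with its pairing partner."""
    return dna.translate(COMPLEMENT)


def reverse_complement(dna):
    """Complement the sequence, then read it in the opposite direction."""
    return complement(dna)[::-1]


seq = "AGTACACTGGT"
print(complement(seq))          # TCATGTGACCA
print(reverse_complement(seq))  # ACCAGTGTACT
```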
2. How do you perform Bioinformatics tasks with DEAP?
DEAP is a novel evolutionary computation framework for the rapid prototyping and testing of ideas. It seeks to make algorithms explicit and data structures transparent. It works in perfect harmony with parallelization mechanisms such as multiprocessing and SCOOP.
DEAP includes the following features:
- Genetic algorithm using any imaginable representation
  - List, Array, Set, Dictionary, Tree, Numpy Array, etc.
- Genetic programming using prefix trees
  - Loosely typed, Strongly typed
  - Automatically defined functions
- Evolution strategies (including CMA-ES)
- Multi-objective optimization (NSGA-II, NSGA-III, SPEA2, MO-CMA-ES)
- Coevolution (cooperative and competitive) of multiple populations
- Parallelization of the evaluations (and more)
- Hall of Fame of the best individuals that lived in the population
- Checkpoints that take snapshots of a system regularly
- Benchmarks module containing most common test functions
- Genealogy of an evolution (that is compatible with NetworkX)
- Examples of alternative algorithms: Particle Swarm Optimization, Differential Evolution, Estimation of Distribution Algorithm
How do I get the DEAP Python library?
First, here is how you can get DEAP:

    pip install deap
The following code is a DEAP implementation of the One Max problem. The code is credited to Félix-Antoine Fortin, EunSeop Shin, and François-Michel De Rainville:

    import random

    import numpy

    from deap import algorithms
    from deap import base
    from deap import creator
    from deap import tools

    creator.create("FitnessMax", base.Fitness, weights=(1.0,))
    creator.create("Individual", numpy.ndarray, fitness=creator.FitnessMax)

    toolbox = base.Toolbox()
    toolbox.register("attr_bool", random.randint, 0, 1)
    toolbox.register("individual", tools.initRepeat, creator.Individual, toolbox.attr_bool, n=100)
    toolbox.register("population", tools.initRepeat, list, toolbox.individual)

    def evalOneMax(individual):
        return sum(individual),

    def cxTwoPointCopy(ind1, ind2):
        """Execute a two points crossover with copy on the input individuals. The
        copy is required because the slicing in numpy returns a view of the data,
        which leads to a self overwriting in the swap operation. It prevents
        ::

            >>> import numpy
            >>> a = numpy.array((1,2,3,4))
            >>> b = numpy.array((5,6,7,8))
            >>> a[1:3], b[1:3] = b[1:3], a[1:3]
            >>> print(a)
            [1 6 7 4]
            >>> print(b)
            [5 6 7 8]
        """
        size = len(ind1)
        cxpoint1 = random.randint(1, size)
        cxpoint2 = random.randint(1, size - 1)
        if cxpoint2 >= cxpoint1:
            cxpoint2 += 1
        else:  # Swap the two cx points
            cxpoint1, cxpoint2 = cxpoint2, cxpoint1

        ind1[cxpoint1:cxpoint2], ind2[cxpoint1:cxpoint2] \
            = ind2[cxpoint1:cxpoint2].copy(), ind1[cxpoint1:cxpoint2].copy()

        return ind1, ind2

    toolbox.register("evaluate", evalOneMax)
    toolbox.register("mate", cxTwoPointCopy)
    toolbox.register("mutate", tools.mutFlipBit, indpb=0.05)
    toolbox.register("select", tools.selTournament, tournsize=3)

    def main():
        random.seed(64)

        pop = toolbox.population(n=300)

        # Numpy equality function (operators.eq) between two arrays returns the
        # equality element wise, which raises an exception in the if similar()
        # check of the hall of fame. Using a different equality function like
        # numpy.array_equal or numpy.allclose solve this issue.
        hof = tools.HallOfFame(1, similar=numpy.array_equal)

        stats = tools.Statistics(lambda ind: ind.fitness.values)
        stats.register("avg", numpy.mean)
        stats.register("std", numpy.std)
        stats.register("min", numpy.min)
        stats.register("max", numpy.max)

        algorithms.eaSimple(pop, toolbox, cxpb=0.5, mutpb=0.2, ngen=40, stats=stats,
                            halloffame=hof)

        return pop, stats, hof

    if __name__ == "__main__":
        main()
Here are the DEAP examples in the Python GUI:
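To see what the registered operators do without any library in the way, here is a small plain-Python sketch of the same One Max idea: bit-string individuals, tournament selection, two-point crossover, and bit-flip mutation. It is not DEAP's `eaSimple`: the function names are ours, the population is smaller, and, unlike `eaSimple`, it copies the best individual into each new generation (elitism) so that the best fitness can never decrease.

```python
import random


def one_max(individual):
    """Fitness for One Max: the number of 1 bits."""
    return sum(individual)


def tournament(population, k, rng):
    """Return the fittest of k randomly chosen individuals."""
    return max(rng.sample(population, k), key=one_max)


def two_point_crossover(a, b, rng):
    """Swap the segment between two cut points; new lists are built, parents stay intact."""
    i, j = sorted(rng.sample(range(1, len(a)), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


def flip_bits(individual, indpb, rng):
    """Flip each bit independently with probability indpb."""
    return [1 - bit if rng.random() < indpb else bit for bit in individual]


def evolve(n_bits=30, pop_size=40, generations=25, cxpb=0.5, mutpb=0.2, seed=64):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = [max(map(one_max, population))]
    for _ in range(generations):
        elite = max(population, key=one_max)  # elitism: an addition, not in eaSimple
        offspring = [elite]
        while len(offspring) < pop_size:
            a = tournament(population, 3, rng)
            b = tournament(population, 3, rng)
            if rng.random() < cxpb:
                a, b = two_point_crossover(a, b, rng)
            if rng.random() < mutpb:
                a = flip_bits(a, 0.05, rng)
            if rng.random() < mutpb:
                b = flip_bits(b, 0.05, rng)
            offspring.extend([a, b])
        population = offspring[:pop_size]
        history.append(max(map(one_max, population)))
    return population, history


population, history = evolve()
print("best fitness per generation:", history)
```

Because plain lists are copied when sliced, this version does not need the explicit `.copy()` calls that the NumPy-based `cxTwoPointCopy` requires.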
3. How do you perform Bioinformatics tasks with Nilearn?
Nilearn enables approachable and versatile analyses of brain volumes. It provides statistical and Machine Learning tools, with instructive documentation & a friendly community.
It supports general linear model (GLM) based analysis and leverages the scikit-learn Python toolbox for multivariate statistics with applications such as predictive modeling, classification, decoding, or connectivity analysis.
First, here is how you can get Nilearn:

    pip install nilearn
Below is the code for fetching a dataset with Nilearn (run it inside the lower Memo of the Python4Delphi Demo01 GUI):

    from nilearn import datasets

    haxby_dataset = datasets.fetch_haxby()

    # The different files
    print(sorted(list(haxby_dataset.keys())))

    # Path to first functional file
    print(haxby_dataset.func[0])

    # Print the data description
    print(haxby_dataset.description)
Here are the Nilearn results in the Python4Delphi GUI:
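The script above reads the fetched dataset both like a dictionary (`keys()`) and like an object (`.func`, `.description`). As far as we know, Nilearn returns a scikit-learn `Bunch` for this, but treat that as an assumption. The sketch below is a minimal stand-in that shows how such a dual-access container can work; the class body, the file path, and the description text are all hypothetical.

```python
class Bunch(dict):
    """Minimal stand-in: a dictionary whose keys can also be read as attributes."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so dict methods still work.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


# Hypothetical contents; a real fetch returns paths to downloaded files.
dataset = Bunch(
    func=["/hypothetical/path/bold.nii.gz"],
    description="Example description text",
)

print(sorted(dataset.keys()))  # dictionary-style access
print(dataset.func[0])         # attribute-style access to the same data
print(dataset.description)
```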
4. How do you perform Bioinformatics tasks with PsychoPy?
PsychoPy is an open-source package for creating experiments in behavioral science. It aims to provide a single package that is:
- precise enough for psychophysics
- easy enough for teaching
- flexible enough for everything else
- able to run experiments in a local Python script or online in JavaScript
To meet these goals PsychoPy provides a choice of interface – you can use a simple graphical user interface called Builder or write your experiments in Python code. The entire application and library are written in Python and are platform-independent.
How do I get the PsychoPy library?
First, here is how you can get PsychoPy:

    pip install psychopy
Run this simple PsychoPy example inside the lower Memo of the Python4Delphi Demo01 GUI to generate your first stimulus:

    # Import some libraries from PsychoPy
    from psychopy import visual, core, event

    # Create a window
    mywin = visual.Window([800, 600], monitor="testMonitor", units="deg")

    # Create some stimuli
    grating = visual.GratingStim(win=mywin, mask='circle', size=3, pos=[-4, 0], sf=3)
    fixation = visual.GratingStim(win=mywin, size=0.2, pos=[0, 0], sf=0, rgb=-1)

    # Draw the stimuli and update the window
    while True:  # this creates a never-ending loop
        grating.setPhase(0.05, '+')  # advance phase by 0.05 of a cycle
        grating.draw()
        fixation.draw()
        mywin.flip()

        # Leave the loop as soon as any key is pressed
        if len(event.getKeys()) > 0:
            break
        event.clearEvents()

    # Cleanup
    mywin.close()
    core.quit()
PsychoPy Simple Examples:
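The call `grating.setPhase(0.05, '+')` adds 0.05 of a cycle on every pass through the loop, so the drift speed you see depends on how often the loop runs. The arithmetic can be sketched as follows, assuming `mywin.flip()` waits for the screen refresh so that the loop runs once per frame, and assuming a 60 Hz monitor (both are assumptions; check your own display).

```python
PHASE_STEP = 0.05      # cycles added per frame, as in grating.setPhase(0.05, '+')
REFRESH_RATE_HZ = 60   # assumed monitor refresh rate

# Frames needed for the grating to drift through one full cycle.
frames_per_cycle = round(1 / PHASE_STEP)

# Cycles per second at the assumed refresh rate.
drift_hz = PHASE_STEP * REFRESH_RATE_HZ


def phase_after(frames, step=PHASE_STEP):
    """Phase after a number of frames, wrapped into [0, 1) because phase is cyclic."""
    return (frames * step) % 1.0


print(f"frames per cycle: {frames_per_cycle}")
print(f"drift rate: {drift_hz:.1f} cycles per second")
print([round(phase_after(n), 2) for n in (1, 10, 25)])
```

Halving the step to 0.025 would halve the drift rate on the same monitor.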
5. How do you perform Bioinformatics tasks with scikit-bio?
scikit-bio is an open-source, BSD-licensed Python package providing data structures, algorithms, and educational resources for bioinformatics.
Here is how you can install scikit-bio:

    pip install scikit-bio
Run the following code to create a TabularMSA object with three DNA sequences and four positions:

    from skbio import DNA, TabularMSA

    seqs = [
        DNA('ACGT'),
        DNA('AG-T'),
        DNA('-C-T')
    ]
    msa = TabularMSA(seqs)
    print(msa)
Here is the scikit-bio Demo Result in the Python GUI:
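"Three sequences and four positions" means the alignment is a 3 x 4 table: one row per sequence, one column per aligned position. The plain-Python sketch below (not scikit-bio code; all variable names are ours) uses the same three sequences to show the column view, which is where questions such as "how many gaps at this position?" are answered.

```python
ROWS = ["ACGT", "AG-T", "-C-T"]  # the same three sequences as in the example

n_sequences = len(ROWS)
n_positions = len(ROWS[0])
# Every row of an alignment must have the same length.
assert all(len(row) == n_positions for row in ROWS), "rows must have equal length"

# Transpose rows into columns: one string per aligned position.
columns = ["".join(chars) for chars in zip(*ROWS)]

# Number of gap characters at each position.
gap_counts = [column.count("-") for column in columns]

# Positions where every sequence has the same, non-gap character.
conserved = [i for i, column in enumerate(columns)
             if len(set(column)) == 1 and "-" not in column]

print(f"{n_sequences} sequences x {n_positions} positions")
print("columns:", columns)
print("gaps per position:", gap_counts)
print("fully conserved positions:", conserved)
```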
6. How do you perform Bioinformatics tasks with scikit-image?
scikit-image is an image processing library that implements algorithms and utilities for use in research, education, and industry applications. It is released under the liberal Modified BSD open source license, provides a well-documented API in the Python programming language, and is developed by an active, international team of collaborators.
scikit-image aims to:
- Provide high-quality, well-documented, and easy-to-use implementations of common image processing algorithms.
- Facilitate education in image processing.
- Address industry challenges.
First, here is how you can get scikit-image:

    pip install scikit-image
Here is an example that explores a 3D image of kidney tissue (it also requires matplotlib, SciPy, and Plotly):

    import matplotlib.pyplot as plt
    import numpy as np
    from scipy import ndimage as ndi

    import plotly
    import plotly.express as px
    from skimage import data

    # Load image
    data = data.kidney()

    print(f'number of dimensions: {data.ndim}')
    # Dimensions are provided in the following order: (z, y, x, c), i.e., [plane, row, column, channel]:
    print(f'shape: {data.shape}')
    print(f'dtype: {data.dtype}')

    # Dimensions are provided in the following order: (z, y, x, c), i.e., [plane, row, column, channel]:
    n_plane, n_row, n_col, n_chan = data.shape

    # Display both grayscale and RGB(A) 2D images
    _, ax = plt.subplots()
    ax.imshow(data[n_plane // 2])
    plt.show()

    # According to the warning message, the range of values is unexpected. The image rendering is clearly not satisfactory colour-wise.
    vmin, vmax = data.min(), data.max()
    print(f'range: ({vmin}, {vmax})')

    # Turn to plotly’s implementation of the imshow function, for it supports value ranges beyond (0.0, 1.0) for floats and (0, 255) for integers.
    fig = px.imshow(data[n_plane // 2], zmax=vmax)
    #plotly.io.show(fig)
    fig.show()
scikit-image with Python4Delphi Results
The second output will show up in your default browser (just like the default Plotly output):
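Two ideas carry the example: the axis order (plane, row, column, channel), and the fact that pixel values outside the range a viewer expects render poorly. The plain-Python sketch below illustrates both on a tiny made-up volume; rescaling the middle plane to the range 0.0 to 1.0 is an alternative we add for illustration, not something the original script does (it passes `zmax` to Plotly instead).

```python
# Tiny made-up volume with the same axis order as the example:
# (plane, row, column, channel). Values are arbitrary but exceed 255.
N_PLANE, N_ROW, N_COL, N_CHAN = 4, 2, 3, 3
volume = [
    [[[1000 * z + 100 * y + 10 * x + c for c in range(N_CHAN)]
      for x in range(N_COL)]
     for y in range(N_ROW)]
    for z in range(N_PLANE)
]

# Same indexing as data[n_plane // 2]: pick the middle plane of the stack.
middle = volume[N_PLANE // 2]

# Value range of that plane, as with data.min() and data.max().
values = [v for row in middle for pixel in row for v in pixel]
vmin, vmax = min(values), max(values)


def rescale(value, lo, hi):
    """Map a value from [lo, hi] linearly onto [0.0, 1.0]."""
    return (value - lo) / (hi - lo)


normalised = [[[rescale(v, vmin, vmax) for v in pixel] for pixel in row]
              for row in middle]

print(f"plane shape: ({len(middle)}, {len(middle[0])}, {len(middle[0][0])})")
print(f"range before: ({vmin}, {vmax})")
print(f"range after: ({normalised[0][0][0]}, {normalised[-1][-1][-1]})")
```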
7. Are you ready to build awesome things with these Python Bioinformatics libraries?
We have demonstrated 6 powerful Python libraries for Bioinformatics (Biopython, DEAP, Nilearn, PsychoPy, scikit-bio, and scikit-image). All of them work well inside a powerful GUI provided by Python4Delphi. We can’t wait to see what you build with Python4Delphi!
Want to know more? Then check out Python4Delphi, which lets you easily build Python GUIs for Windows using Delphi, and download RAD Studio to build more powerful Python GUI Windows apps 5x faster with less code.