Are you looking for tools to build web scrapers that automate your data collection, with a nice GUI on top? You can build scalable web scrapers easily by combining the BeautifulSoup and Python4Delphi libraries inside Delphi and C++Builder.
BeautifulSoup is a library that makes it easy to scrape information from web pages. It sits atop an HTML or XML parser, providing Pythonic idioms for iterating, searching, and modifying the parse tree.
Since 2004, BeautifulSoup has been saving programmers hours or days of work on quick-turnaround screen scraping projects.
Three features make it powerful:
- BeautifulSoup provides a few simple methods and Pythonic idioms for navigating, searching, and modifying a parse tree: a toolkit for dissecting a document and extracting what you need. It doesn’t take much code to write an application.
- BeautifulSoup automatically converts incoming documents to Unicode and outgoing documents to UTF-8. You don’t have to think about encodings unless the document doesn’t specify an encoding and Beautiful Soup can’t detect one. Then you just have to specify the original encoding.
- Beautiful Soup sits on top of popular Python parsers like lxml and html5lib, allowing you to try out different parsing strategies or trade speed for flexibility.
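As a rough illustration of the encoding and parser points above, the sketch below feeds Beautiful Soup a tiny Latin-1 byte string that declares no encoding. The sample markup is invented for this example. `from_encoding` is the constructor argument for stating the original encoding, and `"html.parser"` is the parser bundled with Python; passing `"lxml"` or `"html5lib"` instead assumes those packages are installed.

```python
from bs4 import BeautifulSoup

# Hypothetical input: a small document stored as Latin-1 bytes with no
# encoding declaration, so we state the original encoding ourselves.
raw_bytes = "<p>Caf\u00e9 menu</p>".encode("latin-1")

# "html.parser" ships with Python; "lxml" or "html5lib" could be passed
# here instead if those packages are installed.
soup = BeautifulSoup(raw_bytes, "html.parser", from_encoding="latin-1")

# Inside the parse tree, text is ordinary Unicode.
paragraph_text = soup.p.get_text()

# Encoding the tree back out produces UTF-8 bytes.
utf8_bytes = soup.encode("utf-8")

print(len(paragraph_text))
print(utf8_bytes)
```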
BeautifulSoup parses anything you give it and does the tree traversal stuff for you. You can tell it “Find all the links”, or “Find all the links of class externalLink”, or “Find all the links whose URLs match foo.com”, or “Find the table heading that’s got bold text, then give me that text.”
Valuable data that was once locked up in poorly designed websites is now within your reach. Projects that would have taken hours take only minutes with Beautiful Soup.
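The four requests quoted above could be written roughly as follows. This is only a sketch: the HTML snippet, the URLs, and the variable names are made up for the example, while `find_all`, `class_`, and matching an attribute against a compiled regular expression are standard Beautiful Soup features.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup, invented only to exercise the queries.
html = """
<table>
  <tr><th>Plain heading</th><th><b>Bold heading</b></th></tr>
</table>
<a href="https://foo.com/a">internal</a>
<a class="externalLink" href="https://example.org/b">external</a>
<a class="externalLink" href="http://www.foo.com/c">mirror</a>
"""
soup = BeautifulSoup(html, "html.parser")

# "Find all the links"
all_links = soup.find_all("a")

# "Find all the links of class externalLink"
external_links = soup.find_all("a", class_="externalLink")

# "Find all the links whose URLs match foo.com"
foo_links = soup.find_all("a", href=re.compile(r"foo\.com"))

# "Find the table heading that's got bold text, then give me that text"
bold_heading = next(th for th in soup.find_all("th") if th.find("b"))
heading_text = bold_heading.get_text()

print(len(all_links), len(external_links), len(foo_links), heading_text)
```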
Hands-On
This post shows you how to use the BeautifulSoup library to scrape data from the National Weather Service and display it in a Delphi Windows GUI app.
First, open and run our Python GUI using the Demo1 project from Python4Delphi in RAD Studio. Then paste the script into the lower Memo, click the Execute button, and read the result in the upper Memo. You can find the Demo1 source on GitHub. The behind-the-scenes details of how Delphi manages to run your Python code in this amazing Python GUI can be found at this link.
These are the steps for scraping the Austin/San Antonio, TX weather data from the National Weather Service in the Python4Delphi Python GUI:
- Import libraries:
```python
from bs4 import BeautifulSoup
import requests
import pandas as pd
```
- Read URL:
```python
# Read url
page = requests.get("https://forecast.weather.gov/MapClick.php?lat=30.2676&lon=-97.743")
```
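A plain `requests.get` call will happily return an error page, and without a timeout it can wait indefinitely. The helper below is one possible way to guard the download step. The name `fetch_page`, the 10-second default, and the optional `session` argument are assumptions made for this sketch and are not part of the original script; `timeout` and `raise_for_status()` are regular Requests features.

```python
import requests


def fetch_page(url, timeout=10, session=None):
    """Return the raw response body for url, or raise if the request failed."""
    # Use the caller's session when one is given, else the requests module itself.
    http = session if session is not None else requests
    response = http.get(url, timeout=timeout)
    # Turn 4xx/5xx responses into an exception instead of parsing an error page.
    response.raise_for_status()
    return response.content
```

With this in place, the returned bytes from `fetch_page("https://forecast.weather.gov/MapClick.php?lat=30.2676&lon=-97.743")` could be handed to `BeautifulSoup` in the same way as `page.content`.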
- Download the page and start parsing:
```python
# Download the page and start parsing
soup = BeautifulSoup(page.content, 'html.parser')
seven_day = soup.find(id="seven-day-forecast")
forecast_items = seven_day.find_all(class_="tombstone-container")
tonight = forecast_items[0]
```
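`soup.find` returns `None` when nothing matches, so a changed page layout would make the parsing step fail with an `AttributeError`. A minimal defensive sketch follows, run against a hypothetical, heavily trimmed stand-in for the forecast page rather than the live site. The helper name `first_forecast_item` is invented for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the forecast page, trimmed to the elements used here.
sample_html = """
<div id="seven-day-forecast">
  <div class="tombstone-container"><p class="period-name">Tonight</p></div>
  <div class="tombstone-container"><p class="period-name">Thursday</p></div>
</div>
"""


def first_forecast_item(html):
    """Return the first forecast container, or None if the layout is not found."""
    soup = BeautifulSoup(html, "html.parser")
    seven_day = soup.find(id="seven-day-forecast")
    if seven_day is None:
        return None
    items = seven_day.find_all(class_="tombstone-container")
    return items[0] if items else None


tonight = first_forecast_item(sample_html)
missing = first_forecast_item("<html><body>Service unavailable</body></html>")

print(tonight is not None, missing is None)
```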
- Extracting information from the page:
```python
# Extract the name of the forecast item, the short description, and the temperature
period = tonight.find(class_="period-name").get_text()
short_desc = tonight.find(class_="short-desc").get_text()
temp = tonight.find(class_="temp").get_text()
```
- Extract the title attribute from the img tag:
```python
# Extract the title attribute from the img tag
img = tonight.find("img")
desc = img['title']
```
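Two details of these extraction calls are easy to miss: `get_text()` joins text nodes with nothing between them, and `img['title']` raises `KeyError` when the attribute is absent. The sketch below shows the `separator` and `strip` arguments of `get_text` and the `Tag.get` method on a single made-up forecast item. The markup is an assumption and may not match the live page exactly.

```python
from bs4 import BeautifulSoup

# Hypothetical forecast item, invented for this example.
item_html = """
<div class="tombstone-container">
  <p class="period-name">Thursday<br/>Night</p>
  <p><img src="night.png" title="Thursday Night: Mostly clear, with a low around 49."/></p>
  <p class="short-desc">Mostly Clear</p>
  <p class="temp"> Low: 49 F </p>
</div>
"""
item = BeautifulSoup(item_html, "html.parser")

# Plain get_text() glues the text on either side of <br/> together.
period_raw = item.find(class_="period-name").get_text()

# A separator keeps the words apart; strip=True trims surrounding whitespace.
period = item.find(class_="period-name").get_text(separator=" ", strip=True)
temp = item.find(class_="temp").get_text(strip=True)

# Tag.get() returns a default instead of raising KeyError for a missing attribute.
img = item.find("img")
desc = img.get("title", "")
alt_text = img.get("alt")

print(period_raw, "|", period, "|", temp, "|", alt_text)
```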
- Extracting all the information from the page:
```python
# Extracting all the information from the page
period_tags = seven_day.select(".tombstone-container .period-name")
periods = [pt.get_text() for pt in period_tags]
short_descs = [sd.get_text() for sd in seven_day.select(".tombstone-container .short-desc")]
temps = [t.get_text() for t in seven_day.select(".tombstone-container .temp")]
descs = [d["title"] for d in seven_day.select(".tombstone-container img")]
```
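Collecting each column with its own selector assumes that every forecast container holds every element. If one container lacked, say, a temperature, the lists would end up with different lengths and no longer line up row by row. An alternative sketch is to walk the containers one at a time and record `None` for anything missing. The sample markup (including the second item without a temperature or image) and the helper `text_or_none` are invented here to show the case, not taken from the live page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second item deliberately lacks .temp and <img>.
sample_html = """
<div id="seven-day-forecast">
  <div class="tombstone-container">
    <p class="period-name">Tonight</p>
    <p><img title="Tonight: Clear, with a low around 49."/></p>
    <p class="short-desc">Clear</p>
    <p class="temp">Low: 49 F</p>
  </div>
  <div class="tombstone-container">
    <p class="period-name">Advisory</p>
    <p class="short-desc">Wind Advisory</p>
  </div>
</div>
"""


def text_or_none(parent, selector):
    """Return the stripped text of the first match under parent, or None."""
    node = parent.select_one(selector)
    return node.get_text(strip=True) if node is not None else None


seven_day = BeautifulSoup(sample_html, "html.parser").find(id="seven-day-forecast")

rows = []
for container in seven_day.select(".tombstone-container"):
    img = container.select_one("img")
    rows.append({
        "period": text_or_none(container, ".period-name"),
        "short_desc": text_or_none(container, ".short-desc"),
        "temp": text_or_none(container, ".temp"),
        "desc": img.get("title") if img is not None else None,
    })

print(rows)
```

A list of dictionaries like `rows` can be passed straight to `pd.DataFrame(rows)`, with missing values showing up as empty cells.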
- Combining our data into a Pandas dataframe:
```python
# Combining our data into a Pandas Dataframe
weather = pd.DataFrame({
    "period": periods,
    "short_desc": short_descs,
    "temp": temps,
    "desc": descs
})

# Print the dataframe
print(weather)
```
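Once the data is in a DataFrame, the temperature strings can be turned into numbers for sorting or averaging. The sketch below uses hand-written rows in the same shape as the scraped columns. The values, the new column names `temp_num` and `is_night`, and the reading of "Low" as a night-time period are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical rows in the same shape as the scraped data.
weather = pd.DataFrame({
    "period": ["Tonight", "Thursday", "ThursdayNight"],
    "temp": ["Low: 49 F", "High: 63 F", "Low: 50 F"],
})

# Pull the digits out of each temperature string and convert them to integers.
weather["temp_num"] = weather["temp"].str.extract(r"(-?\d+)", expand=False).astype(int)

# Assumption: a label starting with "Low" marks a night-time period.
weather["is_night"] = weather["temp"].str.startswith("Low")

print(weather[["period", "temp_num", "is_night"]])
print(weather["temp_num"].mean())
```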
- Run the complete script inside the Python GUI:
```python
from bs4 import BeautifulSoup
import requests
import pandas as pd

# Read url
page = requests.get("https://forecast.weather.gov/MapClick.php?lat=30.2676&lon=-97.743")

# Download the page and start parsing
soup = BeautifulSoup(page.content, 'html.parser')
seven_day = soup.find(id="seven-day-forecast")
forecast_items = seven_day.find_all(class_="tombstone-container")
tonight = forecast_items[0]

# Extract the name of the forecast item, the short description, and the temperature
period = tonight.find(class_="period-name").get_text()
short_desc = tonight.find(class_="short-desc").get_text()
temp = tonight.find(class_="temp").get_text()

# Extract the title attribute from the img tag
img = tonight.find("img")
desc = img['title']

# Extracting all the information from the page
period_tags = seven_day.select(".tombstone-container .period-name")
periods = [pt.get_text() for pt in period_tags]
short_descs = [sd.get_text() for sd in seven_day.select(".tombstone-container .short-desc")]
temps = [t.get_text() for t in seven_day.select(".tombstone-container .temp")]
descs = [d["title"] for d in seven_day.select(".tombstone-container img")]

# Combining our data into a Pandas Dataframe
weather = pd.DataFrame({
    "period": periods,
    "short_desc": short_descs,
    "temp": temps,
    "desc": descs
})

# Print the dataframe
print(weather)
```
Congratulations, you have now learned how to use the BeautifulSoup library to scrape data from the National Weather Service and display it in a Delphi Windows GUI app! You can now scrape any data you are interested in using the BeautifulSoup library and Python4Delphi.
Check out the BeautifulSoup library for Python and use it in your projects: https://pypi.org/project/beautifulsoup4/
Check out Python4Delphi, which easily allows you to build Python GUIs for Windows using Delphi: https://github.com/pyscripter/python4delphi