Scrapes Active Places and counts the number of sites (indoor/outdoor/all) and subtypes (cricket, golf, hockey, etc.) for one local authority.
How to use the program: Instructions Document
Walkthrough including code: Walkthrough Document
- pandas: For data manipulation and analysis.
- os: To interact with the file system and identify the latest CSV file.
-
Identify the Latest Active Places CSV File
- The script scans the current directory for CSV files ending with
active_places.csv. - It selects the most recently modified file as the input dataset.
- If no such file is found, the script raises an error.
- The script scans the current directory for CSV files ending with
-
Extract the Base Name from the File
- The script extracts the portion of the filename before
active_placesto use as a prefix for output files.
- The script extracts the portion of the filename before
-
Load and Filter Data
- The CSV file is loaded into a pandas DataFrame.
- Rows where
Operational Statusis "No Grass Pitches Currently Marked Out" are removed. - Only the following columns are retained:
- Site Name
- Facility Type
- Facility Subtype
- Unit
- Number
- Management Type (Text)
-
Categorize Facility Type
- A new column
Type/Subtypeis created:- If
Facility Typeis Grass Pitches or Golf, it retainsFacility Subtype. - Otherwise, it retains
Facility Type.
- If
- A new column
-
Classify Facilities as Indoor or Outdoor
- A predefined list of indoor and outdoor facility types is used.
- The script adds a new column
Indoor/Outdoorbased on these lists.
-
Standardize Units
- The script maps certain unit names to standardized terms in a new column
New Units.
- The script maps certain unit names to standardized terms in a new column
-
Adjust Facility Numbers
- Some facility types (e.g., fitness studios, ski slopes, ice rinks) are assigned a default
New Numberof 1. - Otherwise,
New Numberretains its original value.
- Some facility types (e.g., fitness studios, ski slopes, ice rinks) are assigned a default
-
Standardize Facility Names
- The script applies predefined mappings to create a
New Type/Subtypecolumn with consistent naming conventions.
- The script applies predefined mappings to create a
-
Summarizing Facilities by Category
- The script filters data separately for indoor and outdoor facilities.
- It creates summary tables listing each facility type along with:
- Number of unique sites
- Total count of facilities
- Standardized unit type
-
Unique Site Counts
- The script calculates:
- The total number of unique sites
- The count of outdoor and indoor sites separately
- The count of educational sites for each category
- The script calculates:
-
Generate an Excel File
- The processed data is written to an Excel file named
<base_name> Bid Scrape.xlsx. - The file contains the following sheets:
Unique site counts: Summary of unique site statisticsOutdoor: Summary of outdoor facility dataIndoor: Summary of indoor facility data
- Each DataFrame is written without index columns for clarity
- The processed data is written to an Excel file named