The Python environment is managed with pipenv.
You'll need a working version of pip installed and on your PATH.
You should already have it if Python is installed.
Install pipenv by running pip install pipenv.
Next, navigate to this repository in the terminal.
Run pipenv sync to install all packages at the versions pinned in Pipfile.lock.
You can then run pipenv shell to activate the virtual environment, and install additional packages with pipenv install <package_name>.
If you're using VS Code, you can open a Jupyter notebook and select the kernel associated with the virtual environment you created. This lets you run the notebooks with the required packages.
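To check that a notebook kernel (or script) is actually running inside the pipenv environment rather than the system Python, a quick standard-library check can help. This is a sketch that assumes the environment was created by a recent virtualenv (which pipenv uses): in such environments `sys.prefix` points at the environment directory while `sys.base_prefix` points at the base installation.

```python
import sys


def in_virtualenv() -> bool:
    """Return True if this interpreter runs inside a venv-style virtual environment.

    In venv and virtualenv (20+) environments, sys.prefix is the environment
    directory and sys.base_prefix is the base Python installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtualenv:", in_virtualenv())
    print("interpreter:", sys.executable)
```

If this prints False from a notebook in VS Code, the notebook is probably attached to a different kernel.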
gtfrt2gtfs_interpolation - The current method for matching buses to the timetable. It uses the current_stop_sequence and current_status fields in GTFS-RT, and interpolates to fill gaps where buses were not tracked.
gtfsrt_to_csv.ipynb - Converts raw GTFS-RT data to CSV and cleans it. Useful as a pre-processing step.
demo.ipynb - An older method for matching buses to the timetable, using the distance and bearing between buses and stops.
intersection.ipynb - Calculates the GeoJSON intersection of equal-time isochrones across multiple days' data.
utils.py - General utility functions that keep the main notebooks tidy.
gtfs-realtime-utils.py - GTFS Realtime-specific utility functions that keep the notebooks tidy.
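The older distance-and-bearing approach in demo.ipynb can be illustrated with the standard haversine distance and initial-bearing formulas. This is a sketch, not the notebook's code: the `match_stop` helper, the `stops` layout (stop_id to `(lon, lat, bearing_deg)`) and the thresholds `max_dist_m` and `max_angle_deg` are hypothetical. The idea is to accept a stop only when the bus is close to it and travelling in roughly the same direction. The bus bearing could come, for example, from the `bearing` field of a GTFS-RT Position when the feed populates it. Coordinates are passed in (lon, lat) order, matching the data files.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in metres between two (lon, lat) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def initial_bearing_deg(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Initial compass bearing (0 = north, 90 = east) from point 1 to point 2."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    x = math.sin(dlmb) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    return (math.degrees(math.atan2(x, y)) + 360) % 360


def angle_diff_deg(a: float, b: float) -> float:
    """Smallest absolute difference between two bearings, in [0, 180]."""
    return abs((a - b + 180) % 360 - 180)


def match_stop(bus_lon, bus_lat, bus_bearing, stops, max_dist_m=30.0, max_angle_deg=45.0):
    """Return the nearest stop_id within max_dist_m whose bearing roughly matches.

    Hypothetical helper: `stops` maps stop_id -> (lon, lat, bearing_deg).
    Returns None if no stop passes both the distance and the bearing test.
    """
    best_id, best_dist = None, math.inf
    for stop_id, (lon, lat, stop_bearing) in stops.items():
        dist = haversine_m(bus_lon, bus_lat, lon, lat)
        if dist > max_dist_m or angle_diff_deg(bus_bearing, stop_bearing) > max_angle_deg:
            continue  # too far away, or facing the wrong way
        if dist < best_dist:
            best_id, best_dist = stop_id, dist
    return best_id
```

For example, with two stops at the same spot on opposite sides of the road, `match_stop(-0.1, 51.5, 85.0, {"A": (-0.1001, 51.5, 90.0), "B": (-0.1001, 51.5, 270.0)})` returns `"A"`, because stop B faces the opposite direction. Requiring both a distance and a bearing match helps separate paired stops, which distance alone cannot distinguish.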
A notebook for calculating the population inside isochrones in England.
Helper functions are in utils.py; the main script is calculator.ipynb.
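To illustrate what "population inside an isochrone" involves computationally, the sketch below sums populations attached to points (for example, population-weighted centroids of small areas) that fall inside an isochrone polygon, using a ray-casting point-in-polygon test. This is an assumed approach, not necessarily what calculator.ipynb does, and the input formats (a single polygon ring without holes, and a list of `(lon, lat, population)` tuples) are hypothetical. In practice a geometry library such as shapely or geopandas would usually handle this.

```python
from typing import Iterable, Sequence

Point = tuple[float, float]  # (lon, lat)


def point_in_ring(x: float, y: float, ring: Sequence[Point]) -> bool:
    """Ray-casting test: is (x, y) inside the polygon ring?

    Casts a ray towards +x and counts edge crossings; an odd count means inside.
    Points lying exactly on an edge may be classified either way.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def population_inside(ring: Sequence[Point], centroids: Iterable[tuple[float, float, int]]) -> int:
    """Sum the population of every (lon, lat, population) centroid inside `ring`."""
    return sum(pop for lon, lat, pop in centroids if point_in_ring(lon, lat, ring))
```

For a unit square `[(0, 0), (1, 0), (1, 1), (0, 1)]`, `population_inside(square, [(0.5, 0.5, 100), (2, 2, 50)])` counts only the first centroid and returns 100. GeoJSON rings that repeat the first vertex at the end also work, since the closing zero-length edge never straddles the ray.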
Realtime bus data that powers our bus tracking tool.
process.ipynb is the main script for producing the data.
Data is organised by region (using NUTS codes).
Each data file is named by a unique ID, which is either scraped from bustimes.org or, failing that, built as route_short_name-agency_noc.json, e.g. X84-FLDS.json.
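The naming rule can be expressed as a small helper. This is a sketch under stated assumptions: the function names, the use of the bustimes.org ID directly as the file stem, and the one-directory-per-NUTS-code layout are illustrative guesses, not confirmed details of process.ipynb. The region code `"UKE"` in the example is likewise just an example value.

```python
from pathlib import Path
from typing import Optional


def route_filename(bustimes_id: Optional[str], route_short_name: str, agency_noc: str) -> str:
    """File name for a route's data file.

    Uses the bustimes.org ID when one was scraped; otherwise falls back to
    "<route_short_name>-<agency_noc>.json", e.g. "X84-FLDS.json".
    """
    stem = bustimes_id if bustimes_id else f"{route_short_name}-{agency_noc}"
    return f"{stem}.json"


def route_path(data_root: Path, nuts_code: str, filename: str) -> Path:
    """Hypothetical layout: one sub-directory per NUTS region code."""
    return data_root / nuts_code / filename


# Example: no bustimes.org match, so the fallback name is used
print(route_path(Path("data"), "UKE", route_filename(None, "X84", "FLDS")))
```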
id: a unique ID of the bus route.
name: human-readable equivalent of id.
agency_name: name of the company operating the bus.
agency_noc: National Operator Code (NOC) of the operator.
bustimesorg: boolean; whether or not the meta info was matched with bustimes.org.
Keys are shape_id values from the GTFS timetable.
Values are arrays of [longitude, latitude] pairs, e.g. [[lon_1, lat_1], ..., [lon_n, lat_n]].
There can be more than one shape for each route.
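Because each shape is stored as [longitude, latitude] pairs, the same order GeoJSON (RFC 7946) uses for positions, shapes can be wrapped as GeoJSON LineStrings without reordering coordinates, for example to view a route's shapes in QGIS or geojson.io. A minimal sketch, assuming the shapes dictionary has already been loaded from a route file; the function name and the example coordinates are hypothetical.

```python
import json


def shapes_to_geojson(shapes: dict[str, list[list[float]]]) -> dict:
    """Wrap each shape (a [[lon, lat], ...] list) as a GeoJSON LineString feature.

    GeoJSON also orders positions as [longitude, latitude], so the
    coordinate arrays are used as-is. Each shape_id becomes one feature.
    """
    features = [
        {
            "type": "Feature",
            "properties": {"shape_id": shape_id},
            "geometry": {"type": "LineString", "coordinates": coords},
        }
        for shape_id, coords in shapes.items()
    ]
    return {"type": "FeatureCollection", "features": features}


# Hypothetical example: two shapes (e.g. outbound and inbound) for one route
shapes = {
    "shape_out": [[-1.55, 53.80], [-1.60, 53.85], [-1.70, 53.90]],
    "shape_in": [[-1.70, 53.90], [-1.60, 53.85], [-1.55, 53.80]],
}
geojson_text = json.dumps(shapes_to_geojson(shapes))  # ready to write to a .geojson file
```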
Dictionary of stops on the route.
Keys are the stop_id of each stop on the route.
Values are:
name: human-readable name of the stop.
lon: longitude.
lat: latitude.
bearing: direction of travel on an 8-point compass.
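An 8-point compass bearing can be derived from a bearing in degrees by splitting the circle into eight 45° sectors, each centred on a compass point. A sketch, assuming the labels are the usual N, NE, E, SE, S, SW, W, NW; the exact strings stored in the data files are an assumption, as are the helper names.

```python
# Assumed label set, clockwise from north
COMPASS_8 = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def to_compass8(bearing_deg: float) -> str:
    """Map a bearing in degrees (0 = north, clockwise) to an 8-point label.

    Each label covers a 45-degree sector centred on it, so N covers
    337.5 up to (but not including) 22.5 degrees.
    """
    index = int(((bearing_deg % 360) + 22.5) // 45) % 8
    return COMPASS_8[index]


def compass8_to_deg(label: str) -> float:
    """Inverse mapping to the centre of the sector, e.g. "NE" -> 45.0."""
    return COMPASS_8.index(label) * 45.0
```

The inverse mapping is useful when comparing a stop's stored compass direction with a numeric vehicle bearing.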
trips: the stop_id and arrival times of buses at each stop on the route, according to both the timetable and the realtime data.
trips is an array. Each item of the array is itself an array of stop times. Each stop time is an array of the form [stop_id, realtime, timetable, interpolated].
stop_id is a unique identifier of the stop, realtime is a Unix timestamp of when the bus arrived at the stop, and timetable is a timestamp of when the bus was timetabled to arrive at the stop. interpolated is either 1 or 0: 0 means the timestamp was observed in the live data and 1 means it is an interpolated value.
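A minimal sketch of working with this structure, assuming a trip is a list of [stop_id, realtime, timetable, interpolated] entries in stop order, that both timestamps are Unix seconds, and that a stop without an observation has realtime set to None (how missing values are actually represented is an assumption). `fill_gaps` shows one plausible interpolation technique: missing arrivals are placed between the nearest observed stops in proportion to their timetabled times, and flagged with interpolated = 1. It is illustrative only, not necessarily the method in gtfrt2gtfs_interpolation. `delays` then computes realtime minus timetable per stop, optionally ignoring interpolated values.

```python
StopTime = list  # [stop_id, realtime, timetable, interpolated]


def fill_gaps(trip: list[StopTime]) -> list[StopTime]:
    """Linearly interpolate missing realtime values (None) between observed stops.

    Gaps are filled in proportion to timetabled progress between the two
    surrounding observations. Leading and trailing gaps stay None.
    """
    observed = [i for i, (_, rt, _, _) in enumerate(trip) if rt is not None]
    filled = [list(st) for st in trip]  # copy so the input is not modified
    for a, b in zip(observed, observed[1:]):
        rt_a, tt_a = trip[a][1], trip[a][2]
        rt_b, tt_b = trip[b][1], trip[b][2]
        for i in range(a + 1, b):
            tt_i = trip[i][2]
            # Fraction of the timetabled journey from stop a to stop b
            frac = (tt_i - tt_a) / (tt_b - tt_a) if tt_b != tt_a else 0.5
            filled[i][1] = round(rt_a + frac * (rt_b - rt_a))
            filled[i][3] = 1  # mark as interpolated
    return filled


def delays(trip: list[StopTime], include_interpolated: bool = True) -> dict[str, int]:
    """Delay in seconds (realtime - timetable) for each stop with a realtime value."""
    return {
        stop_id: rt - tt
        for stop_id, rt, tt, interp in trip
        if rt is not None and (include_interpolated or interp == 0)
    }


# Hypothetical trip: the bus was only observed at the first and last stops
trip = [
    ["s1", 1000, 990, 0],
    ["s2", None, 1050, 0],
    ["s3", None, 1110, 0],
    ["s4", 1200, 1170, 0],
]
filled = fill_gaps(trip)
# filled[1] -> ["s2", 1067, 1050, 1], filled[2] -> ["s3", 1133, 1110, 1]
```

Excluding interpolated values (`include_interpolated=False`) keeps delay statistics based only on observed arrivals, at the cost of fewer data points.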
DEPRECATED: Exploring the use of R5R to create travel-time isochrones.
extract.py - Extracts the gtfsrt.bin file from the gtfsrt.zip archives that BODS provides, for all the live location data we downloaded.
rename.py - Renames file endings to .zip.
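Both helper scripts can be approximated with the standard library alone. This sketch assumes the downloaded BODS files sit in one directory and that each archive contains a member named gtfsrt.bin, as described above. The function names and the output naming (`<archive name>.bin`) are hypothetical, not taken from extract.py or rename.py.

```python
import zipfile
from pathlib import Path

MEMBER = "gtfsrt.bin"  # file expected inside each BODS archive


def add_zip_suffix(folder: Path) -> list[Path]:
    """Give every file in `folder` a .zip ending (replacing its last suffix)."""
    renamed = []
    for path in sorted(folder.iterdir()):
        if path.is_file() and path.suffix != ".zip":
            target = path.with_suffix(".zip")
            if target.exists():
                raise FileExistsError(target)  # avoid silently overwriting a file
            path.rename(target)
            renamed.append(target)
    return renamed


def extract_bins(folder: Path, out_dir: Path) -> list[Path]:
    """Extract gtfsrt.bin from each .zip in `folder` as <zip stem>.bin in `out_dir`."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for archive in sorted(folder.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            if MEMBER not in zf.namelist():
                continue  # skip archives without the expected member
            target = out_dir / f"{archive.stem}.bin"
            target.write_bytes(zf.read(MEMBER))
            written.append(target)
    return written
```

Naming each extracted file after its archive keeps the .bin files distinct, since every archive contains a member with the same name. Note that `with_suffix` replaces only the last suffix, so file names containing extra dots may need different handling.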
Documentation on the GTFS and GTFS-RT formats is available at https://gtfs.org/documentation/overview/. The GTFS format takes some time to get familiar with, but the documentation is helpful and worth referring to.
You can validate GTFS timetables at https://gtfs-validator.mobilitydata.org/.