A modernized application for tracking and analyzing Caltrain performance metrics. This project collects real-time train location data from the 511.org GTFS-RT API, processes it to determine arrival times and delays, and provides a web interface for visualizing the data.
- Real-time tracking of Caltrain locations and arrivals
- Historical analysis of on-time performance
- Visualization of delay patterns by time of day, day of week, and stop location
- REST API for programmatic access to Caltrain performance data
- Automated data collection and processing using Prefect workflows
The application is built using the following technologies:
- FastAPI: Modern, high-performance web framework for building APIs
- Prefect: Workflow orchestration for managing data collection and processing tasks
- PostgreSQL: Robust database for storing train location and performance data
- SQLAlchemy: ORM for database interactions
- Alembic: Database migration tool
- Plotly: Interactive data visualizations
- Docker: Containerization for easy deployment
- Fork or clone this repository
- Get a 511.org API key at https://511.org/open-data/token
- Create a `.env` file in the root directory with your API key:
  ```
  API_KEY="your-api-key"
  ```
- Build and run the Docker containers:
  ```
  docker compose build
  docker compose up -d
  ```
- Access the application:
  - Web UI: http://localhost:8181
  - API documentation: http://localhost:8181/docs
  - Prefect dashboard: http://localhost:4200
- Clone the repository
- Run the setup script to prepare your development environment:
  ```
  ./setup_dev.sh
  ```
- Edit the `.env` file with your configuration
- Create the PostgreSQL database (or use Docker for the database only)
- Run the application:
  ```
  python main.py
  ```
```
├── src/                  # Application source code
│   ├── api/              # FastAPI models and endpoints
│   ├── data/             # Data processing utilities
│   ├── db/               # Database connection and session handling
│   ├── models/           # SQLAlchemy data models
│   ├── pipelines/        # Prefect workflows for data collection and processing
│   ├── utils/            # Utility functions (time, geo, etc.)
│   └── config.py         # Application configuration
├── alembic/              # Database migrations
├── static/               # Static content (plots, data files)
│   ├── plots/            # Generated visualizations
│   └── data/             # Generated data files
├── gtfs_data/            # GTFS static feed data
├── docker-compose.yaml   # Docker Compose configuration
├── Dockerfile            # Docker image definition
├── main.py               # Application entry point
└── requirements.txt      # Python dependencies
```
All data was gathered from the 511.org transit API.
The list of stops and stop times was downloaded from the GTFS API: `http://api.511.org/transit/datafeeds?api_key={API_KEY}&operator_id={OPERATOR}`

Historical train position data was collected once per minute (the maximum rate the API allows) from the GTFS-RT Vehicle Monitoring API: `https://api.511.org/transit/VehicleMonitoring?api_key={API_KEY}&agency={OPERATOR}`
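For illustration, the collection loop can be as simple as the sketch below. The `CT` operator code, the `format=json` parameter, and the byte-order-mark handling are assumptions about the 511.org API rather than code from this project:

```python
import json
import time

import requests

API_KEY = "your-api-key"  # placeholder -- use your own 511.org key
OPERATOR = "CT"           # assumed to be Caltrain's operator ID on 511.org
URL = "https://api.511.org/transit/VehicleMonitoring"


def fetch_vehicle_positions() -> dict:
    """Fetch one snapshot of real-time vehicle positions as JSON."""
    resp = requests.get(
        URL,
        params={"api_key": API_KEY, "agency": OPERATOR, "format": "json"},
        timeout=30,
    )
    resp.raise_for_status()
    # 511.org responses typically begin with a UTF-8 byte-order mark,
    # so decode with utf-8-sig before parsing.
    return json.loads(resp.content.decode("utf-8-sig"))


if __name__ == "__main__":
    while True:
        snapshot = fetch_vehicle_positions()
        # ... hand the snapshot off to the parsing/storage step ...
        time.sleep(60)  # one request per minute, per the API rate limit
```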
The GTFS-RT feed was parsed per vehicle to extract the train number, the stop number, the latitude and longitude, and the timestamp of when the data was collected; each record was inserted into a SQLite database.
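A minimal sketch of that parsing step is below. It assumes the 511.org SIRI JSON layout (`Siri` → `ServiceDelivery` → `VehicleMonitoringDelivery` → `VehicleActivity`); the table schema and field names are illustrative, not the project's actual code:

```python
import sqlite3


def store_snapshot(snapshot: dict, db_path: str = "positions.db") -> None:
    """Flatten one VehicleMonitoring snapshot into per-vehicle rows."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS positions (
               train_id TEXT, stop_id TEXT,
               latitude REAL, longitude REAL, recorded_at TEXT)"""
    )
    delivery = snapshot["Siri"]["ServiceDelivery"]["VehicleMonitoringDelivery"]
    rows = []
    for activity in delivery.get("VehicleActivity", []):
        journey = activity["MonitoredVehicleJourney"]
        rows.append((
            journey.get("VehicleRef"),                              # train number
            (journey.get("MonitoredCall") or {}).get("StopPointRef"),  # next stop
            float(journey["VehicleLocation"]["Latitude"]),
            float(journey["VehicleLocation"]["Longitude"]),
            activity.get("RecordedAtTime"),                         # collection time
        ))
    conn.executemany("INSERT INTO positions VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    conn.close()
```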
Train arrival detection

Since the raw data contains only each train's location and the stop it is travelling towards, arrival times have to be inferred. The distance from each train to its next stop was calculated with the Haversine formula on the train's and the stop's latitude/longitude. Because the position data is relatively sparse, the row with the minimum distance to the stop for each train ID, date, and stop ID was taken as the arrival.
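In code, the arrival inference might look like this sketch; the column names (`latitude_stop`, `dist_m`, etc.) are illustrative rather than the project's actual schema:

```python
import math

import pandas as pd


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two lat/long points."""
    r = 6_371_000  # mean Earth radius, metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def detect_arrivals(positions: pd.DataFrame, stops: pd.DataFrame) -> pd.DataFrame:
    """Keep, for each (train, date, stop), the observation closest to the stop."""
    df = positions.merge(stops, on="stop_id", suffixes=("", "_stop"))
    df["dist_m"] = df.apply(
        lambda r: haversine_m(r["latitude"], r["longitude"],
                              r["latitude_stop"], r["longitude_stop"]),
        axis=1,
    )
    closest = df.groupby(["train_id", "date", "stop_id"])["dist_m"].idxmin()
    return df.loc[closest]
```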
On-time performance was calculated on a per-stop, per-train basis. For each stop on a route, a train is marked as delayed if it arrives at the stop more than 4 minutes behind schedule. Minor delays are 5-14 minutes behind schedule; major delays are 15 minutes or more.
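A small helper capturing those thresholds (the exact boundary handling is an assumption, since the text only says "more than 4 minutes"):

```python
def classify_arrival(delay_minutes: float) -> str:
    """Bucket a per-stop arrival delay using the thresholds above."""
    if delay_minutes < 5:
        return "on time"      # within 4 minutes of schedule
    if delay_minutes < 15:
        return "minor delay"  # 5-14 minutes behind schedule
    return "major delay"      # 15 or more minutes behind schedule
```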
Morning commute hours were defined as 6-9 am.
Evening commute hours were defined as 3:30-7:30 pm.
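Expressed in code, with the window boundaries assumed to be inclusive:

```python
from datetime import time

MORNING = (time(6, 0), time(9, 0))      # 6-9 am
EVENING = (time(15, 30), time(19, 30))  # 3:30-7:30 pm


def commute_period(arrival: time) -> str | None:
    """Return which commute window a local arrival time falls in, if any."""
    if MORNING[0] <= arrival <= MORNING[1]:
        return "morning"
    if EVENING[0] <= arrival <= EVENING[1]:
        return "evening"
    return None
```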