Police-Data-Accessibility-Project/data-source-manager

Data Source Manager

A FastAPI application for identifying and cataloguing police data sources. Part of the Police Data Accessibility Project (PDAP).

The Source Manager collects URLs from various sources, enriches them with metadata using automated tasks and ML models, supports human annotation for validation, and synchronizes approved data sources to the Data Sources App.

Quick Start

# Install dependencies
uv sync

# Start the local database
cd local_database && docker compose up -d && cd ..

# Create a .env file (see ENV.md for all variables)
# At minimum, set the POSTGRES_* variables to match local_database defaults.

# Run the app
fastapi dev main.py

Then open http://localhost:8000/api for the interactive API docs.

Note: accessing API endpoints requires a valid Bearer token from the Data Sources API.
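As a minimal sketch of what the note above means in practice, the snippet below builds a request that carries a Bearer token in the `Authorization` header. It assumes the dev server from Quick Start is listening on `http://localhost:8000`; the helper name `build_authorized_request`, the path `/example`, and the token value are placeholders for illustration, not documented parts of this project.

```python
import urllib.request

# Assumption: the dev server from Quick Start is listening at this address.
BASE_URL = "http://localhost:8000"


def build_authorized_request(path: str, token: str) -> urllib.request.Request:
    """Return a GET request for `path` that carries a Bearer token."""
    request = urllib.request.Request(f"{BASE_URL}{path}", method="GET")
    # Standard HTTP scheme: "Authorization: Bearer <token>".
    request.add_header("Authorization", f"Bearer {token}")
    return request


if __name__ == "__main__":
    # "/example" is a placeholder path, not a documented endpoint.
    req = build_authorized_request("/example", "YOUR_TOKEN_HERE")
    print(req.full_url)
    print(req.get_header("Authorization"))
```

To actually send the request, pass the returned object to `urllib.request.urlopen` once you have a real token from the Data Sources API and a real endpoint path from the API docs.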

Documentation

Document               Description
Architecture           System design, module structure, task system, data flow
API Reference          All 65 endpoints across 15 route groups
Development Guide      Local setup, environment variables, common workflows
Testing Guide          Running tests, CI pipeline, writing new tests
Deployment             Docker, Alembic migrations, DS App synchronization
Collectors             Collector architecture and how to build new ones
Environment Variables  Full reference for all env vars and feature flags

Project Structure

src/
├── api/            # FastAPI routers and endpoint logic
├── core/           # Integration layer and task system
├── db/             # SQLAlchemy models, async DB client, queries
├── collectors/     # Pluggable URL collection strategies
├── external/       # Clients for external services (HuggingFace, PDAP, etc.)
├── security/       # JWT auth and permissions
└── util/           # Shared helpers

Contributing

Thank you for your interest in contributing to this project! Please follow these guidelines:

  • These Design Principles may be used to make decisions or guide your work.
  • If you want to work on something, create an issue first so the broader community can discuss it.
  • If you make a utility, script, app, or other useful bit of code, put it in a top-level directory with an appropriate name and a dedicated README, and add it to the index.

Code Quality

Docstrings and type hints are checked via a GitHub Action (python_checks.yml) using pydocstyle and mypy. These produce advisory PR comments and do not block merges.

Note: python_checks.yml only runs on pull requests from within the repo, not from forks.
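
For contributors unsure what these checks look for, the sketch below shows a function written in the general style that docstring and type checkers expect: a module docstring, a one-line summary ending in a period, and annotations on every parameter and return value. The function `normalize_host` is a made-up example, and the project's exact pydocstyle and mypy configuration is not shown here, so treat this as a rough guide only.

```python
"""Example module docstring describing what this module provides."""

from urllib.parse import urlsplit


def normalize_host(url: str) -> str:
    """Return the lowercase hostname of a URL, or an empty string.

    The input is expected to be an absolute URL. If no hostname can be
    parsed from it, an empty string is returned instead of None, so the
    declared return type stays a plain str.
    """
    # urlsplit(...).hostname is None when the URL has no network location.
    hostname = urlsplit(url).hostname
    return hostname or ""


if __name__ == "__main__":
    print(normalize_host("https://Example.com/path"))
```

Returning `""` rather than `None` keeps the signature simple; a function that can return `None` would need `Optional[str]` in its annotation for a type checker to accept it.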
