enh: Improve config file integration #59

@cbueth


Summary

Configuration handling should be made easier to use: defaults live in code, a user config can be supplied from any path, required fields are validated, and the legacy behavior remains as a fallback.

Motivation

  • preprocessing/constants.py hard-loads multipleye_settings_preprocessing.yaml from a fixed path at import time, so importing the package fails if the file is missing.
  • Scripts accept a config path, but the modules they import use constants that were already loaded from a different file. The two configurations can then contradict each other.
  • The docs are out of date: they still mention the old config.py.

Goals

  • Single source of truth for defaults in code. Any default can be overridden from YAML.
  • Clear validation errors for required settings.
  • Flexible load order: explicit path (CLI/API) > env var > CWD default file > legacy repo-root file.
  • Backward-compatible with the current config file location.
  • Programmatic API to set and update settings.
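A minimal sketch of the load order listed above, assuming a plain function (the name resolve_config_path is hypothetical) and passing the legacy repo root in as a parameter, since its exact location is not fixed in this issue. An explicit or env-var path is returned even if the file does not exist, so a typo surfaces as an error at load time instead of silently falling through to a lower-priority file.

```python
import os
import warnings
from pathlib import Path
from typing import Optional, Union

# Names from the issue; the function itself is a hypothetical sketch.
ENV_VAR = "MULTIPLEYE_CONFIG"
DEFAULT_FILENAME = "multipleye_settings_preprocessing.yaml"


def resolve_config_path(
    explicit: Optional[Union[str, Path]] = None,
    legacy_root: Optional[Path] = None,
) -> Optional[Path]:
    """Return the config file to load, or None if nothing is found.

    Order: explicit path > MULTIPLEYE_CONFIG > ./<default file> > legacy repo root.
    """
    # 1. Explicit path from the CLI or API always wins (existence checked at load time).
    if explicit is not None:
        return Path(explicit)
    # 2. Environment variable, if set and non-empty.
    env_value = os.environ.get(ENV_VAR)
    if env_value:
        return Path(env_value)
    # 3. Default file name in the current working directory.
    cwd_file = Path.cwd() / DEFAULT_FILENAME
    if cwd_file.is_file():
        return cwd_file
    # 4. Legacy repo-root file, only as a deprecated fallback.
    if legacy_root is not None:
        legacy_file = Path(legacy_root) / DEFAULT_FILENAME
        if legacy_file.is_file():
            warnings.warn(
                f"Loading {legacy_file} from the legacy location is deprecated; "
                f"pass --config_path or set {ENV_VAR} instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            return legacy_file
    return None
```

Returning None rather than raising lets the caller decide whether running on pure in-code defaults is acceptable.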

Proposed Approach

  • Add a lightweight Settings object (in preprocessing/constants.py, or preferably in a module renamed to preprocessing/config.py) with:
    • In-code defaults (e.g., EXPECTED_SAMPLING_RATE_HZ, etc.).
    • load_from_yaml(path), update(dict), _validate() for required keys (e.g., data_collection_name).
  • Initialize without side effects; optionally auto-load the legacy path, but only if the file is present (no crash on import).
  • Loading precedence:
    1. Explicit path from CLI --config / --config_path or API.
    2. Env var MULTIPLEYE_CONFIG.
    3. ./multipleye_settings_preprocessing.yaml (CWD).
    4. Legacy repo-root multipleye_settings_preprocessing.yaml (fallback, with deprecation warning).
  • Expose a stable programmatic API: from preprocessing import settings, then read values as settings.<field>.
  • Update scripts to rely solely on the central settings instance (remove duplicate YAML reads).
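One possible shape for the Settings object, sketched under assumptions: ConfigError, the placeholder default value of 1000, and rejecting unknown keys are illustrative choices, not existing project code. load_from_yaml uses PyYAML's yaml.safe_load, imported inside the method so that importing the module needs neither a config file nor PyYAML, and it validates the merged result before changing anything.

```python
from pathlib import Path
from typing import Any, Dict, Mapping, Union


class ConfigError(ValueError):
    """Raised for missing required settings or unknown keys (hypothetical name)."""


class Settings:
    """Settings container: defaults live in code, YAML or dicts override them."""

    # Key names come from the issue; 1000 is a placeholder,
    # not the project's actual expected sampling rate.
    DEFAULTS: Dict[str, Any] = {
        "EXPECTED_SAMPLING_RATE_HZ": 1000,
        "data_collection_name": None,  # required, so no usable default
    }
    REQUIRED = ("data_collection_name",)

    def __init__(self) -> None:
        # No file I/O here: creating the object has no side effects.
        self._values: Dict[str, Any] = dict(self.DEFAULTS)

    def __getattr__(self, name: str) -> Any:
        # Only called when normal attribute lookup fails; exposes settings.<field>.
        values = self.__dict__.get("_values", {})
        if name in values:
            return values[name]
        raise AttributeError(name)

    def _check_known(self, values: Mapping[str, Any]) -> None:
        # Reject keys without a default, which catches typos in config files.
        unknown = sorted(str(k) for k in set(values) - set(self.DEFAULTS))
        if unknown:
            raise ConfigError(f"Unknown setting(s): {unknown}")

    def _validate(self, values: Mapping[str, Any]) -> None:
        missing = [key for key in self.REQUIRED if values.get(key) is None]
        if missing:
            raise ConfigError(f"Missing required setting(s): {missing}")

    def update(self, values: Mapping[str, Any]) -> None:
        """Programmatic override of individual settings."""
        self._check_known(values)
        self._values.update(values)

    def load_from_yaml(self, path: Union[str, Path]) -> None:
        """Merge a YAML file over the current values and validate the result."""
        import yaml  # PyYAML; imported here so importing this module does not require it

        with open(path, encoding="utf-8") as fh:
            data = yaml.safe_load(fh) or {}
        if not isinstance(data, dict):
            raise ConfigError(f"{path}: top level must be a mapping")
        self._check_known(data)
        merged = {**self._values, **data}
        self._validate(merged)  # raise before anything is changed
        self._values = merged
```

In a notebook this would be used as settings.load_from_yaml(Path("path/to/config.yaml")). Whether update should also enforce required keys is an open design choice; here it does not, so fields can be set in any order.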

Affected Areas

  • preprocessing/constants.py (defaults, Settings class).
  • preprocessing/__init__.py (expose settings).
  • preprocessing/scripts/run_multipleye_preprocessing.py (use central loader and precedence).
  • Notebook preprocessing.ipynb (stop assuming a root-level YAML; show settings.load_from_yaml with the currently used relative ./...yml path). Open question: if the package is imported, no config is found, and the user then calls settings.load_from_yaml, would a deprecation warning at import time be too noisy? It would fire before the user has had a chance to load their config. Possibly the config could be lazy-loaded on first access instead of at package import.
  • Docs: Getting Started and Configuration Guide.
  • Tests.
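For the lazy-loading question above, a sketch of a proxy that only runs its loader on first attribute access. LazySettings, its set_values method, and the loader callable are hypothetical. Because an explicit load before the first access bypasses the automatic lookup, a legacy-path deprecation warning would never fire for users who load their own config. A module-level __getattr__ in preprocessing/__init__.py (PEP 562, Python 3.7+) would be another way to defer the load.

```python
from typing import Any, Callable, Dict, Optional


class LazySettings:
    """Proxy that defers loading until a setting is first read (hypothetical)."""

    def __init__(self, loader: Callable[[], Dict[str, Any]]) -> None:
        # `loader` would resolve the config path and read the file; not called yet.
        self._loader = loader
        self._values: Optional[Dict[str, Any]] = None

    def set_values(self, values: Dict[str, Any]) -> None:
        # Explicit loading (e.g. from a notebook) skips the automatic lookup.
        self._values = dict(values)

    def _ensure_loaded(self) -> Dict[str, Any]:
        if self._values is None:
            self._values = dict(self._loader())
        return self._values

    def __getattr__(self, name: str) -> Any:
        # Guard private names to avoid recursion if __init__ has not run.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._ensure_loaded()[name]
        except KeyError:
            raise AttributeError(name) from None
```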

Acceptance Criteria

  • Importing preprocessing works without any YAML present.
  • Providing a YAML via --config_path or MULTIPLEYE_CONFIG fully overrides defaults
  • Missing required fields raise a clear error at load time.
  • Legacy root YAML is loaded only as a fallback and emits a deprecation warning.
  • A single settings instance is used across modules and scripts (no divergent configs). Alternatively, use class-level attributes instead of an instance, to avoid confusion from multiple instances?
  • Updated docs show new precedence and examples.

Migration / Backward Compatibility

  • Continue supporting root-level multipleye_settings_preprocessing.yaml for one minor release with a deprecation warning.
  • Encourage users to pass --config_path or set MULTIPLEYE_CONFIG when running scripts; in notebooks, use settings.load_from_yaml.
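For scripts, a sketch of parsing --config / --config_path before any settings are loaded, so that no module-level load runs ahead of the CLI. The function names parse_args and choose_config are hypothetical; argparse accepts several option strings for one argument, so both spellings share a single destination.

```python
import argparse
import os
from pathlib import Path
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Run preprocessing (sketch).")
    # Both spellings from the issue map to the same destination.
    parser.add_argument(
        "--config",
        "--config_path",
        dest="config_path",
        type=Path,
        default=None,
        help="Path to a preprocessing YAML file.",
    )
    return parser.parse_args(argv)


def choose_config(args: argparse.Namespace) -> Optional[Path]:
    # CLI flag first, then the MULTIPLEYE_CONFIG environment variable.
    if args.config_path is not None:
        return args.config_path
    env_value = os.environ.get("MULTIPLEYE_CONFIG")
    return Path(env_value) if env_value else None
```

The entry point would call parse_args first and only then load settings from the chosen path, falling back to the CWD and legacy files if neither is given.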

Documentation Tasks

  • Getting Started: explain config precedence and --config_path usage.
  • Configuration Guide: remove references to old config.py; add required/optional keys and examples for CLI and programmatic loading.
  • Notebook: show settings.load_from_yaml(Path("path/to/config.yaml")) and add a note about restarting the kernel.

Risks / Notes

  • Ensure no module-level side effects trigger early loads before CLI parsing; defer loads to script entry points where possible.
  • Add unit tests for load order, validation, and required-field errors.

Metadata

Assignees

Labels

documentation (Improvements or additions to documentation), enhancement (New feature or request)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests