Summary
The configuration handling should be improved for usability. Defaults live in code, user config can be supplied from any path, required fields are validated, and legacy behavior remains as a fallback.
Motivation
- preprocessing/constants.py hard-loads multipleye_settings_preprocessing.yaml from a fixed path, causing imports to fail if the file is missing.
- Scripts should accept a config path, but modules import constants that have already loaded a different file. The two can contradict each other in some cases.
- Docs need to be updated. They still mention the outdated config.py.
Goals
- Single source of truth for defaults in code. YAML overrides all defaults.
- Clear validation errors for required settings.
- Flexible load order: explicit path (CLI/API) > env var > CWD default file > legacy repo-root file.
- Backwards-compatible with current location.
- Programmatic API to set/update settings
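The load order named in the goals could be resolved by one small helper. The following is only a sketch under assumptions: the function name resolve_config_path and its parameters are hypothetical, and the legacy repo root is passed in explicitly because the proposal does not say how it would be located. Emitting the deprecation warning for the legacy fallback is left to the caller.

```python
import os
from pathlib import Path
from typing import Optional

DEFAULT_FILENAME = "multipleye_settings_preprocessing.yaml"
ENV_VAR = "MULTIPLEYE_CONFIG"


def resolve_config_path(
    explicit: Optional[Path] = None,
    cwd: Optional[Path] = None,
    repo_root: Optional[Path] = None,
) -> Optional[Path]:
    """Return the first applicable config path, or None to keep in-code defaults.

    Hypothetical helper: name and parameters are assumptions, not existing API.
    """
    # 1. An explicit path from the CLI or API always wins.
    if explicit is not None:
        return Path(explicit)
    # 2. Environment variable.
    env_value = os.environ.get(ENV_VAR)
    if env_value:
        return Path(env_value)
    # 3. Default file in the current working directory.
    cwd_candidate = (cwd or Path.cwd()) / DEFAULT_FILENAME
    if cwd_candidate.is_file():
        return cwd_candidate
    # 4. Legacy repo-root file, used only as a fallback.
    if repo_root is not None:
        legacy = Path(repo_root) / DEFAULT_FILENAME
        if legacy.is_file():
            return legacy
    # Nothing found: the caller keeps the in-code defaults.
    return None
```

Returning None instead of raising keeps "no YAML present" a valid state, which matches the goal that imports must not fail when the file is missing.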
Proposed Approach
- Add a lightweight Settings object (in preprocessing/constants.py, or rather renamed to preprocessing/config.py) with:
  - In-code defaults (e.g., EXPECTED_SAMPLING_RATE_HZ).
  - load_from_yaml(path), update(dict), and _validate() for required keys (e.g., data_collection_name).
- Initialize without side effects; optionally auto-load the legacy path only if present (no crash on import).
- Loading precedence:
  - Explicit path from CLI --config / --config_path or API.
  - Env var MULTIPLEYE_CONFIG.
  - ./multipleye_settings_preprocessing.yaml (CWD).
  - Legacy repo-root multipleye_settings_preprocessing.yaml (fallback, with deprecation warning).
- Expose a stable programmatic API: from preprocessing import settings and settings.<field>.
- Update scripts to rely solely on the central settings instance (remove duplicate YAML reads).
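A minimal sketch of what such a Settings object might look like is given here for discussion. It is not the final design: the default value 1000 is a placeholder (only the key name comes from this proposal), _validate takes the merged values as an argument so that a failed update leaves the old state untouched, and YAML parsing assumes PyYAML is installed.

```python
from pathlib import Path
from typing import Any, Dict


class Settings:
    """Holds in-code defaults that a YAML file or a dict can override (sketch)."""

    # Placeholder value; only the key name is taken from the proposal.
    DEFAULTS: Dict[str, Any] = {"EXPECTED_SAMPLING_RATE_HZ": 1000}
    REQUIRED = ("data_collection_name",)

    def __init__(self) -> None:
        # No file access here, so creating the object cannot fail on import.
        self._values: Dict[str, Any] = dict(self.DEFAULTS)

    def update(self, overrides: Dict[str, Any]) -> None:
        """Merge overrides over the current values, then validate the result."""
        merged = {**self._values, **overrides}
        self._validate(merged)
        self._values = merged  # only replaced if validation passed

    def load_from_yaml(self, path: Path) -> None:
        """Read a YAML mapping and apply it as overrides."""
        import yaml  # PyYAML; imported lazily so the class works without it

        with open(path, encoding="utf-8") as fh:
            data = yaml.safe_load(fh) or {}
        if not isinstance(data, dict):
            raise ValueError(f"{path} must contain a YAML mapping at top level")
        self.update(data)

    def _validate(self, values: Dict[str, Any]) -> None:
        """Raise a clear error naming every missing required key."""
        missing = [key for key in self.REQUIRED if values.get(key) in (None, "")]
        if missing:
            raise ValueError(f"Missing required setting(s): {', '.join(missing)}")

    def __getattr__(self, name: str) -> Any:
        # Only called when normal lookup fails; enables settings.<field>.
        values = self.__dict__.get("_values", {})
        if name in values:
            return values[name]
        raise AttributeError(f"Unknown setting: {name}")
```

With this shape, preprocessing/__init__.py would create one module-level instance (settings = Settings()), and scripts would call settings.load_from_yaml(...) once at their entry point.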
Affected Areas
- preprocessing/constants.py (defaults, Settings class).
- preprocessing/__init__.py (expose settings).
- preprocessing/scripts/run_multipleye_preprocessing.py (use central loader and precedence).
- Notebook preprocessing.ipynb (stop assuming the root YAML; show settings.load_from_yaml with the currently used local ./...yml). Open question: if the package is imported, no config is found, and the user then calls settings.load_from_yaml, would a deprecation warning be too much? The user would have no chance to load the config before the deprecation warning is raised. Maybe the config can be lazy-loaded on first access instead of directly at package load.
- Docs: getting started + configuration guide
- Adding tests
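The lazy-loading idea raised for the notebook case could look roughly like the sketch below. It assumes a hypothetical LazySettings class whose file reader is injected (so the example runs without PyYAML); nothing is read at construction time, and the deprecation warning fires only if a setting is accessed before any explicit load.

```python
import warnings
from pathlib import Path
from typing import Any, Callable, Dict, Optional


class LazySettings:
    """Defers config loading until a setting is first read (hypothetical sketch)."""

    def __init__(
        self, legacy_path: Path, reader: Callable[[Path], Dict[str, Any]]
    ) -> None:
        self._legacy_path = legacy_path
        self._reader = reader  # e.g. a function wrapping a YAML parser
        self._values: Optional[Dict[str, Any]] = None  # None means "not loaded yet"

    def load(self, path: Path) -> None:
        """Explicit load; if called before first access, the legacy file is never read."""
        self._values = self._reader(path)

    def get(self, key: str) -> Any:
        if self._values is None:
            # First access without an explicit load: try the legacy file.
            if self._legacy_path.is_file():
                warnings.warn(
                    "Loading the repo-root config is deprecated; "
                    "pass a config path instead.",
                    DeprecationWarning,
                    stacklevel=2,
                )
                self._values = self._reader(self._legacy_path)
            else:
                self._values = {}
        return self._values[key]
```

A notebook user who calls load(...) in the first cell would therefore never see the warning, while code that silently relies on the root YAML still gets it on first use.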
Acceptance Criteria
- Importing preprocessing works without any YAML present.
- Providing a YAML via --config_path or MULTIPLEYE_CONFIG fully overrides defaults.
- Missing required fields raise a clear error at load time.
- Legacy root YAML is loaded only as a fallback and emits a deprecation warning.
- A single settings instance is used across modules and scripts (no divergent configs). Open question: would class variables be better, so that no instance mix-ups can happen?
- Updated docs show new precedence and examples.
Migration / Backward Compatibility
- Continue supporting root-level multipleye_settings_preprocessing.yaml for one minor release with a deprecation warning.
- Encourage users to pass --config_path or set MULTIPLEYE_CONFIG for scripts. For notebooks, use settings.load_from_yaml.
Documentation Tasks
- Getting Started: explain config precedence and --config_path usage.
- Configuration Guide: remove references to old config.py; add required/optional keys and examples for CLI and programmatic loading.
- Notebook: show settings.load_from_yaml(Path("path/to/config.yaml")) and a restart note.
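For the CLI example in the docs, something along these lines might serve as a starting point. It is a sketch only: parse_config_path is a hypothetical helper, and registering --config as an alias of --config_path is an assumption based on both spellings appearing in this proposal.

```python
import argparse
from pathlib import Path
from typing import Optional, Sequence


def parse_config_path(argv: Optional[Sequence[str]] = None) -> Optional[Path]:
    """Parse the config option and return its path, or None if it was not given.

    Hypothetical helper for a script entry point; not existing project API.
    """
    parser = argparse.ArgumentParser(description="Run preprocessing (sketch).")
    parser.add_argument(
        "--config_path",
        "--config",  # assumed alias; both spellings store to the same destination
        dest="config_path",
        type=Path,
        default=None,
        help="Path to a YAML settings file; overrides all other config sources.",
    )
    args = parser.parse_args(argv)
    return args.config_path


if __name__ == "__main__":
    # Parsing happens at the entry point, before any settings are loaded.
    print(parse_config_path())
```

Keeping the parsing inside the entry point, rather than at module level, fits the note that no load should be triggered before the CLI arguments are known.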
Risks / Notes
- Ensure no module-level side effects trigger early loads before CLI parsing; defer loads to script entry points where possible.
- Add unit tests for load order, validation, and required-field errors.