
Feature/package modernization #1

Merged
rhshah merged 13 commits into develop from feature/package-modernization on Feb 27, 2026

rhshah commented Feb 24, 2026

Description

This PR modernizes the STRiDE codebase, turning it from a collection of standalone scripts into a fully packaged, tested, CI-integrated Python project. It aligns STRiDE with the established MSK hybrid bioinformatics standards (e.g., py-gbcms) and makes the tool easier to use, maintain, and distribute.

🌟 Core Enhancements

  • Typer CLI (src/stride/cli.py): Replaced the old scripts/ directory with a single, unified Typer CLI.
    • Commands: stride features, stride predict, and stride run.
    • Uses named --options only (no positional arguments), so arguments cannot be silently passed in the wrong order.
  • Standardized Packaging (pyproject.toml): Replaced environment.yml with native setuptools configuration.
    • Includes [project.scripts] entry point.
    • Defines optional dependencies for [dev] (pytest, ruff, mypy) and [docs] (mkdocs).
  • Rich Integration:
    • Centralized, colorful RichHandler logging, replacing scattered print() and logging.basicConfig() calls.
    • Live progress bars and elapsed time tracking for feature generation loops.
    • Formatted summary tables for prediction outputs and batch runs.
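A minimal sketch of what a centralized logging setup could look like, assuming `setup_logging()` configures the root logger once with Rich's `RichHandler`. The signature, the `"stride"` logger name, and the plain `StreamHandler` fallback are assumptions for illustration, not the code in utils.py.

```python
import logging


def setup_logging(level: int = logging.INFO) -> logging.Logger:
    """Hypothetical helper: route every log record through one handler."""
    try:
        from rich.logging import RichHandler

        handler: logging.Handler = RichHandler(rich_tracebacks=True, show_path=False)
        fmt = "%(message)s"  # RichHandler renders the time and level columns itself
    except ImportError:
        # Assumed fallback so the sketch also runs where rich is not installed.
        handler = logging.StreamHandler()
        fmt = "%(asctime)s %(levelname)s %(message)s"
    # force=True drops any handlers installed earlier by stray basicConfig() calls.
    logging.basicConfig(level=level, format=fmt, datefmt="[%X]", handlers=[handler], force=True)
    return logging.getLogger("stride")


log = setup_logging()
log.info("Feature generation started")
```

Configuring the root logger once at CLI startup lets every module call `logging.getLogger(__name__)` instead of using print() or its own basicConfig().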

🛠️ Stability & Code Quality

  • Resource Leak Fix: Added context-manager support (`__enter__` / `__exit__`) to MSIProfileGenerator so that the underlying pysam.AlignmentFile handles are always closed, even when an error is raised.
  • Data Bundling: Reorganized the data/ structure. The 170-loci baseline now ships inside src/stride/data/ and is loaded via importlib.resources.files(), so it resolves correctly wherever the package is installed.
  • Testing: Introduced a pytest suite covering the core utilities, predictors, and CLI interface (33 tests, all passing).
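A stripped-down sketch of the context-manager pattern. `ProfileGenerator` and its injected `opener` are hypothetical: the real MSIProfileGenerator presumably opens `pysam.AlignmentFile` directly, and the opener is swapped in here only so the example runs without pysam or a BAM file.

```python
from __future__ import annotations

from types import TracebackType
from typing import Any, Callable


class ProfileGenerator:
    """Hypothetical stand-in for MSIProfileGenerator: owns one alignment handle."""

    def __init__(self, bam_path: str, opener: Callable[[str], Any]) -> None:
        # Real code would call pysam.AlignmentFile(bam_path, "rb") here.
        self._bam: Any = opener(bam_path)

    def close(self) -> None:
        # Idempotent, so an explicit close() followed by __exit__ is harmless.
        if self._bam is not None:
            self._bam.close()
            self._bam = None

    def __enter__(self) -> ProfileGenerator:
        return self

    def __exit__(
        self,
        exc_type: type[BaseException] | None,
        exc: BaseException | None,
        tb: TracebackType | None,
    ) -> None:
        # Runs on normal exit and when an exception escapes the with-block;
        # returning None lets that exception keep propagating to the caller.
        self.close()


class FakeHandle:
    """Minimal object exposing the close() method the generator relies on."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.closed = False

    def close(self) -> None:
        self.closed = True


opened: list[FakeHandle] = []


def fake_open(path: str) -> FakeHandle:
    handle = FakeHandle(path)
    opened.append(handle)
    return handle


try:
    with ProfileGenerator("tumor.bam", fake_open):
        raise RuntimeError("simulated failure mid-profile")
except RuntimeError:
    pass

print(opened[0].closed)  # True: the handle was closed despite the error
```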

⚙️ CI/CD Pipelines

Added GitHub Actions workflows that follow the MSK py-gbcms structure:

  • test.yml: Cross-platform test matrix (Ubuntu/macOS × Python 3.11/3.12), linting (ruff, black, mypy), and a cached Docker build check.
  • release.yml: Automated PyPI publishing (Trusted Publisher via OIDC) and GHCR Docker image publishing when a semantic-version tag is pushed.
  • deploy-docs.yml: Automated MkDocs deployment, using mike to publish versioned stable/dev docs.

📚 Documentation & Legal

  • MkDocs: Built a full documentation site (docs/) using Material for MkDocs.
  • Lightweight README: README.md now acts as a clean pointer to the generated MkDocs site.
  • AGPL-3.0 License: Added as the repository-wide LICENSE file and declared in the PyPI classifiers.
  • Contributing: Added CONTRIBUTING.md. Configured MkDocs snippet includes so CONTRIBUTING and CHANGELOG are rendered in the docs without duplicating their text.
  • AI Knowledge Items: Authored .antigravity/ARCHITECTURE.md and DEVELOPMENT.md to record architecture and development conventions for future AI-assisted development.

Checklist

  • Code passes ruff check and black --check
  • Test suite passes locally (pytest tests/)
  • Package installs cleanly (pip install -e .)
  • CLI commands execute successfully (stride --help)
  • Documentation builds without warnings (mkdocs build --strict)
  • CHANGELOG.md reflects modernizations

…docs

Phase 1 (Foundation):
- Add __version__ to __init__.py
- Create Typer CLI (features, predict, run) — all options, no positional args
- Enhance pyproject.toml (entry point, metadata, dev/docs extras, tool configs)
- Add Rich logging via setup_logging() in utils.py
- Fix BAM resource leak (context manager on MSIProfileGenerator)
- Replace print() with logging in predictor.py
- Move site list to src/stride/data/ for package bundling
- Add .gitignore, CHANGELOG.md
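A sketch of loading a bundled data file with `importlib.resources.files()`, assuming `stride.data` is importable as a package. `read_bundled_text` is a hypothetical helper, and the site-list filename is not given in this PR, so the demo reads a module from the standard library instead.

```python
from importlib.resources import files


def read_bundled_text(package: str, filename: str) -> str:
    """Read a text file shipped inside an importable package.

    files() locates the package wherever it is installed (regular directory,
    editable checkout, or zip archive), so no __file__ path arithmetic is needed.
    """
    resource = files(package).joinpath(filename)
    if not resource.is_file():
        raise FileNotFoundError(f"{filename!r} is not bundled in {package!r}")
    return resource.read_text(encoding="utf-8")


# In STRiDE the call would target the data package, e.g.
# read_bundled_text("stride.data", <site-list file>); the stdlib demo below
# only shows the mechanics.
source = read_bundled_text("json", "__init__.py")
print("def dumps" in source)  # True
```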

Phase 2 (Rich Integration):
- Add Rich progress bar to feature_generator.py:run()
- Add Rich summary tables to predict/batch CLI output

Phase 3 (Testing & CI):
- Create test suite: conftest.py, test_utils.py, test_predictor.py, test_cli.py (33 tests)
- Add test.yml (matrix: ubuntu/macOS × Python 3.11/3.12 + lint + docker-test)
- Add release.yml (PyPI + GHCR Docker push on tag)
- Add deploy-docs.yml (MkDocs versioned deploy via mike)

Phase 4 (Documentation & Distribution):
- Create .antigravity/ KI (ARCHITECTURE.md, DEVELOPMENT.md)
- Create Dockerfile with OCI org.* labels
- Create MkDocs scaffolding (teal/cyan theme, 6 documentation pages)
- Create mkdocs.yml with Material theme, mermaid2, glightbox, mike
Set the license to AGPL-3.0 in LICENSE and pyproject.toml. Added CONTRIBUTING.md and used MkDocs snippets to include CHANGELOG and CONTRIBUTING in the documentation without duplication.
Deleted the old scripts/ directory and environment.yml, which are superseded by the Typer CLI and standard Python packaging. Rewrote README.md as a lightweight pointer to the new MkDocs documentation site. Ported the clinical disclaimer to the docs index page.
Replaced deprecated typing imports (List/Dict) with built-in generic types (list/dict). Removed unused imports in the test suite. Fixed B904 (missing exception chaining) in cli.py. Ran black over the entire codebase to ensure style compliance.
Added pandas-stubs to dev dependencies. Added missing float and dict type annotations to feature_generator.py. Suppressed mypy's no-any-return check for scikit-learn's dynamically typed model attributes in predictor.py.
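Ruff's B904 asks that an exception raised inside an `except` block be chained with `from err` (or `from None`). A stdlib-only sketch of the fix; `InputError` and `parse_number` are hypothetical stand-ins for whatever error handling cli.py actually does.

```python
class InputError(Exception):
    """Hypothetical user-facing error raised while parsing a CLI option."""


def parse_number(raw: str) -> float:
    try:
        return float(raw)
    except ValueError as err:
        # B904: "from err" records the ValueError as __cause__, so the traceback
        # reads "The above exception was the direct cause of ..." instead of
        # suggesting a second, unrelated failure occurred during handling.
        raise InputError(f"expected a number, got {raw!r}") from err


try:
    parse_number("abc")
except InputError as exc:
    print(type(exc.__cause__).__name__)  # ValueError
```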
Added a local regex util inside the CLI tests to strip Rich's ANSI escape codes from terminal outputs so that string flag matching works reliably across local and CI environments.
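A sketch of such a helper, assuming Rich only emits CSI sequences (color and style codes) in the help output; OSC hyperlink sequences would need an extra pattern. The help line is made up for illustration.

```python
import re

# CSI sequence per ECMA-48: ESC "[" + parameter bytes (0x30-0x3F)
# + intermediate bytes (0x20-0x2F) + one final byte (0x40-0x7E).
# This covers SGR codes such as "\x1b[1;36m" (bold cyan) and "\x1b[0m" (reset).
ANSI_ESCAPE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def strip_ansi(text: str) -> str:
    """Remove terminal styling so flag names can be matched as plain text."""
    return ANSI_ESCAPE.sub("", text)


colored = "\x1b[1;36m--input\x1b[0m  Path to the BAM file"
print(strip_ansi(colored))  # --input  Path to the BAM file
```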
rhshah self-assigned this Feb 24, 2026
Dashboard:
- Triple waterfall plots (L1, L2, Wasserstein)
- Triple volcano plots (L1, L2, Wasserstein)
- Distance correlation, distributions, entropy, quality violins
- Reordered: DistCorr+DistHist → Volcanoes → Entropy+Violins
- Hero section: removed Mean L1 / Mean T-MapQ; now shows Prediction / Score / Sites
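The L1, L2, and Wasserstein columns are not defined in this PR. A minimal sketch under one plausible reading: each locus has a normal and a tumor repeat-length histogram (read counts keyed by repeat length), both normalized to frequencies over a shared run of consecutive integer lengths. `locus_distances` and its input shape are hypothetical.

```python
import math
from collections.abc import Mapping


def to_frequencies(counts: Mapping[int, int], support: range) -> list[float]:
    """Read counts keyed by repeat length -> frequencies over a fixed support."""
    total = sum(counts.values())  # assumes at least one read at the locus
    return [counts.get(length, 0) / total for length in support]


def locus_distances(
    normal: Mapping[int, int], tumor: Mapping[int, int]
) -> dict[str, float]:
    """Hypothetical per-locus L1, L2, and Wasserstein-1 between two samples."""
    lengths = set(normal) | set(tumor)
    # Consecutive integer lengths, so adjacent bins are exactly 1 apart.
    support = range(min(lengths), max(lengths) + 1)
    p = to_frequencies(normal, support)
    q = to_frequencies(tumor, support)
    diffs = [pi - qi for pi, qi in zip(p, q)]

    l1 = sum(abs(d) for d in diffs)
    l2 = math.sqrt(sum(d * d for d in diffs))

    # 1-D Wasserstein-1 is the area between the two CDFs; with unit spacing
    # that is the sum of absolute running (cumulative) differences.
    cdf_gap = 0.0
    wasserstein = 0.0
    for d in diffs:
        cdf_gap += d
        wasserstein += abs(cdf_gap)
    return {"l1": l1, "l2": l2, "wasserstein": wasserstein}


# Tumor distribution is the normal one shifted one repeat unit longer.
print(locus_distances({10: 2, 11: 2}, {11: 1, 12: 1}))  # l1 = 1.0, wasserstein = 1.0
```

Unlike L1 and L2, which only compare bin-by-bin, the Wasserstein distance grows with how far the tumor alleles have shifted in length, which is why it can rank loci differently.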

Site Explorer:
- L1/L2/Wasserstein metric pills for re-ranking
- Persistent legend via always-visible dummy traces
- Dynamic reference line per locus

Data Table (Tabulator.js v6):
- Progress bar formatters for L1, L2, Wasserstein
- Frozen Rank column
- Dynamic viewport height with horizontal/vertical scrollbars
- CSV download button
- Conditional row styling (orange border on high-L1 loci)
- MapQ < 40 warning styling
- Cell tooltips, column resize, header text wrapping

Tests: updated fixture with multi-distance columns, removed stale PDF test
Docs: comprehensive rewrite of docs/user-guide/qc.md
…data table

- Add three-tab QC report: Dashboard (8 Plotly cards), Site Explorer, Data Table
- Dashboard: waterfalls, volcano plots, distance correlation, distributions,
  entropy scatter, quality violins, insert-size histogram
- Site Explorer: searchable combobox, prev/next navigation, quality badges,
  per-locus stats strip with all distance metrics
- Site Explorer: raw/normalized frequency toggle (pill buttons, persists across
  locus changes); fix Plotly binary blob serialisation for JS interop
- Site Explorer: dynamic x-axis auto-scales to max observed repeat length
- Site Explorer: Normal/Tumor identity labels in tooltips, wider bars
- Data Table: Tabulator.js with frozen columns, progress-bar formatters,
  filters, sort, CSV export, conditional row styling
- Light/dark theme toggle persists across all tabs
- CLI: stride qc command; --generate-qc flag for stride run
- Update docs/user-guide/qc.md to reflect new Site Explorer features
- Update CHANGELOG.md with QC module entries
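The dashboard's entropy scatter is not defined further in this PR. As one hedged interpretation, the sketch below computes the Shannon entropy (in bits) of a locus's repeat-length distribution after a raw-count-to-frequency normalization like the one behind the Site Explorer's raw/normalized toggle (assumed here to divide by total reads). Both helper names are hypothetical.

```python
import math
from collections.abc import Mapping


def normalize(counts: Mapping[int, int]) -> dict[int, float]:
    """Raw read counts per repeat length -> frequencies that sum to 1 (assumed scheme)."""
    total = sum(counts.values())
    return {length: n / total for length, n in counts.items()}


def shannon_entropy(counts: Mapping[int, int]) -> float:
    """Entropy in bits: 0.0 when every read has the same repeat length."""
    return sum(f * math.log2(1 / f) for f in normalize(counts).values() if f > 0)


print(shannon_entropy({12: 40}))          # 0.0 (one repeat length)
print(shannon_entropy({12: 20, 13: 20}))  # 1.0 (two equally common lengths)
```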
…ting

- Add 'from __future__ import annotations' (PEP 563) to defer annotation
  evaluation, preventing NameError when plotly is not installed
- Add TYPE_CHECKING guard for plotly.graph_objects import
- Replace Optional[dict] with dict | None (ruff UP045)
- Run black on cli.py, pipeline.py, qc.py, test_qc.py
- Add plotly.* to mypy ignore_missing_imports overrides in pyproject.toml
- Add type: ignore[operator] for pandas Series string concatenation
- Add type: ignore[call-overload] for iterrows() index cast
- Add .astype(str) to repeat_unit column for consistent typing
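A minimal sketch of the deferred-annotation pattern described above, assuming plotly is an optional dependency (the `qc` extra). `build_waterfall` is a hypothetical function, not the actual qc.py API.

```python
from __future__ import annotations  # PEP 563: annotations are stored as strings

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by type checkers such as mypy; never imported at runtime,
    # so importing this module cannot fail when plotly is absent.
    import plotly.graph_objects as go


def build_waterfall(loci: list[str], scores: list[float]) -> go.Figure:
    """Hypothetical QC plot builder that imports plotly only when called."""
    try:
        import plotly.graph_objects as go
    except ImportError as err:
        raise ImportError("QC plots need plotly; install the qc extra") from err
    return go.Figure(go.Bar(x=loci, y=scores))
```

Without the `__future__` import, defining this function would evaluate `go.Figure` immediately and raise NameError whenever plotly is missing.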
Introduce a Nextflow DSL2 pipeline for STRiDE to enable scalable, nf-core-style execution. Adds nextflow/main.nf, nextflow.config, the workflow (workflows/stride.nf), and modules (STRIDE_RUN, STRIDE_FEATURES, STRIDE_PREDICT, STRIDE_QC), plus a test profile and a bundled test sample sheet for CI. Update the Dockerfile to install the `qc` extra, add OCI labels/authors and the AGPL license, and verify the installation. Rename the Python package to `stride-msk` in pyproject.toml and update README/docs references; add a detailed Nextflow user guide and an mkdocs navigation entry.
rhshah merged commit 714e364 into develop on Feb 27, 2026
7 checks passed