Feature/package modernization #1
Merged
…docs

Phase 1 (Foundation):
- Add __version__ to __init__.py
- Create Typer CLI (features, predict, run); all inputs are options, no positional args
- Enhance pyproject.toml (entry point, metadata, dev/docs extras, tool configs)
- Add Rich logging via setup_logging() in utils.py
- Fix BAM resource leak (context manager on MSIProfileGenerator)
- Replace print() with logging in predictor.py
- Move site list to src/stride/data/ for package bundling
- Add .gitignore, CHANGELOG.md

Phase 2 (Rich Integration):
- Add Rich progress bar to feature_generator.py:run()
- Add Rich summary tables to predict/batch CLI output

Phase 3 (Testing & CI):
- Create test suite: conftest.py, test_utils.py, test_predictor.py, test_cli.py (33 tests)
- Add test.yml (matrix: ubuntu/macOS × Python 3.11/3.12 + lint + docker-test)
- Add release.yml (PyPI + GHCR Docker push on tag)
- Add deploy-docs.yml (MkDocs versioned deploy via mike)

Phase 4 (Documentation & Distribution):
- Create .antigravity/ KI (ARCHITECTURE.md, DEVELOPMENT.md)
- Create Dockerfile with OCI org.* labels
- Create MkDocs scaffolding (teal/cyan theme, 6 documentation pages)
- Create mkdocs.yml with Material theme, mermaid2, glightbox, mike
Set license to AGPL-3.0 in LICENSE and pyproject.toml. Added CONTRIBUTING.md and used mkdocs snippets to include CHANGELOG and CONTRIBUTING in the documentation without duplication.
Deleted the old scripts/ directory and environment.yml, which are superseded by the Typer CLI and standard Python packaging. Rewrote README.md to serve as a lightweight pointer to the new MkDocs documentation site. Ported the clinical disclaimer to the docs index page.
Replaced deprecated typing imports (List/Dict) with the built-in collection types (list/dict). Removed unused imports in the test suite. Fixed B904 exception chaining in cli.py. Ran black over the entire codebase to ensure style compliance.
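Ruff's B904 rule flags a `raise` inside an `except` block that does not use `from`. The exception type cli.py actually raises is not shown in this PR; a minimal plain-Python sketch with a hypothetical `parse_fraction` helper:

```python
def parse_fraction(raw: str) -> float:
    """Hypothetical CLI helper: parse a user-supplied value between 0 and 1."""
    try:
        value = float(raw)
    except ValueError as err:
        # B904: "from err" records the original ValueError as the explicit
        # __cause__ ("The above exception was the direct cause of ...") instead
        # of an incidental error raised while handling it.
        raise ValueError(f"expected a number between 0 and 1, got {raw!r}") from err
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"expected a number between 0 and 1, got {value}")
    return value
```

`raise ... from None` is the other form B904 accepts, for cases where the original traceback is deliberately hidden from users.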
Added pandas-stubs to dev dependencies. Added missing float and dict type annotations to feature_generator.py. Explicitly ignored no-any-return checking for scikit-learn dynamic model attributes in predictor.py.
Added a local regex util inside the CLI tests to strip Rich's ANSI escape codes from terminal outputs so that string flag matching works reliably across local and CI environments.
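The test helper itself is not included in this PR. A minimal sketch, assuming it only has to remove the CSI colour/style sequences (ESC, `[`, parameters, final letter) that Rich writes; `strip_ansi` and `_ANSI_RE` are illustrative names:

```python
import re

# Matches CSI escape sequences such as "\x1b[1m" or "\x1b[0;31m".
_ANSI_RE = re.compile(r"\x1b\[[0-9;?]*[A-Za-z]")


def strip_ansi(text: str) -> str:
    """Remove ANSI CSI escape codes so plain substrings can be matched."""
    return _ANSI_RE.sub("", text)
```

In a CLI test this would wrap the runner output, e.g. `strip_ansi(result.output)`, before asserting that a flag name appears; styling codes inserted between characters otherwise break the substring match.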
Dashboard:
- Triple waterfall plots (L1, L2, Wasserstein)
- Triple volcano plots (L1, L2, Wasserstein)
- Distance correlation, distributions, entropy, quality violins
- Reordered: DistCorr+DistHist → Volcanoes → Entropy+Violins
- Hero: removed Mean L1 / Mean T-MapQ; shows Prediction/Score/Sites

Site Explorer:
- L1/L2/Wasserstein metric pills for re-ranking
- Persistent legend via always-visible dummy traces
- Dynamic reference line per locus

Data Table (Tabulator.js v6):
- Progress bar formatters for L1, L2, Wasserstein
- Frozen Rank column
- Dynamic viewport height with horizontal/vertical scrollbars
- CSV download button
- Conditional row styling (orange border on high-L1 loci)
- MapQ < 40 warning styling
- Cell tooltips, column resize, header text wrapping

Tests: updated fixture with multi-distance columns, removed stale PDF test
Docs: comprehensive rewrite of docs/user-guide/qc.md
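The dashboard and Data Table rank loci by L1, L2, and Wasserstein distance between the normal and tumor repeat-length distributions. This PR does not show how STRiDE computes them, so the following is only a minimal sketch, assuming each locus is given as read counts per repeat length on a shared, unit-spaced integer grid and that counts are normalized to frequencies first:

```python
import numpy as np
from numpy.typing import ArrayLike


def locus_distances(normal_counts: ArrayLike, tumor_counts: ArrayLike) -> dict[str, float]:
    """Compare normal vs tumor repeat-length histograms for one locus.

    Assumption (illustrative only): both inputs hold read counts per repeat
    length on the same integer grid, so adjacent bins are one repeat unit apart.
    """
    p = np.asarray(normal_counts, dtype=float)
    q = np.asarray(tumor_counts, dtype=float)
    p = p / p.sum()  # raw counts -> frequencies (new arrays; inputs untouched)
    q = q / q.sum()
    diff = p - q
    return {
        "l1": float(np.abs(diff).sum()),
        "l2": float(np.sqrt(np.square(diff).sum())),
        # 1-D Wasserstein-1 on a unit-spaced grid = L1 distance between the CDFs.
        "wasserstein": float(np.abs(np.cumsum(diff)).sum()),
    }
```

The three metrics are complementary: once the two distributions stop overlapping, L1 saturates at 2, while Wasserstein keeps growing with the size of the repeat-length shift. `scipy.stats.wasserstein_distance(grid, grid, p, q)` should give the same Wasserstein value and is a convenient cross-check.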
…data table
- Add three-tab QC report: Dashboard (8 Plotly cards), Site Explorer, Data Table
- Dashboard: waterfalls, volcano plots, distance correlation, distributions, entropy scatter, quality violins, insert-size histogram
- Site Explorer: searchable combobox, prev/next navigation, quality badges, per-locus stats strip with all distance metrics
- Site Explorer: raw/normalized frequency toggle (pill buttons, persists across locus changes); fix Plotly binary blob serialisation for JS interop
- Site Explorer: dynamic x-axis auto-scales to max observed repeat length
- Site Explorer: Normal/Tumor identity labels in tooltips, wider bars
- Data Table: Tabulator.js with frozen columns, progress-bar formatters, filters, sort, CSV export, conditional row styling
- Light/dark theme toggle persists across all tabs
- CLI: stride qc command; --generate-qc flag for stride run
- Update docs/user-guide/qc.md to reflect new Site Explorer features
- Update CHANGELOG.md with QC module entries
…ting
- Add 'from __future__ import annotations' (PEP 563) to defer annotation evaluation, preventing NameError when plotly is not installed
- Add TYPE_CHECKING guard for plotly.graph_objects import
- Replace Optional[dict] with dict | None (ruff UP045)
- Run black on cli.py, pipeline.py, qc.py, test_qc.py
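Combining PEP 563 with a TYPE_CHECKING guard lets a module annotate with Plotly types while importing Plotly only when a figure is actually built. qc.py itself is not shown here; this is a minimal sketch with a hypothetical `build_figure` function:

```python
from __future__ import annotations  # PEP 563: annotations are stored as strings

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by type checkers such as mypy; never executed at runtime.
    import plotly.graph_objects as go


def build_figure(title: str) -> go.Figure | None:
    """Return a Plotly figure, or None if Plotly (the qc extra) is missing."""
    try:
        import plotly.graph_objects as go  # runtime import, only when needed
    except ImportError:
        return None
    return go.Figure(layout={"title": {"text": title}})
```

Without the `__future__` import, the `go.Figure` annotation would be evaluated when the function is defined and raise NameError on installs without Plotly, which is the failure the commit describes.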
- Add plotly.* to mypy ignore_missing_imports overrides in pyproject.toml
- Add type: ignore[operator] for pandas Series string concatenation
- Add type: ignore[call-overload] for iterrows() index cast
- Add .astype(str) to repeat_unit column for consistent typing
Introduce a Nextflow DSL2 pipeline for STRiDE to enable scalable, nf-core-style execution. Adds nextflow/main.nf, nextflow.config, workflow (workflows/stride.nf), and modules (STRIDE_RUN, STRIDE_FEATURES, STRIDE_PREDICT, STRIDE_QC) plus a test profile and bundled test sample sheet for CI. Update Dockerfile to install the `qc` extras, add OCI labels/authors and AGPL license, and verify installation. Rename Python package to `stride-msk` in pyproject and update README/docs references; add detailed Nextflow user guide and mkdocs navigation entry.
Description
This PR introduces a comprehensive modernization of the STRiDE codebase, transforming it from a collection of standalone scripts into a fully packaged, testable, and CI-integrated Python project. This modernization aligns STRiDE with the established MSK hybrid bioinformatics standards (e.g., py-gbcms), significantly improving usability, maintainability, and distribution.

🌟 Core Enhancements
- … stride features, stride predict, and stride run.
- … setuptools configurations.
- … [project.scripts] entry point.
- … [dev] (pytest, ruff, mypy) and [docs] (mkdocs).
- … RichHandler logging, replacing scattered print() and logging.basicConfig() calls.

🛠️ Stability & Code Quality
- … pysam.AlignmentFile handles are closed, even upon error.
- … data/ structure. The 170-loci baseline is now bundled inside src/stride/data/ and loaded via importlib.resources.files().
- … pytest suite testing all core utilities, predictors, and CLI interfaces (33 tests, 100% passing).

⚙️ CI/CD Pipelines
Added GitHub Actions conforming exactly to the MSK py-gbcms structure:
- test.yml: Cross-platform testing matrix (Ubuntu/macOS × Python 3.11/3.12), linting (ruff, black, mypy), and Docker cache building verification.
- release.yml: Automated PyPI (Trusted Publisher OIDC) and GHCR Docker publishing upon semantic version tag push.
- deploy-docs.yml: Automated MkDocs page deployment (leveraging mike for versioned stable/dev docs).

📚 Documentation & Legal
- … (docs/) using Material for MkDocs.
- README.md now acts as a clean pointer to the generated MkDocs site.
- … LICENSE and reflected in PyPI classifiers.
- … CONTRIBUTING.md standards. Configured MkDocs snippet routing so CONTRIBUTING and CHANGELOG are rendered in the docs without text duplication.
- … .antigravity/ARCHITECTURE.md and DEVELOPMENT.md to persist systemic knowledge for future AI-assisted development cycles.

Checklist
- … ruff check and black --check
- … (pytest tests/)
- … (pip install -e .)
- … (stride --help)
- … (mkdocs build --strict)
- CHANGELOG.md reflects modernizations
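The Stability & Code Quality bullet says the bundled 170-loci baseline is loaded via importlib.resources.files(). A minimal sketch of that pattern; the helper name is hypothetical and so is the file name in the usage comment, since the PR does not give the baseline's actual file name:

```python
from importlib.resources import files


def read_bundled_text(package: str, name: str) -> str:
    """Read a data file shipped inside an installed package.

    Resolves the path through the package itself, so it does not depend on
    the current working directory or on __file__-relative path arithmetic.
    """
    return files(package).joinpath(name).read_text(encoding="utf-8")


# Hypothetical usage ("sites.tsv" is an assumed file name):
# baseline = read_bundled_text("stride.data", "sites.tsv")
```

Because the lookup goes through the import system, the same call should work from a source checkout, an editable install (pip install -e .), or an installed wheel, provided the data file is included in the package build.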