
Add coverage reporting for C++ and Python tests (Phase 1) #196

Open: gagelarsen wants to merge 11 commits into master from gagelarsen/coverage-reporting

@gagelarsen (Member)

Summary

  • Adds dev/coverage.sh and .github/workflows/Coverage.yaml for line coverage on Linux/macOS via gcov+gcovr (C++) and pytest-cov (Python). Coverage instrumentation comes from a CMake block injected through build.toml's extra_cmake_text and gated by the XMSGRID_COVERAGE env var, so xmsconan_gen needs no changes.
  • CI gates the job on the CPP_COVERAGE_THRESHOLD / PY_COVERAGE_THRESHOLD env vars (initially 0); reports upload as workflow artifacts on every run. No third-party reporting service is used; adopting one is a deferred decision documented in the spec.
  • Phase 1 measures the layers independently. Cross-credit (Python tests measuring C++ coverage) is intentionally Phase 2; the spec at docs/superpowers/specs/2026-05-08-coverage-reporting-design.md calls out the three xmsconan-side constraints that make unification costly to do repo-locally and lays out the lift plan.

Test plan

  • Coverage workflow run on this PR completes and produces non-zero baseline numbers in the gcovr / pytest output.
  • coverage-html and coverage-xml artifacts download and open correctly.
  • After the baseline is known, bump CPP_COVERAGE_THRESHOLD / PY_COVERAGE_THRESHOLD in Coverage.yaml and confirm the job fails when a local edit drops coverage below the threshold.

🤖 Generated with Claude Code

gagelarsen and others added 11 commits May 8, 2026 09:25
Two-phase plan: Phase 1 lands repo-local coverage (CMake option,
dev/coverage.sh, Coverage.yaml, codecov.yml) using gcov + gcovr for C++
and coverage.py for Python, unified in Codecov. Phase 2 lifts the
generic pieces into xmsconan once Phase 1 stabilizes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
C++ coverage comes from a Debug+testing Conan build instrumented via a new
env-var-gated CMake block (XMSGRID_COVERAGE) in build.toml's extra_cmake_text;
gcovr scans the resulting build folder in the Conan cache.

Python coverage runs separately: the Release+pybind build produces a wheel
which dev/coverage.sh installs into a clean venv and exercises with
pytest-cov.

Both reports upload to Codecov as separate flags via a hand-maintained
Coverage workflow that lives alongside the xmsconan-generated CI workflow.
The codecov.yml is informational-only — Codecov will not block PRs.

Cross-credit (Python tests measuring C++ coverage) is deferred to Phase 2,
which will lift the orchestration into xmsconan and address the conanfile
constraints (BUILD_TESTING/IS_PYTHON_BUILD mutual exclusion, pybind forced
to Release, no pytest-cov in the build venv) that make unification costly
to do repo-locally.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
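Locating the instrumented build output in the Conan cache can be sketched in Python (dev/coverage.sh itself is a shell script, so this is an illustration, not the PR's code). The sketch assumes GCC's default behavior of writing .gcda files next to the object files when an instrumented test binary exits, and takes the cache root (for example ~/.conan2/p) as a parameter; `find_gcda_dirs` is a hypothetical helper.

```python
from pathlib import Path


def find_gcda_dirs(cache_root: Path) -> list:
    """Return, sorted, every directory under cache_root holding a .gcda file.

    Hypothetical helper: .gcda files appear only after instrumented code has
    run, so these directories are the candidates to hand to gcovr.
    """
    return sorted({p.parent for p in Path(cache_root).rglob("*.gcda")})


if __name__ == "__main__":
    # ~/.conan2/p is the Conan 2 package cache referenced in the PR.
    for d in find_gcda_dirs(Path.home() / ".conan2" / "p"):
        print(d)
```

Scanning for .gcda rather than .gcno files skips build folders that were compiled with instrumentation but never executed.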
Drops codecov.yml and the codecov-action upload steps. Adds
--fail-under-line / --cov-fail-under flags driven by
CPP_COVERAGE_THRESHOLD and PY_COVERAGE_THRESHOLD env vars (default 0),
set explicitly in Coverage.yaml. CI fails when either layer falls below
its configured threshold.

HTML and Cobertura XML reports still upload as workflow artifacts on
every run (with if: always()) so reviewers can inspect numbers even on
threshold failures. The XMLs remain in standard format, so adopting a
reporting service later is a small additional step.

Spec updated: Phase 1 explicitly defers the reporting-service decision.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
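The env-var-to-flag mapping this commit describes might look like the Python sketch below. `--fail-under-line` (gcovr) and `--cov-fail-under` (pytest-cov) are the flags named in the commit; the helper names and the "unset or empty means 0" rule are assumptions that mirror the stated default.

```python
import os
from typing import List, Mapping, Optional, Tuple


def threshold_from_env(name: str, environ: Optional[Mapping[str, str]] = None) -> float:
    """Read a coverage threshold in percent; unset or empty means 0."""
    env = os.environ if environ is None else environ
    raw = env.get(name, "").strip()
    return float(raw) if raw else 0.0


def gate_flags(environ: Optional[Mapping[str, str]] = None) -> Tuple[List[str], List[str]]:
    """Return (gcovr args, pytest args) that enforce the configured thresholds."""
    cpp = threshold_from_env("CPP_COVERAGE_THRESHOLD", environ)
    py = threshold_from_env("PY_COVERAGE_THRESHOLD", environ)
    # gcovr exits non-zero if total line coverage is below its minimum;
    # pytest-cov fails the run if total coverage is below its minimum.
    return ["--fail-under-line", str(cpp)], [f"--cov-fail-under={py}"]
```

With both thresholds at 0 every run passes, which lets the job land before any baseline exists.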
Both are rendered by xmsconan_gen from templates and should not be
tracked, matching the treatment of CMakeLists.txt, conanfile.py, etc.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
C++: gcovr was given the repo root as --root, but Conan compiles sources
out of ~/.conan2/p/<pkg>/b/src/, so the repo-relative xmsgrid/ filter
matched nothing ("All coverage data is filtered out"). Discover the
matching source folder per build folder (cmake_layout puts it at the
sibling 'src' directory two parents up from the build folder) and pass
it as --root.

Python: the wheel declares an xmscore>=7.0.8 runtime dep that lives on
Aquaveo devpi, not PyPI. Add --extra-index-url to the wheel pip install
so pip can resolve it. COVERAGE_PIP_INDEX, when set, overrides the index URL.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
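The wheel install step can be sketched as a small command builder. The PR does not give the devpi URL, so the caller supplies the default; any URL used with this sketch is a placeholder, and `wheel_install_cmd` is a hypothetical name.

```python
import os
from typing import List, Mapping, Optional


def wheel_install_cmd(python: str, wheel: str, default_index: str,
                      environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Build a pip command that installs the wheel into the venv behind `python`."""
    env = os.environ if environ is None else environ
    # A non-empty COVERAGE_PIP_INDEX replaces the default extra index.
    index = env.get("COVERAGE_PIP_INDEX") or default_index
    # --extra-index-url keeps PyPI as the main index and adds the extra one,
    # so pip can also resolve xmscore>=7.0.8, which is not on PyPI.
    return [python, "-m", "pip", "install", "--extra-index-url", index, wheel]
```

The result can be passed straight to `subprocess.run(cmd, check=True)`. Using `--extra-index-url` rather than `--index-url` keeps ordinary dependencies resolving from PyPI.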
cmake_layout in this project puts source at <pkg>/b/xmsgrid/, not
<pkg>/b/src/xmsgrid/ as I assumed. Replace the hardcoded ../../src
path with an actual scan: walk up from the build folder looking for
the first ancestor that contains an xmsgrid/ directory.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
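The ancestor scan in this commit could look like the following in Python. `find_source_root` is a hypothetical name; only ancestors are checked (not the build folder itself), matching the commit's wording, and the build-folder path in the docstring is illustrative.

```python
from pathlib import Path
from typing import Optional


def find_source_root(build_folder: Path, marker: str = "xmsgrid") -> Optional[Path]:
    """Return the nearest ancestor of build_folder that has a `marker/` subdirectory.

    E.g. for a hypothetical build folder <pkg>/b/build/Debug, this returns
    <pkg>/b when <pkg>/b/xmsgrid/ exists.
    """
    for ancestor in Path(build_folder).resolve().parents:
        if (ancestor / marker).is_dir():
            return ancestor
    return None  # no match: the caller should fail loudly rather than guess
```

Skipping the build folder itself matters because CMake mirrors the source layout in its build tree, so a build folder can contain its own xmsgrid/ directory of object files.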
C++: gcovr's path-prefix filter resolution against .gcno-recorded
absolute source paths was producing 'all data filtered out' even with
--root set correctly. Use an explicit anywhere-in-path regex
('.*/xmsgrid/.*') to bypass the prefix-matching ambiguity.
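Why the anywhere-in-path pattern helps can be seen with Python's re module. This illustrates regex matching only and is not gcovr's filter implementation: `re.match` anchors at the start of the string, so a repo-relative prefix cannot match an absolute path, while the commit's `.*/xmsgrid/.*` can. The paths are made-up examples shaped like the Conan cache layout.

```python
import re

# Pattern from the commit: matches any path with an xmsgrid/ component.
ANYWHERE = re.compile(r".*/xmsgrid/.*")
# A repo-relative prefix, which re.match can only find at position 0.
PREFIX = re.compile(r"xmsgrid/")


def kept(path: str, pattern: "re.Pattern[str]") -> bool:
    """True if `pattern` matches `path` starting at its first character."""
    return pattern.match(path) is not None


if __name__ == "__main__":
    src = "/home/runner/.conan2/p/pkgabc123/b/xmsgrid/ugrid/example.cpp"  # made-up path
    print(kept(src, ANYWHERE), kept(src, PREFIX))  # True False
```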

Python: pytest's auto-rootdir discovery was finding the in-tree
_package/xms/grid/ directory and putting it on sys.path, shadowing the
venv's installed wheel and producing 'cannot import _xmsgrid' errors.
Copy tests to build/coverage-tests/ and run pytest from there, mirroring
what the conanfile's own python test runner does.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Coverage summary table now reports threshold, actual, and pass/fail
status per layer. Switch from gcovr's --fail-under-line and pytest-cov's
--cov-fail-under (which would short-circuit before the summary block) to
parsing each tool's JSON summary, doing the threshold check ourselves,
and exiting non-zero at the end if any layer is below.

Pin CPP_COVERAGE_THRESHOLD=74.7 and PY_COVERAGE_THRESHOLD=83 in
Coverage.yaml — the baseline numbers from the first green run.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
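The report-then-exit flow in this commit can be sketched as follows. It assumes gcovr's `--json-summary` output has a top-level `line_percent` field and coverage.py's JSON report has `totals.percent_covered`; the command-line layout and table format are hypothetical. As in this commit, the comparison uses raw values.

```python
import json
import sys
from typing import Dict, Tuple


def cpp_percent(gcovr_summary: dict) -> float:
    # Assumed field from gcovr's --json-summary output.
    return float(gcovr_summary["line_percent"])


def py_percent(coverage_report: dict) -> float:
    # Assumed field from coverage.py's `coverage json` output.
    return float(coverage_report["totals"]["percent_covered"])


def check(layers: Dict[str, Tuple[float, float]]) -> int:
    """Print threshold, actual, and status for every layer, then return an exit code.

    Every layer is reported before deciding, so one failure does not hide the other.
    """
    failed = False
    print(f"{'Layer':<8}{'Threshold':>10}{'Actual':>8}  Status")
    for name, (threshold, actual) in layers.items():
        ok = actual >= threshold
        failed = failed or not ok
        print(f"{name:<8}{threshold:>10.1f}{actual:>8.1f}  {'PASS' if ok else 'FAIL'}")
    return 1 if failed else 0


if __name__ == "__main__":
    # Hypothetical usage: check.py <gcovr-summary.json> <coverage.json> <cpp-min> <py-min>
    with open(sys.argv[1]) as f:
        cpp = cpp_percent(json.load(f))
    with open(sys.argv[2]) as f:
        py = py_percent(json.load(f))
    sys.exit(check({"C++": (float(sys.argv[3]), cpp), "Python": (float(sys.argv[4]), py)}))
```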
Previous baseline read 83% from the rounded log line; the precise actual
is 82.9. Adjusts the threshold so the very next run passes against the
unchanged code.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Raw coverage values from gcovr/coverage.py have more precision than the
1-decimal display (e.g. 82.875 → '82.9'). When a threshold was pinned
to the displayed value, the raw comparison failed even though the
displayed actual matched. Round the actual to 1 decimal before
comparing so threshold checks line up with what's reported.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
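The fix comes down to comparing at display precision. Here is a minimal sketch using the commit's own example (raw 82.875, displayed and pinned as 82.9); `passes` is a hypothetical helper. In this sketch, `round` and the `.1f` display format round the same float, so the compared value is the printed value.

```python
def passes(actual: float, threshold: float, places: int = 1) -> bool:
    """Compare the actual value as displayed (rounded to `places` decimals)."""
    return round(actual, places) >= threshold


raw, threshold = 82.875, 82.9
print(f"{raw:.1f}")            # 82.9, what the summary shows
print(raw >= threshold)        # False: the raw comparison fails
print(passes(raw, threshold))  # True: the rounded comparison passes
```

One consequence: a raw value such as 82.86 also passes an 82.9 threshold, so the effective bar sits up to 0.05 points below the pinned number.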
Gives headroom over the current baseline (74.7% C++, 82.9% Python) so
small fluctuations don't fail the job. Tighten later once the team
agrees on a regression policy.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>