fix: treat pandas Boolean as numeric in clinical_kernel #590

Merged
sebp merged 1 commit into sebp:main from 55Kamiryo:fix/clinical-kernel-bool-drop on May 25, 2026

Conversation

55Kamiryo (Contributor) commented May 21, 2026

What does this implement/fix? Explain your changes

Fixes the two divergent Boolean-handling bugs in sksurv/kernels/clinical.py described in #589:

  1. clinical_kernel silent drop. _get_continuous_and_ordinal_array used x.select_dtypes(include=[np.number]), which excludes np.bool_ because numpy treats it as a sibling of np.number rather than a subclass. Boolean columns therefore matched neither the numeric filter nor the object/category filter and were silently dropped. Because normalization still divided by x.shape[1], the resulting kernel matrix was scaled down by a factor of (n_features - n_bool_cols) / n_features.

    Fix: select_dtypes(include=[np.number, "bool"]).

  2. ClinicalKernelTransform.fit TypeError. _prepare_by_column_dtype uses pandas.api.types.is_numeric_dtype, which returns True for Boolean, but col.max() - col.min() then failed with TypeError: numpy boolean subtract on numpy ≥ 1.25.

    Fix: cast Boolean columns to np.uint8 before computing the range.
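
The two root causes can be reproduced with numpy and pandas alone. The following is a minimal sketch, not the sksurv code itself: the frame and the column names `age` and `smoker` are made up for illustration.

```python
import numpy as np
import pandas as pd

# np.bool_ sits next to np.number in NumPy's scalar hierarchy, not below it.
bool_is_number = np.issubdtype(np.bool_, np.number)

# Hypothetical frame with one float column and one Boolean column.
df = pd.DataFrame({"age": [40.0, 50.0, 60.0], "smoker": [True, False, True]})

# Cause 1: the old filter misses the Boolean column, the new one keeps it.
old_selection = list(df.select_dtypes(include=[np.number]).columns)
new_selection = list(df.select_dtypes(include=[np.number, "bool"]).columns)

# Cause 2: subtracting NumPy booleans raises TypeError.
flags = df["smoker"].to_numpy()
try:
    flags.max() - flags.min()
    subtract_failed = False
except TypeError:
    subtract_failed = True

# Casting to uint8 first makes the range computation well defined.
as_uint8 = flags.astype(np.uint8)
value_range = int(as_uint8.max() - as_uint8.min())

print(bool_is_number, old_selection, new_selection, subtract_failed, value_range)
```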

Both fixes align the pandas path with the policy already established in this codebase: pandas.api.types.is_numeric_dtype(bool) is True. After this change, clinical_kernel(df) and clinical_kernel(df.astype({col: 'uint8'})) produce identical kernel matrices, and ClinicalKernelTransform().fit(df) no longer crashes.
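
To illustrate the effect on the kernel values, here is a toy stand-in for the numeric part of the kernel. It is an assumption-laden sketch rather than the sksurv implementation: `toy_numeric_kernel` is a hypothetical helper that sums 1 - |a - b| / range per selected column and divides by the total number of columns, mirroring the normalization by x.shape[1] described above.

```python
import numpy as np
import pandas as pd


def toy_numeric_kernel(df, columns):
    """Sum per-column similarities over `columns`, divide by df.shape[1]."""
    n_samples = len(df)
    mat = np.zeros((n_samples, n_samples))
    for name in columns:
        col = df[name].to_numpy(dtype=float)
        value_range = col.max() - col.min()
        mat += 1.0 - np.abs(col[:, None] - col[None, :]) / value_range
    # Normalize by the total column count, even if some columns were skipped.
    return mat / df.shape[1]


# Hypothetical data: one float column, one Boolean column.
df = pd.DataFrame({"age": [40.0, 50.0, 60.0], "smoker": [True, False, True]})
cast = df.astype({"smoker": "uint8"})

# Old selection drops "smoker"; new selection keeps it.
buggy = toy_numeric_kernel(df, df.select_dtypes(include=[np.number]).columns)
fixed = toy_numeric_kernel(df, df.select_dtypes(include=[np.number, "bool"]).columns)
reference = toy_numeric_kernel(cast, cast.select_dtypes(include=[np.number]).columns)

print(np.diag(buggy))   # self-similarity shrinks to (2 - 1) / 2
print(np.diag(fixed))   # self-similarity is 1 once the column is kept
print(np.allclose(fixed, reference))
```

In this toy setting the dropped column shows up directly on the diagonal: each sample's similarity to itself is 0.5 instead of 1, and the Boolean and uint8 frames agree once the Boolean column is selected.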

Tests

Added TestClinicalKernel.test_bool_column_treated_as_numeric asserting:

  • clinical_kernel returns the same matrix for a Boolean column and its uint8 equivalent.
  • ClinicalKernelTransform().fit produces matching _numeric_ranges and X_fit_ for both dtypes.
  • The Boolean column is classified into _numeric_columns, not _nominal_columns.

Local verification:

  • pytest tests/ (including slow tests): 966 passed, 48 skipped, 0 failed.
  • pytest --doctest-modules --pyargs sksurv.kernels: 1 passed.
  • ruff check on the whole repo: clean.
  • pre-commit run on changed files: all hooks pass.

Behavior change note

Users whose pipelines previously called clinical_kernel on a pandas frame containing Boolean columns will see different (corrected) kernel values after this change. The previous values were silently biased, so the change is a bug fix rather than an API change, but it may be worth flagging in the changelog.

@55Kamiryo 55Kamiryo requested a review from sebp as a code owner May 21, 2026 09:22
@codacy-production

Up to standards ✅

🟢 Issues: 0 new issues
🟢 Metrics: 0 complexity · 0 duplication

@codecov
codecov Bot commented May 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.46%. Comparing base (6a1df8a) to head (fd54167).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #590   +/-   ##
=======================================
  Coverage   98.46%   98.46%           
=======================================
  Files          38       38           
  Lines        3713     3715    +2     
  Branches      480      481    +1     
=======================================
+ Hits         3656     3658    +2     
  Misses         27       27           
  Partials       30       30           

Owner

@sebp sebp left a comment

Thanks a lot for this!

@sebp sebp merged commit 0361ab9 into sebp:main May 25, 2026
20 checks passed

Development

Successfully merging this pull request may close these issues.

Bug: clinical_kernel silently drops pandas Boolean columns; ClinicalKernelTransform.fit raises TypeError on the same input

2 participants