Multivariate – Missing Value Pattern Analysis (MCAR / MAR / MNAR) #13

@hansen-maria

Is your feature request related to a problem? Please describe.
The littles_mcar_test() function in multivariate_analysis.py currently returns a placeholder {"Hello": "World"}. Missing value mechanisms (MCAR, MAR, MNAR) are critical for choosing the correct imputation strategy, but BioProfileKit provides no actionable guidance on this.

Describe the solution you'd like
Implement the full missing value mechanism pipeline:

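  1. Little's MCAR test: Implement littles_mcar_test() by computing Little's test statistic across the observed missing-data patterns and comparing it against a chi-squared distribution (scipy.stats.chi2). Return chi2, df, p_value, and a mechanism field ("MCAR" if p > 0.05, i.e. MCAR is not rejected).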
  2. MAR detection: For each column with missing values, build a binary missingness indicator and test its correlation with the observed values of the other numeric columns using point-biserial correlation (scipy.stats.pointbiserialr). Flag pairs with |r| > 0.3.
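  3. MNAR heuristic: Apply a two-sample KS test (scipy.stats.ks_2samp) comparing the distribution of an observed column when another column is missing vs present. A high KS statistic is treated as a hint of MNAR; this is a heuristic only, since MNAR cannot be confirmed from the observed data alone.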
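
A minimal sketch of what step 1 could look like, to make the expected return shape concrete. It is not the final implementation: it uses available-case means and pairwise covariances in place of the EM (maximum-likelihood) estimates that Little's test normally relies on, restricts itself to numeric columns, and the "not MCAR" and "undetermined" labels are assumptions, since only the "MCAR" value is specified above.

```python
import numpy as np
import pandas as pd
from scipy import stats


def littles_mcar_test(df: pd.DataFrame, alpha: float = 0.05) -> dict:
    """Approximate Little's MCAR test on the numeric columns of df."""
    data = df.select_dtypes(include="number")
    n_vars = data.shape[1]

    # Simplification (assumption): available-case mean and pairwise
    # covariance stand in for the EM estimates of the original test.
    mu = data.mean().to_numpy()
    sigma = data.cov().to_numpy()

    observed = data.notna().to_numpy()
    values = data.to_numpy(dtype=float)
    # One row per distinct missing-data pattern; inverse maps rows to patterns.
    patterns, inverse = np.unique(observed, axis=0, return_inverse=True)
    inverse = np.ravel(inverse)

    chi2_stat, dof = 0.0, 0
    for j, pattern in enumerate(patterns):
        if not pattern.any():
            continue  # rows with nothing observed carry no information
        rows = values[inverse == j][:, pattern]
        diff = rows.mean(axis=0) - mu[pattern]
        sub_sigma = sigma[np.ix_(pattern, pattern)]
        # n_j * (mean_j - mu)^T * Sigma^-1 * (mean_j - mu) on observed columns
        chi2_stat += rows.shape[0] * float(diff @ np.linalg.pinv(sub_sigma) @ diff)
        dof += int(pattern.sum())
    dof -= n_vars

    if dof <= 0:
        # Only one pattern (e.g. no missing values): the test is not defined.
        return {"chi2": chi2_stat, "df": dof, "p_value": None,
                "mechanism": "undetermined"}

    p_value = float(stats.chi2.sf(chi2_stat, dof))
    mechanism = "MCAR" if p_value > alpha else "not MCAR"
    return {"chi2": chi2_stat, "df": dof, "p_value": p_value,
            "mechanism": mechanism}
```

Calling `littles_mcar_test(df)` on a DataFrame returns a plain dict, so it can be passed to the template without further conversion.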

Populate mcar_result in MultivariateAnalysis with the full results and display them in the existing MCAR tab in general_statistics.jinja.
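
For the pairwise MAR and KS checks that would feed into these results, a rough sketch follows. The function name `missingness_heuristics`, the `min_group` guard, and the keys of the returned dicts are hypothetical and would need to be aligned with the existing missing_mechanisms structure; only the |r| > 0.3 threshold comes from this issue. No KS cutoff is applied, because none is specified here.

```python
import pandas as pd
from scipy import stats


def missingness_heuristics(df: pd.DataFrame, r_threshold: float = 0.3,
                           min_group: int = 5) -> list:
    """Pairwise checks: missingness of one column vs values of another."""
    numeric = df.select_dtypes(include="number")
    findings = []
    for target in df.columns[df.isna().any()]:
        is_missing = df[target].isna()
        for other in numeric.columns:
            if other == target:
                continue
            # Keep only rows where the comparison column is observed.
            mask = numeric[other].notna().to_numpy()
            x = numeric[other].to_numpy(dtype=float)[mask]
            indicator = is_missing.to_numpy()[mask]
            n_missing = int(indicator.sum())
            # Skip pairs where either group is too small or x is constant.
            if min(n_missing, len(x) - n_missing) < min_group:
                continue
            if len(set(x)) < 2:
                continue
            r, r_p = stats.pointbiserialr(indicator.astype(int), x)
            ks, ks_p = stats.ks_2samp(x[indicator], x[~indicator])
            findings.append({
                "missing_column": target,     # column whose missingness is tested
                "observed_column": other,     # column whose values are compared
                "r": float(r),
                "r_p_value": float(r_p),
                "ks_statistic": float(ks),
                "ks_p_value": float(ks_p),
                "flagged": abs(r) > r_threshold,
            })
    return findings
```

Each entry describes one (missing column, observed column) pair, so the template can render the list as a table and highlight the flagged rows.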

Describe alternatives you've considered
Using the missingno or pyampute libraries. Both add dependencies; the scipy-based implementation covers the core use cases without additional requirements.

Additional context
The MAR and MNAR heuristics are already partially documented in the existing missing_mechanisms field structure used in the template. This issue closes the gap between the frontend rendering and the missing backend implementation.

Labels: enhancement