pyNullstrap

pyNullstrap is a Python implementation of the Nullstrap method — a flexible, model-agnostic framework for variable selection with false discovery rate (FDR) control [1]. It supports high-dimensional settings across a range of statistical models including

✅ Linear regression
✅ Generalized linear models
✅ Cox proportional hazards models
✅ Gaussian graphical models

📦 Installation

Installation from Source

# Clone the repository
git clone https://github.com/PromptBio/pyNullstrap.git
cd pyNullstrap

# Basic installation (core package only)
pip install -e .

# Development installation (includes testing and dev tools)
pip install -e .[test,dev]

🚀 Quick Start

Basic Usage

import numpy as np
from nullstrap.models.lm import NullstrapLM

# Generate sample data
np.random.seed(42)
n, p = 100, 50
X = np.random.randn(n, p)
y = X[:, :5] @ np.ones(5) + 0.1 * np.random.randn(n)

# Fit Nullstrap with FDR control
model = NullstrapLM(fdr=0.1, B_reps=5)
model.fit(X, y)

# Get selected variable indices
selected = model.selected_
print(f"Selected features: {selected}")

# Get additional information
threshold = model.threshold_
statistics = model.statistic_
correction_factor = model.correction_factor_

print(f"Selection threshold: {threshold:.4f}")
print(f"Correction factor: {correction_factor:.4f}")

Model Examples

from nullstrap.models.glm import NullstrapGLM
from nullstrap.models.cox import NullstrapCox
from nullstrap.models.ggm import NullstrapGGM

# Logistic regression
glm_model = NullstrapGLM(fdr=0.05, alpha_=0.01, B_reps=10)
# Create binary target for logistic regression
y_binary = np.random.binomial(1, 0.5, size=n)  # Random binary target
glm_model.fit(X, y_binary)

# Cox survival model (requires scikit-survival)
# Create survival data in scikit-survival format
events = np.array([True, True, False, True])  # boolean
times = np.array([5.2, 1.1, 8.0, 2.3])      # float
y_survival = np.array([(event, time) for event, time in zip(events, times)], 
                     dtype=[('event', bool), ('time', float)])

cox_model = NullstrapCox(fdr=0.1)
cox_model.fit(X, y_survival)

# Graphical model for network inference
graphical_model = NullstrapGGM(fdr=0.1)
graphical_model.fit(X)  # No y needed for graphical models
precision_matrix = graphical_model.precision_
selected_edges = graphical_model.selected_

Pipeline Integration

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from nullstrap.models.lm import NullstrapLM

# Create a complete pipeline
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('selector', NullstrapLM(fdr=0.1)),
    ('classifier', LogisticRegression())
])

# Generate sample train/test data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit and transform
pipeline.fit(X_train, y_train)
X_selected = pipeline.transform(X_test)

📊 Model Support

Model Type	Class	Description
Linear	`NullstrapLM`	Gaussian linear regression
Logistic	`NullstrapGLM`	Logistic regression (binomial GLM)
Cox Survival	`NullstrapCox`	Cox proportional hazards model
Graphical Models	`NullstrapGGM`	Gaussian graphical model with L1 penalty

🔬 Algorithm Overview

Nullstrap is a general framework for variable selection with false discovery rate (FDR) control. It works by fitting your chosen estimator on both the observed data and synthetic responses generated under the global null model. The synthetic fit serves as a negative control: large coefficients appearing in both real and null fits are likely spurious, while coefficients exceeding what the null produces are declared discoveries. This contrast enables FDR control while preserving power across many model families.

Key Steps

Fit models on both real data and synthetic null responses
Estimate correction factor to account for finite-sample effects
Calculate FDP estimator using null response statistics
Select threshold to control FDR at target level
Output selection of variables exceeding the threshold

For detailed mathematical formulation and theoretical foundations, see the original paper [1] and the Methodology documentation (under construction).

Key Features

Model Agnostic: Works with any regularized statistical model
FDR Control: Provides theoretical guarantees on false discovery rate
High Dimensional: Designed for settings where p >> n
Flexible: Supports various model families and regularization schemes

📚 Documentation

For detailed usage examples, advanced configuration options, performance monitoring, model evaluation, and comprehensive API reference, see the full documentation (under construction).

For testing information, quick start commands, and Jupyter notebook examples, see tests/README.md.

📖 License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

📬 Feedback

Feel free to open issues, submit PRs, or contact the maintainer for feature requests or bug reports.

📄 Citation

If you use nullstrap in your research, please cite:

[1] Wang, C., Zhang, Z., & Li, J. J. (2025). Nullstrap: A Simple, High-Power, and Fast Framework for FDR Control in Variable Selection for Diverse High-Dimensional Models. arXiv preprint arXiv:2501.05012. https://arxiv.org/abs/2501.05012

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
nullstrap		nullstrap
tests		tests
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
pyproject.toml		pyproject.toml
pytest.ini		pytest.ini

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

pyNullstrap

📦 Installation

Installation from Source

🚀 Quick Start

Basic Usage

Model Examples

Pipeline Integration

📊 Model Support

🔬 Algorithm Overview

Key Steps

Key Features

📚 Documentation

📖 License

📬 Feedback

📄 Citation

About

Uh oh!

Releases

Packages

Languages

License

PromptBio/pyNullstrap

Folders and files

Latest commit

History

Repository files navigation

pyNullstrap

📦 Installation

Installation from Source

🚀 Quick Start

Basic Usage

Model Examples

Pipeline Integration

📊 Model Support

🔬 Algorithm Overview

Key Steps

Key Features

📚 Documentation

📖 License

📬 Feedback

📄 Citation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages