ModGenePlexus

A module-based, network-ML approach for gene classification of long, noisy, heterogeneous gene lists from GWAS and transcriptomics studies

Paper: McKim A, Mancuso CA, Krishnan A. A module-based approach for post-omics, post-GWAS network-based gene classification. bioRxiv (2025). https://doi.org/10.1101/2025.08.11.669721

Complete archive (code + data + figures + results): https://doi.org/10.5281/zenodo.19857910

What is ModGenePlexus?

Complex traits and diseases involve hundreds to thousands of genes that span multiple biological processes, making them poor candidates for single-model network-based gene classification methods such as GenePlexus. The core problem: GWAS and transcriptomic gene lists are large, noisy (false positives and false negatives arising from limited statistical power and biological heterogeneity), and functionally heterogeneous (genes spread across multiple disconnected neighborhoods of the genome-scale network).

ModGenePlexus addresses this with a divide-and-conquer strategy:

Module discovery — Input genes (e.g., DEGs from RNA-seq, MAGMA-prioritized GWAS genes) are clustered into topologically coherent network modules using DOMINO, which simultaneously denoises the list by dropping weakly connected genes and expanding each module via semi-supervised label propagation.
Module-specific classification — A supervised GenePlexus classifier (logistic regression on STRING network features) is trained independently for each module, producing genome-wide gene rankings per module.
Score aggregation — Module-level predictions are combined using the tau score to produce a final ranked gene list across all modules.
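The three steps above can be sketched on a toy network. This is a minimal illustration under stated assumptions, not the ModGenePlexus implementation. Modules are found as connected components of the subgraph induced by the input genes, with isolated genes dropped as a crude form of denoising; this is a hypothetical stand-in for DOMINO. Each module gets a scikit-learn `LogisticRegression` (L2-penalized by default) trained on adjacency-row features, with every non-module gene treated as a negative, whereas PyGenePlexus selects negatives more carefully. Per-module probabilities are combined by taking each gene's maximum across modules, a placeholder for the tau score described in the paper. The helper names `find_modules`, `train_module_model`, and `aggregate_max` are hypothetical.

```python
from collections import deque

import numpy as np
from sklearn.linear_model import LogisticRegression


def find_modules(adj, input_genes, min_size=2):
    """Hypothetical stand-in for DOMINO (NOT DOMINO itself).

    Returns connected components of the subgraph induced by the input genes,
    dropping components smaller than `min_size` (e.g., isolated genes).
    """
    remaining = set(input_genes)
    modules = []
    while remaining:
        start = remaining.pop()
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            # Only follow edges to other input genes (induced subgraph).
            for nbr in map(int, np.flatnonzero(adj[node])):
                if nbr in remaining:
                    remaining.remove(nbr)
                    component.add(nbr)
                    queue.append(nbr)
        if len(component) >= min_size:
            modules.append(sorted(component))
    return modules


def train_module_model(adj, module, C=1.0):
    """Simplified GenePlexus-style classifier for one module.

    Features are adjacency-matrix rows; module genes are positives and all
    other genes are negatives (an assumption; PyGenePlexus is more careful).
    LogisticRegression applies an L2 penalty by default.
    Returns a genome-wide probability for each gene.
    """
    labels = np.zeros(adj.shape[0], dtype=int)
    labels[module] = 1
    model = LogisticRegression(C=C, max_iter=1000)
    model.fit(adj, labels)
    return model.predict_proba(adj)[:, 1]


def aggregate_max(module_scores):
    """Placeholder aggregation, NOT the tau score: keep each gene's best module score."""
    return np.max(np.vstack(module_scores), axis=0)


# Toy 10-gene network: two triangles (0-1-2, 5-6-7) with extra neighbors 3 and 8,
# plus an input gene (9) whose only neighbor (4) is not an input gene.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3),
         (5, 6), (5, 7), (6, 7), (6, 8), (7, 8), (4, 9)]
adj = np.zeros((10, 10))
for a, b in edges:
    adj[a, b] = adj[b, a] = 1.0

input_genes = [0, 1, 2, 5, 6, 7, 9]
modules = find_modules(adj, input_genes)  # gene 9 is dropped as isolated
module_scores = [train_module_model(adj, m) for m in modules]
final_scores = aggregate_max(module_scores)
ranking = np.argsort(-final_scores)  # genome-wide gene indices, best first
print(modules)
print(ranking)
```

Moving toward the real workflow would mean replacing `find_modules` with DOMINO output, the simplified negatives with PyGenePlexus's negative selection, and `aggregate_max` with the tau score as defined in the paper.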

ModGenePlexus was benchmarked on simulated traits (combined GOBP gene sets), 1,517 transcriptomic gene lists from CREEDS (diseases, gene perturbations, and drug treatments), and 691 GWAS-derived gene lists from GWAS Atlas. Across these benchmarks it consistently and significantly outperforms GenePlexus, and its performance advantage grows as gene sets become larger and more heterogeneous.

Beyond improved classification, ModGenePlexus reveals more granular and interpretable biological processes: in a Type 2 Diabetes case study, it recovered 366 enriched GO Biological Process terms compared with 52 from GenePlexus, including copper ion transport and iron homeostasis pathways that the single-model approach missed entirely.

Repository contents

This repository is a snapshot of the analysis code used to produce the results in the associated paper, with emphasis on the study-bias holdout validation framework. For the full archive, including large result files, see the Zenodo record.

ModGenePlexus/
├── pygeneplexus/        # GenePlexus Python code used within the ModGenePlexus workflow
├── src/                 # Main ModGenePlexus analysis scripts
│   └── diabetes/        # Type 2 diabetes case-study scripts
├── figures/             # Figure assembly scripts and selected figure outputs
└── tsne/                # Network embedding visualization scripts and selected outputs

The complete Zenodo archive additionally contains all intermediate and final result files, processed data, and supplementary figure outputs that are too large for GitHub.

Reproducing the paper results

All analyses in the paper can be reproduced using the scripts in src/. The core evaluation pipeline follows these steps:

Network and gene set processing — Build the STRING v10 network (threshold: edge weight > 0.7; 16,624 nodes, 400,729 edges) and compile gene set collections (GOBP for simulations; CREEDS and GWAS Atlas for real-world validation). See src/ and File S1 in the paper for processing details.
Module discovery — Run DOMINO on each input gene list to generate network modules.
Study-bias holdout evaluation — Within each gene list, genes are ranked by number of PubMed mentions: the most-studied two-thirds serve as training positives, and the least-studied third is held out as the test set. See the paper's Methods for details on negative gene selection via PyGenePlexus.
Model training and aggregation — Train GenePlexus classifiers per module; aggregate with the tau score.
Enrichment analysis — Run GOBP enrichment (clusterProfiler) on top-ranked predictions from each module and from GenePlexus; compare information content and term specificity.
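The study-bias holdout in the pipeline above amounts to a ranked split of each gene list. The sketch below is a minimal illustration, not the repository's code: the function name `study_bias_split`, the alphabetical tie-breaking, the rounding of the cutoff, and treating genes with no recorded mentions as zero are all assumptions. Negative-gene selection, which the paper delegates to PyGenePlexus, is not shown.

```python
def study_bias_split(genes, pubmed_counts, train_frac=2 / 3):
    """Split one gene list into well-studied training and understudied test positives.

    Genes are ranked by PubMed mention count (highest first); the top
    `train_frac` become training positives and the rest are held out.
    Ties are broken alphabetically so the split is deterministic (an assumption).
    Genes missing from `pubmed_counts` are treated as having zero mentions.
    """
    ranked = sorted(genes, key=lambda g: (-pubmed_counts.get(g, 0), g))
    n_train = round(len(ranked) * train_frac)
    return ranked[:n_train], ranked[n_train:]


# Made-up gene IDs and mention counts, for illustration only.
counts = {"GENE_A": 1200, "GENE_B": 15, "GENE_C": 430,
          "GENE_D": 3, "GENE_E": 88, "GENE_F": 15}
train_pos, test_pos = study_bias_split(list(counts), counts)
print("train:", train_pos)
print("test: ", test_pos)
```

Holding out the least-studied genes makes the evaluation ask whether a model trained on well-characterized genes can recover understudied ones, rather than rewarding it for re-finding genes the network already describes richly.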
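For the enrichment step, clusterProfiler's over-representation analysis scores each GO term with a one-sided hypergeometric test and then adjusts for multiple testing. The pure-Python function below computes only that raw tail probability for a single term, as a sketch of the statistic involved. The function name and the choice of the 16,624 network genes as the background universe are illustrative assumptions; the universe and cutoffs actually used are described in the paper.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided over-representation p-value, P(X >= k), for one GO term.

    N: size of the gene universe (background)
    K: universe genes annotated to the term
    n: genes in the query set (e.g., top-ranked predictions)
    k: query genes annotated to the term
    """
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


# Illustrative numbers only: top 100 predictions scored against a
# hypothetical 40-gene term, with the 16,624 network genes as the universe.
p = hypergeom_enrichment_p(k=6, n=100, K=40, N=16624)
print(f"p = {p:.3g}")
```

Exact integer arithmetic via `math.comb` avoids underflow in the binomial coefficients; for large-scale runs, a library routine such as SciPy's hypergeometric survival function would be the more practical choice.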

Note: Some scripts assume the file paths and compute environment of the original study. The Zenodo archive contains all input data and result files needed to run the scripts without re-downloading or re-processing primary data.

Dependencies

ModGenePlexus builds on:

PyGenePlexus (v1.0.1) — network-based gene classification
DOMINO — active module identification
scikit-learn — logistic regression with L2 regularization
clusterProfiler (R) — GOBP enrichment analysis
ComplexHeatmap + circlize (R) — coefficient visualization

Citation

If you use this code or data, please cite both the paper and the archive:

@article{mckim2025modgeneplexus,
  title   = {A module-based approach for post-omics, post-{GWAS} network-based gene classification},
  author  = {McKim, Alexander and Mancuso, Christopher A. and Krishnan, Arjun},
  journal = {bioRxiv},
  year    = {2025},
  doi     = {10.1101/2025.08.11.669721}
}

Zenodo archive: McKim A, Mancuso CA, Krishnan A. ModGenePlexus (code + data + results). Zenodo. https://doi.org/10.5281/zenodo.19857910

License

BSD 3-Clause License. See LICENSE.

Contact

Questions about the code or method: open a GitHub Issue. For broader questions about the Krishnan Lab's work on network-based gene classification and data reuse, visit thekrishnanlab.org.
