Model improvements by essharom · Pull Request #2 · stracquadaniolab/gnn-suite

essharom · 2026-02-25T14:21:17Z

No description provided.

Updated task type descriptions and example gene/network file paths in README.

- GraphSAGE: remove activation+dropout after output layer (distorted logits) - GIN: forward() now uses all conv layers dynamically instead of hardcoding 2 - GraphTransformer: fix dimension mismatch for num_layers > 2 - GCNII: fix loop variable shadowing, pass theta through instead of hardcoding None - Evaluate test metric every epoch for accurate best-metric tracking - Replace bare except clauses with except Exception - hyperopt: remove broken sys.path.append, dead code, fix n_jobs default to 1

Copilot

Pull request overview

This pull request introduces comprehensive "Model improvements" that expand the GNN-Suite pipeline to support multiple task types (binary, multiclass, regression), adds MLflow experiment tracking, implements hyperparameter optimization with Optuna, and improves the overall testing and documentation infrastructure.

Changes:

Added support for multiclass classification and regression tasks alongside binary classification
Integrated MLflow for experiment tracking with optional configuration
Enhanced hyperparameter optimization using YAML-based configuration and Optuna
Expanded CI/CD with comprehensive integration tests for all task types
Improved documentation with detailed parameter references and examples

Reviewed changes

Copilot reviewed 7 out of 17 changed files in this pull request and generated 9 comments.

Show a summary per file

File	Description
testdata/sfari_multiclass_genes.csv	New test dataset with 1256 genes for multiclass classification
nextflow.config	Added parameters for task_type, MLflow integration, num_heads, and plotting control
mlflow_server.sh	Script to start MLflow tracking server with SQLite backend
main.nf	Enhanced workflow to support multiple task types and dynamic hyperparameter loading
conf/hyperparams.yaml	Centralized hyperparameter search space configuration
bin/plot.py	Updated to handle task-specific metrics and auto-detect available metrics
bin/models.py	Fixed GIN concatenation logic and improved GCNII implementation
bin/load_hyperparams.py	New script to load optimized hyperparameters or use defaults
bin/hyperopt.py	Refactored with YAML-driven configuration and task type support
bin/gnn.py	Major refactoring to support all three task types with appropriate losses and metrics
README_mlflow.md	Quick start guide for MLflow integration
README.md	Comprehensive documentation updates with parameter tables and examples
.gitignore	Added MLflow-related directories
.github/workflows/ci.yml	Extended with integration tests for all task types and hyperparameter optimization

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-25T14:31:16Z

+      - name: Integration Test - Multiclass Classification
+        run: |
+          nextflow run . -profile test,ci,docker \
+            --task-type multiclass \


Parameter name inconsistency: Use --task_type (with underscore) instead of --task-type (with hyphen) to match the workflow parameter definition in nextflow.config.

Copilot · 2026-02-25T14:31:17Z

+
+      - name: Integration Test - Regression
+        run: |
+          nextflow run . -profile test,ci,docker \


Parameter name inconsistency: Use --task_type (with underscore) instead of --task-type (with hyphen) to match the workflow parameter definition in nextflow.config.

Copilot · 2026-02-25T14:31:17Z

+      - name: Integration Test - MLflow Enabled
+        run: |
+          nextflow run . -profile test,ci,docker \
+            --task-type binary \


Parameter name inconsistency: Use --task_type (with underscore) instead of --task-type (with hyphen) to match the workflow parameter definition in nextflow.config.

Copilot · 2026-02-25T14:31:17Z

+  --geneFile testdata/multiclass.csv \
+  --networkFile testdata/multiclass_network.tsv \
+  --task_type multiclass
+
+# Regression
+nextflow run main.nf -profile docker \
+  --geneFile testdata/regression.csv \
+  --networkFile testdata/regression_network.tsv \


Incorrect file paths in documentation: The test data files referenced here don't exist. According to the testdata directory listing, the actual multiclass file is sfari_multiclass_genes.csv, not multiclass.csv. Similarly for the network file and regression files. Update these examples to use the correct file paths that exist in the repository.

Suggested change

--geneFile testdata/multiclass.csv \

--networkFile testdata/multiclass_network.tsv \

--task_type multiclass

# Regression

nextflow run main.nf -profile docker \

--geneFile testdata/regression.csv \

--networkFile testdata/regression_network.tsv \

--geneFile testdata/sfari_multiclass_genes.csv \

--networkFile testdata/sfari_multiclass_network.tsv \

--task_type multiclass

# Regression

nextflow run main.nf -profile docker \

--geneFile testdata/sfari_regression_genes.csv \

--networkFile testdata/sfari_regression_network.tsv \

Copilot · 2026-02-25T14:31:18Z

+  --geneFile testdata/genes.csv \
+  --networkFile testdata/multiclass_network.tsv \
+  --task_type multiclass
+
+# Regression
+nextflow run main.nf -profile docker -entry hyperopt \
+  --geneFile testdata/genes.csv \
+  --networkFile testdata/regression_network.tsv \


Incorrect file paths in documentation: The multiclass and regression examples reference non-existent files. The correct file paths should be testdata/sfari_multiclass_genes.csv with testdata/sfari_multiclass_network.tsv for multiclass, and testdata/sfari_regression_genes.csv with testdata/sfari_regression_network.tsv for regression. Update these examples to match the actual test data files in the repository.

Suggested change

--geneFile testdata/genes.csv \

--networkFile testdata/multiclass_network.tsv \

--task_type multiclass

# Regression

nextflow run main.nf -profile docker -entry hyperopt \

--geneFile testdata/genes.csv \

--networkFile testdata/regression_network.tsv \

--geneFile testdata/sfari_multiclass_genes.csv \

--networkFile testdata/sfari_multiclass_network.tsv \

--task_type multiclass

# Regression

nextflow run main.nf -profile docker -entry hyperopt \

--geneFile testdata/sfari_regression_genes.csv \

--networkFile testdata/sfari_regression_network.tsv \

Copilot · 2026-02-25T14:31:18Z

@@ -0,0 +1,7 @@
+# Start MLflow tracking server


Missing shebang: Bash scripts should start with #!/bin/bash or #!/usr/bin/env bash to ensure proper execution. Add this as the first line of the file.

Copilot · 2026-02-25T14:31:18Z

+      - name: Integration Test - Binary Classification
+        run: |
+          nextflow run . -profile test,ci,docker \
+            --task-type binary \


Parameter name inconsistency: This test uses --task-type (with hyphen) but the workflow definition in nextflow.config defines it as task_type (with underscore). Nextflow parameters should use underscores, not hyphens. Change to --task_type to match the configuration.

Copilot · 2026-02-25T14:31:19Z

+        eval \$(load_hyperparams.py ${model} ${dataSet} ${resultsDir} \
+            ${params.learning_rate} \
+            ${params.weight_decay} \
+            ${params.dropout} \
+            ${params.alpha} \
+            ${params.theta} \
+            ${params.num_heads})
+
+        echo "# Using hyperparameters for ${model}: LR=\${LEARNING_RATE}, WD=\${WEIGHT_DECAY}, DROPOUT=\${DROPOUT}, ALPHA=\${ALPHA}, THETA=\${THETA}, NUM_HEADS=\${NUM_HEADS}" >&2
+
        gnn.py ${geneFile} ${networkFile} \
                --train-size ${params.train_size} \
                --model-name ${model} \
-                --learning-rate ${params.learning_rate} \
-                --weight-decay ${params.weight_decay} \
+                --learning-rate \${LEARNING_RATE} \
+                --weight-decay \${WEIGHT_DECAY} \
                --epochs ${epoch} \
                --eval-threshold ${params.eval_threshold} \
                --verbose-interval ${params.verbose_interval} \
-                --dropout ${params.dropout} \
-                --alpha ${params.alpha} \
-                --theta ${params.theta} \
-                 > full-${model}-${epoch}-run-${run}.txt
-
-        split_data.py full-${model}-${epoch}-run-${run}.txt
+                --dropout \${DROPOUT} \
+                --alpha \${ALPHA} \
+                --theta \${THETA} \
+                --num-heads \${NUM_HEADS} \
+                --task-type ${params.task_type} \
+                 > full-${model}-${epoch}-run-${run}-${params.task_type}.txt


The TrainGNN process builds a shell command using eval $(load_hyperparams.py ${model} ${dataSet} ${resultsDir} ...) with unquoted parameters, which allows command injection if any of model, dataSet, or resultsDir (or other CLI params like learning_rate) contain shell metacharacters such as ; or $(...). An attacker who can control these workflow parameters (e.g., via --dataSet, --models, or a crafted --geneFile basename) can execute arbitrary commands in the task environment, leading to full compromise of the worker node or container. Replace eval with a safer mechanism (e.g., parsing the hyperparameter script output without shell evaluation, or sourcing a strictly validated key=value file) and ensure all interpolated parameters passed to shell commands are safely quoted or avoided by using direct process invocation.

Copilot · 2026-02-25T14:31:19Z

+mlflow server \
+  --backend-store-uri sqlite:///mlflow.db \
+  --default-artifact-root ./mlflow-artifacts \
+  --host 0.0.0.0 \


The MLflow tracking server is started bound to 0.0.0.0 over plain HTTP with no authentication, which exposes experiment metadata and artifacts to anyone on the reachable network. In multi-user or shared environments this can allow unauthorized users to read or tamper with experiment data and models. Restrict the bind address to 127.0.0.1 or a trusted interface, and place MLflow behind authenticated, TLS-terminating infrastructure (or enable MLflow’s security features) before using this script in non-local setups.

Suggested change

mlflow server \

--backend-store-uri sqlite:///mlflow.db \

--default-artifact-root ./mlflow-artifacts \

--host 0.0.0.0 \

MLFLOW_HOST="${MLFLOW_HOST:-127.0.0.1}"

mlflow server \

--backend-store-uri sqlite:///mlflow.db \

--default-artifact-root ./mlflow-artifacts \

--host "${MLFLOW_HOST}" \

essharom and others added 17 commits June 2, 2025 14:33

add preprint link

a14bb12

Add multiclass and regression support

75c037a

Add test datasets for multiclass and regression

a8025e0

Ignore MLflow local tracking files

36e04eb

Add MLflow for experiment tracking

e15f520

Add MLflow model registry functionality

41be912

Add MLflow server script

558d006

Add MLflow model registry and plotting for all task types

74a1119

ci: add linting, integration tests and artifacts

6d3495e

Add hyperparameter optimization with YAML config

ca4461a

Add hyperopt integration tests to CI

751bcbc

Add MLflow quick start guide README

fb103ea

Fix hyperopt workflow

fe3c769

Update README

95e81e3

Updated task type descriptions in README

f560e0c

Updated task type descriptions and example gene/network file paths in README.

Update MLflow usage instructions in README

fa54550

essharom requested a review from Copilot February 25, 2026 14:21

essharom self-assigned this Feb 25, 2026

Copilot started reviewing on behalf of essharom February 25, 2026 14:21 View session

Copilot AI reviewed Feb 25, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Model improvements#2

Model improvements#2
essharom wants to merge 17 commits into
stracquadaniolab:masterfrom
biomedicalinformaticsgroup:model_improvements

essharom commented Feb 25, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Feb 25, 2026

Uh oh!

Copilot AI Feb 25, 2026

Uh oh!

Copilot AI Feb 25, 2026

Uh oh!

Copilot AI Feb 25, 2026

Uh oh!

Copilot AI Feb 25, 2026

Uh oh!

Copilot AI Feb 25, 2026

Uh oh!

Copilot AI Feb 25, 2026

Uh oh!

Copilot AI Feb 25, 2026

Uh oh!

Copilot AI Feb 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

essharom commented Feb 25, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants