Python Scripts
Documentation for all Python tools in the GraphBrew framework.
The scripts folder contains a single entry point, a modular library (lib/),
and a VLDB experiment suite (experiments/):
```
scripts/
├── graphbrew_experiment.py        # MAIN: Single entry point for all experiments
├── requirements.txt               # Python dependencies
│
├── experiments/                   # VLDB 2026 paper experiment suite
│   ├── __init__.py
│   ├── vldb_paper_experiments.py  # Self-contained runner (8 experiments, auto-setup)
│   ├── vldb_generate_figures.py   # PNG figures & LaTeX table generation from JSON data
│   ├── vldb_config.py             # Shared config (graphs, algorithms, parameters)
│   ├── vldb_experiments.py        # Supplementary experiment scripts
│   ├── vldb_experiments_small.py  # Small-scale experiments
│   └── exp3_model_ablation.py     # Model ablation experiment
│
├── lib/                           # Modular library (5 sub-packages)
│   ├── __init__.py                # Re-exports every public name (backward-compatible)
│   ├── README.md                  # Package map & import conventions
│   │
│   ├── core/                      # Constants, logging, data stores
│   │   ├── utils.py               # SSOT: algorithm IDs, variant registry, paths, logging
│   │   ├── graph_types.py         # Data classes (GraphInfo, BenchmarkResult, etc.)
│   │   ├── datastore.py           # BenchmarkStore, GraphPropsStore (append-only JSON DBs)
│   │   └── graph_data.py          # Graph metadata & dataset catalog
│   │
│   ├── pipeline/                  # Experiment execution stages
│   │   ├── dependencies.py        # System dependency detection & installation
│   │   ├── build.py               # Binary compilation utilities
│   │   ├── download.py            # Graph downloading from SuiteSparse
│   │   ├── reorder.py             # Vertex reordering generation
│   │   ├── benchmark.py           # Performance benchmark execution
│   │   ├── cache.py               # Cache simulation analysis
│   │   ├── suitesparse_catalog.py # SuiteSparse auto-discovery (ssgetpy)
│   │   └── progress.py            # Progress tracking & reporting
│   │
│   ├── ml/                        # ML scoring & training (legacy / fallback)
│   │   ├── weights.py             # SSOT scoring: PerceptronWeight.compute_score()
│   │   ├── eval_weights.py        # Weight evaluation & data loading
│   │   ├── training.py            # Iterative / batched perceptron training
│   │   ├── adaptive_emulator.py   # C++ AdaptiveOrder logic emulation
│   │   ├── oracle.py              # Oracle (best-possible) analysis
│   │   └── features.py            # Graph topology feature extraction
│   │
│   ├── analysis/                  # Post-run analysis & visualisation
│   │   ├── adaptive.py            # A/B testing, Leiden variant comparison
│   │   ├── metrics.py             # Amortization & end-to-end metrics
│   │   └── figures.py             # SVG / PNG plot generation
│   │
│   └── tools/                     # Standalone CLI utilities
│       ├── check_includes.py      # C++ header-include linting
│       └── regen_features.py      # Feature-vector regeneration
│
├── test/                          # Pytest suite
│   ├── __init__.py
│   ├── test_algorithm_variants.py
│   ├── test_cache_simulation.py
│   ├── test_experiment_validation.py
│   ├── test_fill_adaptive.py
│   ├── test_fill_weights_variants.py
│   ├── test_graphbrew_experiment.py
│   ├── test_multilayer_validity.py
│   ├── test_self_recording.py
│   ├── test_weight_flow.py
│   ├── data/                      # Test data fixtures
│   └── graphs/                    # Test graph fixtures
│
└── README.md
```
Two self-contained experiment runners for two papers:
See the VLDB-Experiments wiki page for full usage.

**VLDB paper suite:**

| Module | Purpose |
|---|---|
| vldb_paper_experiments.py | Main runner: 8 experiments, auto-build/download/convert, output parsers |
| vldb_generate_figures.py | LaTeX table & PNG figure generation from experiment JSON data |
| vldb_config.py | Shared configuration: graph list, algorithm IDs, variant names |

**ECG paper suite:**

| Module | Purpose |
|---|---|
| ecg_paper_experiments.py | 11 experiments in 2 sections: A1-A3 accuracy validation (GRASP/P-OPT faithfulness, ECG mode equivalence), B1-B8 performance showcase (policy comparison, reorder effect, cache sweep, ECG mode comparison, fat-ID analysis) |
| ecg_config.py | Configuration: 9 policies, 3 ECG modes, 10 accuracy pairs, 11 reorder×policy pairs, 4 reorder variants, 6 eval graphs, cache sweep sizes |
Section A: Accuracy validation with PASS/FAIL reports against paper claims:
- A1: GRASP invariants (Faldu et al., HPCA'20)
- A2: P-OPT invariants (Balaji et al., HPCA'21)
- A3: ECG mode equivalence (DBG_ONLY ↔ GRASP, POPT_PRIMARY ↔ P-OPT)

Section B: Performance showcase:
- B1: Policy comparison, B2: Reorder effect, B3: Reorder×policy interaction
- B4: Cache size sweep, B5: Algorithm analysis, B6: Graph sensitivity
- B7: ECG mode comparison, B8: Fat-ID bit allocation
The main script orchestrates the lib/ modules: it parses arguments and calls the appropriate phase functions.

```bash
# Full pipeline: download → build → experiment → weights
python3 scripts/graphbrew_experiment.py --full --size small

# See all options
python3 scripts/graphbrew_experiment.py --help
```

| Feature | Description |
|---|---|
| Graph Download | Downloads from the SuiteSparse collection (up to ~466 graphs via auto-discovery) |
| Auto Build | Compiles binaries if missing |
| Memory Management | Automatically skips graphs exceeding RAM limits |
| Label Maps | Pre-generates reordering maps for consistency |
| Reordering | Tests all 17 algorithm IDs (0-16), with variants (RCM:bnf, RabbitOrder:csr/boost, GOrder:csr/fast, GoGraphOrder:default/fast/naive, GraphBrewOrder:leiden/rabbit/hubcluster/hrab/tqr/hcache/streaming) |
| Benchmarks | PR, PR_SPMV, BFS, CC, CC_SV, SSSP, BC, TC |
| Cache Simulation | L1/L2/L3 hit rate analysis |
| Brute-Force Validation | Compares adaptive vs all algorithms |
Pure Python emulator that replicates C++ AdaptiveOrder logic without recompiling.
This is useful for:
- Analyzing how weight changes affect algorithm selection
- Testing weight configurations quickly in Python
- Understanding the two-layer selection process (type matching + perceptron)
- Debugging why a specific algorithm was chosen
```bash
# Via entry point
python3 scripts/graphbrew_experiment.py --emulator

# Or directly via module (supports full argparse)
python3 -m scripts.lib.ml.adaptive_emulator --graph graphs/email-Enron/email-Enron.mtx
python3 -m scripts.lib.ml.adaptive_emulator --compare-benchmark results/benchmark_*.json
python3 -m scripts.lib.ml.adaptive_emulator --all-graphs --disable-weight w_modularity
python3 -m scripts.lib.ml.adaptive_emulator --mode best-endtoend --compare-benchmark results/benchmark.json
```

| Mode | Description |
|---|---|
| `fastest-reorder` | Minimize reordering time only |
| `fastest-execution` | Minimize algorithm execution time (default) |
| `best-endtoend` | Minimize (reorder_time + execution_time) |
| `best-amortization` | Minimize iterations needed to amortize reordering cost |
| `heuristic` | Feature-based heuristic (more robust) |
| `type-bench` | Type+benchmark recommendations (best accuracy) |
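As a rough illustration, the four time-based modes above each minimize a different objective over measured (reorder_time, exec_time) pairs. The `pick()` helper, the `timings` dict shape, and the algorithm names below are all hypothetical, a sketch rather than the emulator's real API (which lives in `lib/ml/adaptive_emulator.py`):

```python
import math

# Hypothetical data shape: algorithm -> (reorder_time, exec_time) in seconds.
def iters_to_amortize(reorder, exec_t, baseline_exec):
    """Iterations needed for the kernel-time saving to repay the reorder cost."""
    saving = baseline_exec - exec_t
    return reorder / saving if saving > 0 else math.inf

def pick(timings, mode="fastest-execution"):
    """Return the algorithm that minimizes the chosen mode's objective."""
    baseline = timings["ORIGINAL"][1]                 # unreordered kernel time
    objectives = {
        "fastest-reorder":   lambda r, e: r,          # reorder time only
        "fastest-execution": lambda r, e: e,          # kernel time only (default)
        "best-endtoend":     lambda r, e: r + e,      # one-shot total
        "best-amortization": lambda r, e: iters_to_amortize(r, e, baseline),
    }
    f = objectives[mode]
    return min(timings, key=lambda algo: f(*timings[algo]))
```

For example, with `{"ORIGINAL": (0.0, 1.0), "RCM": (0.4, 0.7), "GOrder": (2.0, 0.6)}`, `fastest-execution` picks GOrder while `best-endtoend` keeps ORIGINAL: GOrder's 2 s of reordering is not repaid in a single run.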
The emulator replicates the C++ AdaptiveOrder two-layer selection:

Layer 1: Type Matching
- Compute graph features → normalized vector
- Find the closest type centroid (Euclidean distance)
- OOD check: if distance > 1.5, return ORIGINAL
- Load that type's weights

Layer 2: Algorithm Selection
- Compute perceptron scores for each algorithm:
  score = bias + Σ(weight_i × feature_i)
  + quadratic cross-terms (dv×hub, mod×logN, pf×wsr)
  + convergence bonus (PR/SSSP only)
- ORIGINAL margin check: if best − ORIGINAL < 0.05, return ORIGINAL
- Select the algorithm with the highest score
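The two layers can be sketched in a few lines of plain Python. The quadratic cross-terms and convergence bonus are omitted for brevity, and the centroid/weight shapes here are illustrative, not the real `adaptive_models.json` schema:

```python
import math

OOD_THRESHOLD = 1.5      # distance cutoff from Layer 1 above
ORIGINAL_MARGIN = 0.05   # margin check from Layer 2 above

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select(features, type_centroids, type_weights):
    # Layer 1: nearest type centroid, with out-of-distribution check
    best_type = min(type_centroids,
                    key=lambda t: euclidean(features, type_centroids[t]))
    if euclidean(features, type_centroids[best_type]) > OOD_THRESHOLD:
        return "ORIGINAL"
    # Layer 2: linear perceptron score per algorithm (cross-terms omitted)
    scores = {}
    for algo, w in type_weights[best_type].items():
        scores[algo] = w["bias"] + sum(wi * fi
                                       for wi, fi in zip(w["weights"], features))
    best = max(scores, key=scores.get)
    # Keep ORIGINAL unless the winner beats it by a clear margin
    if scores[best] - scores.get("ORIGINAL", 0.0) < ORIGINAL_MARGIN:
        return "ORIGINAL"
    return best
```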
| Tool | Purpose |
|---|---|
| `--emulator` / `scripts.lib.ml.adaptive_emulator` | Emulate C++ selection logic, analyze weight impact |
| `--eval-weights` / `scripts.lib.ml.eval_weights` | Evaluate weights, train from benchmark data |

Use `adaptive_emulator` when you want to understand why a specific algorithm was selected.
Use `eval_weights` (via `--eval-weights`) when you want to train or evaluate weights.
Quick evaluation script that trains weights, simulates C++ `scoreBase() × benchmarkMultiplier()` scoring, and reports accuracy/regret metrics.
This is the fastest way to validate that your trained weights actually produce good algorithm selections without recompiling C++ or running live benchmarks.

```bash
python3 scripts/graphbrew_experiment.py --eval-weights
python3 scripts/graphbrew_experiment.py --eval-weights --sg-only
```

1. Loads the latest `benchmark_*.json` and `reorder_*.json` from `results/`
2. Trains weights via `compute_weights_from_results()` (multi-restart perceptrons + regret-aware grid search)
3. Saves weights to `results/data/adaptive_models.json`
4. Simulates C++ scoring: for each (graph, benchmark), computes `scoreBase(algo, features) × benchmarkMultiplier(algo, bench)` for all algorithms
5. Compares the predicted winner vs the actual fastest algorithm (base-aware: variants of the same algorithm count as correct)
6. Reports accuracy, regret, top-2 accuracy, and a per-benchmark breakdown
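The base-aware comparison in steps 5-6 can be illustrated as follows. The `(graph, benchmark)` keying and the `BASE:variant` naming convention are assumptions for this sketch, not the script's actual data structures:

```python
def base(algo):
    """Strip the variant suffix: 'RabbitOrder:csr' -> 'RabbitOrder'."""
    return algo.split(":", 1)[0]

def evaluate(predictions, actual_best, times):
    """Base-aware accuracy and average regret over (graph, benchmark) pairs."""
    correct, regrets = 0, []
    for key, pred in predictions.items():
        best = actual_best[key]
        if base(pred) == base(best):
            correct += 1
            regrets.append(0.0)   # base-aware: variant mismatch counts as 0%
        else:
            t_pred, t_best = times[key][pred], times[key][best]
            regrets.append((t_pred - t_best) / t_best)
    n = len(predictions)
    return correct / n, sum(regrets) / n
```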
| Metric | Description |
|---|---|
| Accuracy | % of (graph, benchmark) pairs where predicted base algorithm matches actual best |
| Top-2 accuracy | % where prediction is in the top 2 fastest algorithms |
| Avg regret | Average (predicted_time − best_time) / best_time across all predictions |
| Median regret | Median of the above (more robust to outliers) |
| Base-aware regret | Same as regret, but variant mismatches within the same base = 0% |
| Per-benchmark accuracy | Breakdown by pr, pr_spmv, bfs, cc, cc_sv, sssp, bc, tc |
```text
=== Simulating C++ adaptive selection ===
Overall accuracy: XX/YY = ZZ.Z%
Unique predicted algorithms: N: [...]
Per-benchmark accuracy:
  bfs: .../... = ...%
  cc: .../... = ...%
  pr: .../... = ...%
  sssp: .../... = ...%
  bc: .../... = ...%
  tc: .../... = ...%
Average regret: X.X% (lower is better)
Top-2 accuracy: .../... = ...%
Median regret: X.X%
Base-aware avg regret: X.X% (variant mismatches = 0%)
Base-aware median regret: X.X%
```
Scoring uses a single source of truth: the canonical `PerceptronWeight.compute_score()` method in `weights.py`. There is no duplicated formula:

```python
def _simulate_score(algo_data, feats, benchmark='pr'):
    """Simulate C++ scoreBase() × benchmarkMultiplier().

    Delegates to PerceptronWeight.compute_score() (SSOT scoring).
    Covers all 17 features + 3 quadratic terms + convergence bonus
    + cache constants + benchmark multiplier.
    """
    pw = PerceptronWeight.from_dict(algo_data)
    return pw.compute_score(feats, benchmark)
```

This ensures Python evaluation matches C++ `scoreBase() × benchmarkMultiplier()` exactly, with no risk of divergence from a separately maintained formula.
| Tool | Purpose | Updates Weights? |
|---|---|---|
| `eval_weights.py` | Train + evaluate + report accuracy/regret | ✅ Yes |
| `adaptive_emulator.py` | Emulate C++ selection logic for debugging | ❌ No |
See Command-Line-Reference for the complete CLI option reference.

Key flags: `--full` (complete pipeline), `--size small|medium|large`, `--phase reorder|benchmark|cache|weights|adaptive`, `--train` (multi-phase weight training), `--train-benchmarks` (default: pr bfs cc), `--train-iterative` (feedback loop), `--brute-force` (validation), `--auto` (auto-detect RAM/disk limits).

Variant testing: `--all-variants`, `--graphbrew-variants`, `--rabbit-variants`, `--gorder-variants`. Variant lists are defined in `scripts/lib/core/utils.py`.

```bash
python3 scripts/graphbrew_experiment.py --full --size small
python3 scripts/graphbrew_experiment.py --train --size small --max-graphs 5
python3 scripts/graphbrew_experiment.py --brute-force --validation-benchmark pr
python3 scripts/graphbrew_experiment.py --generate-maps --size small
```

The lib/ folder is organised into five sub-packages. All public names are re-exported from `lib/__init__.py` for backward compatibility.
| Module | Purpose | Key Exports |
|---|---|---|
| `utils.py` | SSOT constants & utilities | `ALGORITHMS`, `BENCHMARKS`, `BenchmarkResult`, variant lists, `canonical_algo_key()`, `run_command` |
| `graph_types.py` | Core types | `GraphInfo`, `AlgorithmConfig` |
| `datastore.py` | Unified data store | `BenchmarkStore`, `GraphPropsStore`, `adaptive_models.json` |
| `graph_data.py` | Per-graph storage | `GraphDataStore`, `list_graphs`, `list_runs` |
| Module | Purpose | Key Exports |
|---|---|---|
| `dependencies.py` | System deps | `check_dependencies`, `install_dependencies` |
| `build.py` | C++ build | `build_benchmarks`, `build_converter` |
| `download.py` | Graph download | `download_graphs`, `DOWNLOAD_GRAPHS_SMALL/MEDIUM` |
| `reorder.py` | Vertex reordering | `generate_reorderings`, `ReorderResult`, `load_label_maps_index` |
| `benchmark.py` | Benchmarking | `run_benchmark`, `run_benchmarks_multi_graph`, `run_benchmarks_with_variants` |
| `cache.py` | Cache simulation | `run_cache_simulations`, `CacheResult`, `get_cache_stats_summary` |
| `phases.py` | (removed) | Orchestration is handled directly by `graphbrew_experiment.py` |
| `progress.py` | Progress tracking | `ProgressTracker` (banners, phases, status) |
| Module | Purpose | Key Exports |
|---|---|---|
| `weights.py` | SSOT scoring (fallback) | `PerceptronWeight`, `compute_weights_from_results`, `cross_validate_logo` |
| `eval_weights.py` | Data loading | `load_all_results`, `build_performance_matrix`, `compute_graph_features` |
| `training.py` | ML training | `train_adaptive_weights_iterative`, `train_adaptive_weights_large_scale` |
| `adaptive_emulator.py` | C++ emulation | `AdaptiveOrderEmulator` |
| `oracle.py` | Oracle analysis | `compute_oracle_accuracy`, `compute_regret` |
| `features.py` | Graph features | `compute_extended_features`, `detect_graph_type`, `get_available_memory_gb` |
| Module | Purpose | Key Exports |
|---|---|---|
| `adaptive.py` | Adaptive analysis, A/B testing | `analyze_adaptive_order`, `parse_adaptive_output` |
| `metrics.py` | Amortization & E2E | `compute_amortization`, `format_amortization_table`, `AmortizationReport` |
| `figures.py` | Plot generation | SVG / PNG figures for wiki |
| Module | Purpose |
|---|---|
| `check_includes.py` | Scan C++ headers for legacy includes |
| `regen_features.py` | Regenerate graph features via C++ binary |
The primary training function: multi-restart perceptrons (5×800 epochs, z-score normalized, L2 regularized) → variant-level weight saving → regret-aware grid search for benchmark multipliers → stages to `type_0.json` (merged into `adaptive_models.json` by `export_unified_models()`). See Perceptron-Weights for details.
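As a toy illustration of that recipe (z-score features, multi-restart SGD, L2 regularization), the sketch below uses made-up shapes and hyper-parameters; the real logic is `compute_weights_from_results()` in `lib/ml/weights.py`:

```python
import random

def zscore(rows):
    """Column-wise z-score normalization (constant columns left unscaled)."""
    stats = []
    for col in zip(*rows):
        m = sum(col) / len(col)
        s = (sum((x - m) ** 2 for x in col) / len(col)) ** 0.5 or 1.0
        stats.append((m, s))
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]

def train(X, y, restarts=5, epochs=800, lr=0.01, l2=1e-4, seed=0):
    """Multi-restart linear-perceptron regression; keeps the lowest-loss restart.

    Returns (weights, bias, squared_loss) in normalized-feature space.
    """
    Xn = zscore(X)
    best = None
    for r in range(restarts):
        rng = random.Random(seed + r)        # fresh random init per restart
        w = [rng.uniform(-0.1, 0.1) for _ in Xn[0]]
        b = 0.0
        for _ in range(epochs):
            for xi, yi in zip(Xn, y):
                err = yi - (b + sum(wj * xj for wj, xj in zip(w, xi)))
                # SGD step with L2 weight decay
                w = [wj + lr * (err * xj - l2 * wj) for wj, xj in zip(w, xi)]
                b += lr * err
        loss = sum((yi - (b + sum(wj * xj for wj, xj in zip(w, xi)))) ** 2
                   for xi, yi in zip(Xn, y))
        if best is None or loss < best[2]:
            best = (w, b, loss)
    return best
```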
Post-hoc analysis of existing benchmark results; no new benchmarks needed.
Computes derived metrics from `benchmark_*.json` and `reorder_*.json`:
- Amortization iterations: how many kernel runs to break even on the reorder cost
- E2E speedup at N iterations: speedup including amortized reorder cost
- Head-to-head variant comparison: crossover points between two algorithms
```python
from scripts.lib.analysis.metrics import compute_amortization_report, format_amortization_table
from scripts.lib.core.datastore import BenchmarkStore

store = BenchmarkStore()
benchmark_results = store.load_benchmark_results("results/benchmark_latest.json")
reorder_results = store.load_reorder_results("results/reorder_latest.json")

report = compute_amortization_report(benchmark_results, reorder_results)
print(format_amortization_table(report))
```

The break-even point is computed as:

```
iters_to_amortize = reorder_cost / (baseline_time - reordered_time)
```
| Verdict | Amort Iters | Meaning |
|---|---|---|
| INSTANT | < 1 | Reorder cost negligible |
| FAST | 1β10 | Pays off quickly |
| OK | 10β100 | Worth it for repeated use |
| SLOW | > 100 | Only for many iterations |
| NEVER | ∞ | Kernel is slower, never pays off |
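The formula and verdict buckets above translate directly into code. This sketch mirrors the table's thresholds, but the function name is illustrative, not the library's API:

```python
import math

def amortization(reorder_cost, baseline_time, reordered_time):
    """Break-even iteration count plus the verdict bucket from the table above."""
    saving = baseline_time - reordered_time
    iters = reorder_cost / saving if saving > 0 else math.inf
    if math.isinf(iters):
        verdict = "NEVER"      # kernel got slower: reordering never pays off
    elif iters < 1:
        verdict = "INSTANT"
    elif iters <= 10:
        verdict = "FAST"
    elif iters <= 100:
        verdict = "OK"
    else:
        verdict = "SLOW"
    return iters, verdict

print(amortization(0.5, 2.0, 1.5))   # (1.0, 'FAST')
```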
| Column | Description |
|---|---|
| Kernel | Kernel-only speedup vs ORIGINAL |
| Reorder | Time spent computing the reordering |
| Amort | Iterations needed to break even |
| E2E@1 | End-to-end speedup at 1 iteration |
| E2E@10 | End-to-end speedup at 10 iterations |
| E2E@100 | End-to-end speedup at 100 iterations |
| Verdict | Human-readable break-even summary |
The `--compare ALGO_A ALGO_B` flag produces a per-graph table showing:
- Which variant has faster kernels
- Which wins at E2E@1, @10, @100
- The crossover iteration where the slower-to-reorder variant overtakes
- Overall win/loss counts
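The crossover iteration can be derived by solving `reorder_a + N·kernel_a < reorder_b + N·kernel_b` for N. This standalone sketch shows the idea and is not the `--compare` implementation:

```python
import math

def crossover(reorder_a, kernel_a, reorder_b, kernel_b):
    """Smallest iteration count N at which A's cumulative end-to-end time
    (reorder_a + N * kernel_a) drops strictly below B's; None if never."""
    if kernel_a >= kernel_b:
        return None   # A's kernel is not faster, so A can never overtake B
    n = (reorder_a - reorder_b) / (kernel_b - kernel_a)
    return max(1, math.floor(n) + 1)

# A: slow reorder (2.0 s) but fast kernel (0.6 s); B: cheap reorder, slower kernel
print(crossover(2.0, 0.6, 0.4, 0.7))   # 17
```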
The amortization report is also auto-printed by graphbrew_experiment.py after Phase 2 (benchmark) completes.
The primary entry point is graphbrew_experiment.py, which orchestrates all pipeline
phases directly. Use --target-graphs N for one-command operation:
```bash
# Full pipeline: download 100 graphs per size, benchmark, train, evaluate
python3 scripts/graphbrew_experiment.py --target-graphs 100

# Preview what would run (no execution)
python3 scripts/graphbrew_experiment.py --target-graphs 100 --dry-run

# Run individual phases
python3 scripts/graphbrew_experiment.py --phase reorder --size small
python3 scripts/graphbrew_experiment.py --phase benchmark --size small
```

See `--help` for all 90+ flags, organized into argument groups (Pipeline, Download, Resources, Graph Selection, Algorithms, Training, Validation, Tools, etc.).
Results are organized as:

- `results/graphs/{name}/features.json`: static graph features
- `results/logs/{name}/runs/{timestamp}/`: per-run benchmarks, reorder, weights
- `results/mappings/{name}/`: pre-generated label mappings (`.lo` + `.time`)
- `results/benchmark_*.json`, `reorder_*.json`, `cache_*.json`: aggregate results
- `results/data/adaptive_models.json`: trained model weights for C++

Use `python3 -m scripts.lib.core.graph_data --list-graphs` to browse per-graph data.
```bash
cd scripts
pip install -r requirements.txt
```

The requirements file:

```text
# Core dependencies - standard library only for basic benchmarking
# These are REQUIRED for training, evaluation, and analysis scripts:
pytest>=7.0
networkx>=2.6
numpy>=1.20.0

# Optional: For extended analysis and visualization (uncomment if needed)
# pandas>=1.3.0      # For data manipulation
# matplotlib>=3.4.0  # For plotting results
# scipy>=1.7.0       # For correlation analysis
```

```bash
pip install -r scripts/requirements.txt
python3 --version   # Should be 3.8+
```

Build the C++ binaries and make them executable:

```bash
make all
make sim   # For cache simulation
chmod +x bench/bin/*
chmod +x bench/bin_sim/*
```

- AdaptiveOrder-ML - ML perceptron details
- Running-Benchmarks - Command-line usage
- Code-Architecture - Codebase structure