Abdullah edited this page Mar 5, 2026 · 32 revisions

Frequently Asked Questions (FAQ)

Common questions and answers about GraphBrew.


General Questions

What is GraphBrew?

GraphBrew is a graph processing benchmark framework that combines:

  • 17 algorithm IDs (0-16): 15 reordering strategies + MAP (file loader) + AdaptiveOrder (ML selector)
  • 8 benchmarks (PageRank, PageRank SpMV, BFS, CC, CC_SV, SSSP, BC, TC)
  • ML-powered algorithm selection via AdaptiveOrder (7 selection modes: perceptron, decision tree, hybrid DT+perceptron, database/kNN, fastest-reorder, best-endtoend, best-amortization)
  • Leiden community detection integration

Who should use GraphBrew?

  • Researchers studying graph algorithms and cache optimization
  • Engineers optimizing graph processing pipelines
  • Students learning about graph algorithms
  • Data scientists working with network data

What makes GraphBrew different?

  1. Comprehensive: 15 reordering strategies plus ML-based selection in one framework
  2. ML-powered: AdaptiveOrder learns which algorithm works best
  3. Modern: Leiden community detection integration
  4. Practical: Based on GAP Benchmark Suite standards

Installation Questions

What are the requirements?

  • Linux or macOS
  • GCC 7+ with C++17 support
  • Make
  • Python 3.8+ (optional, for scripts - no pip dependencies required)
  • At least 4GB RAM (more for large graphs)

How do I install on Ubuntu?

sudo apt-get update
sudo apt-get install build-essential git
git clone https://github.com/UVA-LavaLab/GraphBrew.git
cd GraphBrew
make all

How do I install on macOS?

xcode-select --install
brew install gcc
git clone https://github.com/UVA-LavaLab/GraphBrew.git
cd GraphBrew
make all CXX=g++-13

The build fails. What should I check?

  1. GCC version: g++ --version (need 7+)
  2. C++17 support: echo 'int main(){}' | g++ -std=c++17 -fsyntax-only -x c++ -
  3. OpenMP support: echo '#include <omp.h>' | g++ -fopenmp -x c++ - -c -o /dev/null

See Installation for detailed troubleshooting.


Usage Questions

How do I run a simple benchmark?

./bench/bin/pr -f graph.el -s -n 3

What does each option mean?

Option      | Meaning
----------- | -------
-f graph.el | Input file
-s          | Make undirected (symmetrize)
-n 3        | Run 3 trials
-o 7        | Use algorithm 7 (HUBCLUSTERDBG)
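
Combining these options into a single invocation:

```shell
# PageRank: load graph.el, symmetrize, reorder with HUBCLUSTERDBG, run 3 trials
./bench/bin/pr -f graph.el -s -o 7 -n 3
```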

Which reordering algorithm should I use?

Situation       | Recommendation
--------------- | --------------
Don't know      | -o 14 (AdaptiveOrder)
Social network  | -o 12 (GraphBrewOrder)
General purpose | -o 7 (HUBCLUSTERDBG)
Large graph     | -o 12 (GraphBrewOrder)
Baseline        | -o 0 (no reordering)

How do I know which algorithm is best for my graph?

Run multiple algorithms and compare:

for algo in 0 7 12 14 15; do
    echo "=== Algorithm $algo ==="
    ./bench/bin/pr -f graph.el -s -o $algo -n 3
done

Or use AdaptiveOrder (-o 14) to auto-select.

What graph formats are supported?

Format        | Extension | Example
------------- | --------- | -------
Edge list     | .el       | 0 1
Weighted      | .wel      | 0 1 2.5
Matrix Market | .mtx      | Standard MTX
DIMACS        | .gr       | Road networks

See Supported-Graph-Formats for details.
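
For reference, a minimal hand-written .el file (a hypothetical 3-vertex triangle): vertex IDs start at 0, with one whitespace-separated edge per line.

```shell
# Create a tiny 3-vertex triangle in edge-list (.el) format
cat > graph.el <<'EOF'
0 1
1 2
2 0
EOF
wc -l < graph.el   # number of edges
```

This file can be fed directly to any benchmark, e.g. ./bench/bin/pr -f graph.el -s -n 3.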


Performance Questions

Why is reordering slow?

Reordering is preprocessing that pays off over multiple algorithm runs. The reordering step adds upfront cost, but subsequent benchmark runs are faster due to improved cache locality.

For repeated analyses, reorder once, save, reuse.

How much speedup should I expect?

Speedups depend on graph topology, algorithm, and benchmark. High-modularity graphs (social, web) typically benefit more than low-modularity graphs (road networks). Run the full pipeline on your target graphs to measure actual improvements.

Why is my graph loading slowly?

  1. Large file: Use binary format

    ./bench/bin/converter -f graph.el -s -b graph.sg
    ./bench/bin/pr -f graph.sg -n 3
  2. Text parsing: MTX/EL requires parsing; binary is instant

  3. Memory: Ensure sufficient RAM

How can I make benchmarks faster?

  1. Use binary graphs for repeated runs
  2. Tune thread count: export OMP_NUM_THREADS=8
  3. Use NUMA binding: numactl --cpunodebind=0
  4. Reduce trials during development: -n 1
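
Putting the tips above together (a sketch; the thread count and NUMA node here are machine-dependent assumptions):

```shell
# Binary input + 8 pinned threads on NUMA node 0 + a single trial
export OMP_NUM_THREADS=8
numactl --cpunodebind=0 ./bench/bin/pr -f graph.sg -n 1
```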

Algorithm Questions

What is Leiden community detection?

Leiden is a community detection algorithm that finds densely connected groups of vertices. It improves on the popular Louvain algorithm with:

  • Guaranteed connected communities
  • Better quality partitions
  • Faster convergence

GraphBrew uses Leiden to guide reordering decisions.

What are RabbitOrder variants?

RabbitOrder (algorithm 8) has two variants:

Variant | Command          | Description
------- | ---------------- | -----------
csr     | -o 8 or -o 8:csr | Native CSR implementation (default, recommended)
boost   | -o 8:boost       | Original Boost-based implementation (reference only)

The CSR variant includes three correctness fixes over the original CSR code, making it match the Boost reference semantics, plus an auto-adaptive resolution parameter that further improves cache locality.

Resolution parameter: The CSR variant auto-tunes the Louvain resolution γ based on average degree: γ = clamp(14 / avg_degree, 0.5, 1.0). Dense graphs (high avg degree) get a lower γ, which prevents over-merging of communities. Override with the environment variable:

RABBIT_RESOLUTION=0.5 ./bench/bin/pr -f graph.el -s -o 8:csr -n 3
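
The clamp above can be reproduced with a one-liner (avg_degree here is a hypothetical value; GraphBrew computes it from the graph):

```shell
# Sketch of the auto-tuned resolution: gamma = clamp(14 / avg_degree, 0.5, 1.0)
avg_degree=30   # hypothetical dense graph
awk -v d="$avg_degree" 'BEGIN {
    g = 14 / d
    if (g < 0.5) g = 0.5
    if (g > 1.0) g = 1.0
    printf "%.3f\n", g
}'
# → 0.500 (dense graph hits the low clamp)
```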

Directed graphs: Both variants read only out-edges and use the same undirected modularity approximation from the original paper. The fixes are valid for both symmetric and directed inputs.

Recommendations:

  • Use csr (default) — faster, no external dependencies, better cache locality
  • Use boost only as a reference baseline for validation
  • The boost variant requires Boost 1.58.0: --install-boost

How does AdaptiveOrder work?

AdaptiveOrder computes 18 graph features (15 linear + 3 quadratic cross-terms), then uses a trained model to select the best algorithm for the graph, with safety checks (an out-of-distribution guardrail and an ORIGINAL-margin fallback). It supports 7 selection modes: perceptron (default), decision tree, hybrid DT+perceptron, database/kNN, fastest-reorder, best-endtoend, and best-amortization. The default is full-graph mode (a single algorithm for the entire graph). See AdaptiveOrder-ML.

What is the training pipeline?

Training is a 4-stage process: multi-restart perceptron training → variant-level weight saving → regret-aware grid search → saving the final model. Validate with python3 scripts/graphbrew_experiment.py --eval-weights. See Perceptron-Weights.

Is there a single best algorithm?

No. The best algorithm depends on graph structure, but GraphBrewOrder is a strong general-purpose choice for most graph types. Recommended: -o 12.

What are the quadratic cross-terms?

w_dv_x_hub (power-law), w_mod_x_logn (large modular graphs), w_pf_x_wsr (uniform+cache). See AdaptiveOrder-ML#features-used.

Why are some perceptron weights 0?

Zero weights usually mean cache simulation was skipped, features were not computed, or no benchmark data was collected. Fix by rerunning the full pipeline: python3 scripts/graphbrew_experiment.py --train --size small. See Perceptron-Weights#troubleshooting.

How do I validate trained weights?

python3 scripts/graphbrew_experiment.py --eval-weights  # Simulates C++ scoring, reports accuracy/regret

See Python-Scripts#-eval_weightspy---weight-evaluation--c-scoring-simulation.

Where are the trained weights saved?

All trained models (perceptron weights, decision trees, hybrid parameters) are stored in results/data/adaptive_models.json. The C++ runtime loads perceptron weights from this file via LoadPerceptronWeightsFromDB(). If the file is missing, hardcoded defaults are used. See Perceptron-Weights#weight-file-location.
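
A quick check (a sketch) for whether trained models are present before a run:

```shell
# Look for the weights file described above; without it, the C++ runtime
# falls back to hardcoded defaults.
if [ -f results/data/adaptive_models.json ]; then
    echo "trained weights found"
else
    echo "no weights file: using hardcoded defaults"
fi
```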

What's the difference between LeidenOrder and GraphBrewOrder?

  • LeidenOrder (15): Baseline reference using GVE-Leiden external library (requires CSR→DiGraph conversion)
  • GraphBrewOrder (12): Production Leiden + per-community reordering (e.g., RabbitOrder within each community), CSR-native, best quality

GraphBrewOrder uses Leiden community detection natively on CSR, then applies configurable per-community ordering for the best cache locality.

When should I use DBG vs HUBCLUSTER?

Algorithm     | Best For
------------- | --------
DBG           | Power-law graphs with clear hot/cold separation
HUBCLUSTER    | Graphs where hubs connect to each other
HUBCLUSTERDBG | Combines both; a good general choice

Troubleshooting

See Troubleshooting for detailed solutions. Quick answers:

  • Segfault: Check file exists, format correct (vertices start at 0), sufficient RAM
  • Results vary: Normal — use -n 10, disable frequency scaling, use numactl
  • No speedup: Not all graphs benefit — try AdaptiveOrder or different algorithm
  • Python issues: Need Python 3.8+ (no pip required for core scripts)

Development Questions

How do I add a new reordering algorithm?

See Adding-New-Algorithms for a complete guide:

  1. Add enum value in reorder_types.h
  2. Implement reorder function
  3. Add switch case
  4. (Optional) Add perceptron weights

How do I add a new benchmark?

See Adding-New-Benchmarks for a complete guide:

  1. Create bench/src/my_algo.cc
  2. Implement algorithm
  3. Add to Makefile
  4. Test

How do I contribute?

  1. Fork the repository
  2. Create a feature branch
  3. Make changes with tests
  4. Submit a pull request

See CONTRIBUTING.md for guidelines.

What's the code license?

GraphBrew is released under the MIT License. See LICENSE.


Data Questions

Where can I download graphs?

Source             | URL                    | Formats
------------------ | ---------------------- | -------
SNAP               | snap.stanford.edu/data | Edge list
SuiteSparse        | sparse.tamu.edu        | MTX
Network Repository | networkrepository.com  | Various
KONECT             | konect.cc              | Various

How do I convert my graph format?

# CSV to edge list (assumes a header row)
tail -n +2 graph.csv | tr ',' ' ' > graph.el

# MTX to edge list (1-indexed to 0-indexed)
grep -v "^%" graph.mtx | tail -n +2 | awk '{print $1-1, $2-1}' > graph.el
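
After converting, a quick sanity check (a sketch) that the result is 0-indexed, as GraphBrew expects:

```shell
# The smallest vertex ID in graph.el should be 0
awk 'NR==1 { min = $1 }
     { if ($1 < min) min = $1; if ($2 < min) min = $2 }
     END { print "min id:", min }' graph.el
```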

What's the maximum graph size?

Limited by RAM:

  • ~16 bytes per edge
  • 1B edges ≈ 16GB RAM
  • Larger graphs need out-of-core processing (not supported)
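
The rule of thumb above can be checked with quick shell arithmetic (the edge count here is a hypothetical value):

```shell
# ~16 bytes per edge: estimate RAM for a 1-billion-edge graph
edges=1000000000
awk -v e="$edges" 'BEGIN { printf "~%d GB\n", e * 16 / 1e9 }'
# → ~16 GB
```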

Still Have Questions?

  • Check Home for documentation overview
  • Review Troubleshooting for common issues
  • Open a GitHub issue for bugs
  • Start a discussion for questions

← Back to Home
