Abdullah edited this page Mar 5, 2026 · 32 revisions

Frequently Asked Questions (FAQ)

Common questions and answers about GraphBrew.


General Questions

What is GraphBrew?

GraphBrew is a graph processing benchmark framework that combines:

  • 17 algorithm IDs (0-16): 15 reordering strategies + MAP (file loader) + AdaptiveOrder (ML selector)
  • 8 benchmarks (PageRank, PageRank SpMV, BFS, CC, CC_SV, SSSP, BC, TC)
  • ML-powered algorithm selection via AdaptiveOrder (7 selection modes: perceptron, decision tree, hybrid DT+perceptron, database/kNN, fastest-reorder, best-endtoend, best-amortization)
  • Leiden community detection integration

Who should use GraphBrew?

  • Researchers studying graph algorithms and cache optimization
  • Engineers optimizing graph processing pipelines
  • Students learning about graph algorithms
  • Data scientists working with network data

What makes GraphBrew different?

  1. Comprehensive: 15 reordering strategies plus ML-based selection in one framework
  2. ML-powered: AdaptiveOrder learns which algorithm works best
  3. Modern: Leiden community detection integration
  4. Practical: Based on GAP Benchmark Suite standards

Installation Questions

What are the requirements?

  • Linux or macOS
  • GCC 7+ with C++17 support
  • Make
  • Python 3.8+ (optional, for scripts - no pip dependencies required)
  • At least 4GB RAM (more for large graphs)

How do I install on Ubuntu?

sudo apt-get update
sudo apt-get install build-essential git
git clone https://github.com/UVA-LavaLab/GraphBrew.git
cd GraphBrew
make all

How do I install on macOS?

xcode-select --install
brew install gcc
git clone https://github.com/UVA-LavaLab/GraphBrew.git
cd GraphBrew
make all CXX=g++-13

The build fails. What should I check?

  1. GCC version: g++ --version (need 7+)
  2. C++17 support: echo 'int main(){}' | g++ -std=c++17 -fsyntax-only -x c++ -
  3. OpenMP support: echo '#include <omp.h>' | g++ -fopenmp -x c++ - -c -o /dev/null

See Installation for detailed troubleshooting.


Usage Questions

How do I run a simple benchmark?

./bench/bin/pr -f graph.el -s -n 3

What does each option mean?

Option      | Meaning
----------- | -------
-f graph.el | Input file
-s          | Make undirected (symmetrize)
-n 3        | Run 3 trials
-o 7        | Use algorithm 7 (HUBCLUSTERDBG)
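
Combining these options into a single invocation:

```shell
# PageRank: load graph.el, symmetrize, reorder with HUBCLUSTERDBG, run 3 trials
./bench/bin/pr -f graph.el -s -o 7 -n 3
```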

Which reordering algorithm should I use?

Situation       | Recommendation
--------------- | --------------
Don't know      | -o 14 (AdaptiveOrder)
Social network  | -o 12 (GraphBrewOrder)
General purpose | -o 7 (HUBCLUSTERDBG)
Large graph     | -o 12 (GraphBrewOrder)
Baseline        | -o 0 (no reordering)

How do I know which algorithm is best for my graph?

Run multiple algorithms and compare:

for algo in 0 7 12 14 15; do
    echo "=== Algorithm $algo ==="
    ./bench/bin/pr -f graph.el -s -o $algo -n 3
done

Or use AdaptiveOrder (-o 14) to auto-select.

What graph formats are supported?

Format        | Extension | Example
------------- | --------- | -------
Edge list     | .el       | 0 1
Weighted      | .wel      | 0 1 2.5
Matrix Market | .mtx      | Standard MTX
DIMACS        | .gr       | Road networks

See Supported-Graph-Formats for details.
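
For reference, a minimal hand-written .el file (a hypothetical 3-vertex triangle): vertex IDs start at 0, with one whitespace-separated edge per line.

```shell
# Create a tiny 3-vertex triangle in edge-list (.el) format
cat > graph.el <<'EOF'
0 1
1 2
2 0
EOF
wc -l < graph.el   # number of edges
```

This file can be fed directly to any benchmark, e.g. ./bench/bin/pr -f graph.el -s -n 3.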


Performance Questions

Why is reordering slow?

Reordering is preprocessing that pays off over multiple algorithm runs. The reordering step adds upfront cost, but subsequent benchmark runs are faster due to improved cache locality.

For repeated analyses, reorder once, save, reuse.

How much speedup should I expect?

Speedups depend on graph topology, algorithm, and benchmark. High-modularity graphs (social, web) typically benefit more than low-modularity graphs (road networks). Run the full pipeline on your target graphs to measure actual improvements.

Why is my graph loading slowly?

  1. Large file: Use binary format

    ./bench/bin/converter -f graph.el -s -b graph.sg
    ./bench/bin/pr -f graph.sg -n 3
  2. Text parsing: MTX/EL requires parsing; binary is instant

  3. Memory: Ensure sufficient RAM

How can I make benchmarks faster?

  1. Use binary graphs for repeated runs
  2. Tune thread count: export OMP_NUM_THREADS=8
  3. Use NUMA binding: numactl --cpunodebind=0
  4. Reduce trials during development: -n 1
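
Putting the tips above together (a sketch; the thread count and NUMA node here are machine-dependent assumptions):

```shell
# Binary input + 8 pinned threads on NUMA node 0 + a single trial
export OMP_NUM_THREADS=8
numactl --cpunodebind=0 ./bench/bin/pr -f graph.sg -n 1
```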

Algorithm Questions

What is Leiden community detection?

Leiden is a community detection algorithm that finds densely connected groups of vertices. It improves on the popular Louvain algorithm with:

  • Guaranteed connected communities
  • Better quality partitions
  • Faster convergence

GraphBrew uses Leiden to guide reordering decisions.

What are RabbitOrder variants?

RabbitOrder (algorithm 8) has two variants:

Variant | Command          | Description
------- | ---------------- | -----------
csr     | -o 8 or -o 8:csr | Native CSR implementation (default, recommended)
boost   | -o 8:boost       | Original Boost-based implementation (reference only)

The CSR variant includes three correctness fixes over the original CSR code, making it match the Boost reference semantics, plus an auto-adaptive resolution parameter that further improves cache locality.

Resolution parameter: The CSR variant auto-tunes the Louvain resolution γ based on average degree: γ = clamp(14 / avg_degree, 0.5, 1.0). Dense graphs (high avg degree) get a lower γ, which prevents over-merging of communities. Override with the environment variable:

RABBIT_RESOLUTION=0.5 ./bench/bin/pr -f graph.el -s -o 8:csr -n 3
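
The clamp above can be reproduced with a one-liner (avg_degree here is a hypothetical value; GraphBrew computes it from the graph):

```shell
# Sketch of the auto-tuned resolution: gamma = clamp(14 / avg_degree, 0.5, 1.0)
avg_degree=30   # hypothetical dense graph
awk -v d="$avg_degree" 'BEGIN {
    g = 14 / d
    if (g < 0.5) g = 0.5
    if (g > 1.0) g = 1.0
    printf "%.3f\n", g
}'
# → 0.500 (dense graph hits the low clamp)
```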

Directed graphs: Both variants read only out-edges and use the same undirected modularity approximation from the original paper. The fixes are valid for both symmetric and directed inputs.

Recommendations:

  • Use csr (default) — faster, no external dependencies, better cache locality
  • Use boost only as a reference baseline for validation
  • The boost variant requires Boost 1.58.0: --install-boost

How does AdaptiveOrder work?

AdaptiveOrder computes 18 graph features (15 linear + 3 quadratic cross-terms), then uses a trained model to select the best algorithm for the graph, with safety checks (an out-of-distribution guardrail and an ORIGINAL-margin fallback). It supports 7 selection modes: perceptron (default), decision tree, hybrid DT+perceptron, database/kNN, fastest-reorder, best-endtoend, and best-amortization. The default is full-graph mode (a single algorithm for the entire graph). See AdaptiveOrder-ML.

What is the training pipeline?

Training is a 4-stage process: multi-restart perceptron training → variant-level weight saving → regret-aware grid search → saving the final model. Validate with python3 scripts/graphbrew_experiment.py --eval-weights. See Perceptron-Weights.

Is there a single best algorithm?

No. The best algorithm depends on graph structure, but GraphBrewOrder is a strong general-purpose choice for most graph types. Recommended: -o 12.

What are the quadratic cross-terms?

w_dv_x_hub (power-law), w_mod_x_logn (large modular graphs), w_pf_x_wsr (uniform+cache). See AdaptiveOrder-ML#features-used.

Why are some perceptron weights 0?

Zero weights usually mean cache simulation was skipped, features were not computed, or no benchmark data was collected. Fix by rerunning the full pipeline: python3 scripts/graphbrew_experiment.py --train --size small. See Perceptron-Weights#troubleshooting.

How do I validate trained weights?

python3 scripts/graphbrew_experiment.py --eval-weights  # Simulates C++ scoring, reports accuracy/regret

See Python-Scripts#-eval_weightspy---weight-evaluation--c-scoring-simulation.

Where are the trained weights saved?

All trained models (perceptron weights, decision trees, hybrid parameters) are stored in results/data/adaptive_models.json. The C++ runtime loads perceptron weights from this file via LoadPerceptronWeightsFromDB(). If the file is missing, hardcoded defaults are used. See Perceptron-Weights#weight-file-location.
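
A quick check (a sketch) for whether trained models are present before a run:

```shell
# Look for the weights file described above; without it, the C++ runtime
# falls back to hardcoded defaults.
if [ -f results/data/adaptive_models.json ]; then
    echo "trained weights found"
else
    echo "no weights file: using hardcoded defaults"
fi
```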

What's the difference between LeidenOrder and GraphBrewOrder?

  • LeidenOrder (15): Baseline reference using GVE-Leiden external library (requires CSR→DiGraph conversion)
  • GraphBrewOrder (12): Production Leiden + per-community reordering (e.g., RabbitOrder within each community), CSR-native, best quality

GraphBrewOrder uses Leiden community detection natively on CSR, then applies configurable per-community ordering for the best cache locality.

When should I use DBG vs HUBCLUSTER?

Algorithm     | Best For
------------- | --------
DBG           | Power-law graphs with clear hot/cold separation
HUBCLUSTER    | Graphs where hubs connect to each other
HUBCLUSTERDBG | Combines both; a good general choice

Troubleshooting

See Troubleshooting for detailed solutions. Quick answers:

  • Segfault: Check file exists, format correct (vertices start at 0), sufficient RAM
  • Results vary: Normal — use -n 10, disable frequency scaling, use numactl
  • No speedup: Not all graphs benefit — try AdaptiveOrder or different algorithm
  • Python issues: Need Python 3.8+ (no pip required for core scripts)

Development Questions

How do I add a new reordering algorithm?

See Adding-New-Algorithms for a complete guide:

  1. Add enum value in reorder_types.h
  2. Implement reorder function
  3. Add switch case
  4. (Optional) Add perceptron weights

How do I add a new benchmark?

See Adding-New-Benchmarks for a complete guide:

  1. Create bench/src/my_algo.cc
  2. Implement algorithm
  3. Add to Makefile
  4. Test

How do I contribute?

  1. Fork the repository
  2. Create a feature branch
  3. Make changes with tests
  4. Submit a pull request

See CONTRIBUTING.md for guidelines.

What's the code license?

GraphBrew is released under the MIT License. See LICENSE.


Data Questions

Where can I download graphs?

Source             | URL                    | Formats
------------------ | ---------------------- | -------
SNAP               | snap.stanford.edu/data | Edge list
SuiteSparse        | sparse.tamu.edu        | MTX
Network Repository | networkrepository.com  | Various
KONECT             | konect.cc              | Various

How do I convert my graph format?

# CSV to edge list (assumes a header row)
tail -n +2 graph.csv | tr ',' ' ' > graph.el

# MTX to edge list (1-indexed to 0-indexed)
grep -v "^%" graph.mtx | tail -n +2 | awk '{print $1-1, $2-1}' > graph.el
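
After converting, a quick sanity check (a sketch) that the result is 0-indexed, as GraphBrew expects:

```shell
# The smallest vertex ID in graph.el should be 0
awk 'NR==1 { min = $1 }
     { if ($1 < min) min = $1; if ($2 < min) min = $2 }
     END { print "min id:", min }' graph.el
```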

What's the maximum graph size?

Limited by RAM:

  • ~16 bytes per edge
  • 1B edges ≈ 16GB RAM
  • Larger graphs need out-of-core processing (not supported)
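
The rule of thumb above can be checked with quick shell arithmetic (the edge count here is a hypothetical value):

```shell
# ~16 bytes per edge: estimate RAM for a 1-billion-edge graph
edges=1000000000
awk -v e="$edges" 'BEGIN { printf "~%d GB\n", e * 16 / 1e9 }'
# → ~16 GB
```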

Still Have Questions?

  • Check Home for documentation overview
  • Review Troubleshooting for common issues
  • Open a GitHub issue for bugs
  • Start a discussion for questions

← Back to Home
