An OCaml prototype that determines whether two nodes in an append-only DAG are equivalent — meaning the subgraphs reachable from them (both forward via successors and backward via predecessors) are isomorphic, preserving node values.
Node values are polymorphic — the DAG is parameterised over a VALUE module
via a functor.
- OCaml >= 5.0
- dune >= 3.0
- Graphviz (for PNG rendering;
brew install graphvizon macOS)
dune build
dune exec test/test_dag.exe # correctness tests
dune exec bench/bench_dag.exe # benchmarks
The DAG is append-only. Each node has:
- A value of type
V.t(chosen by the caller via theVALUEmodule). - A list of successors (edges to existing, lower-id nodes), fixed at creation.
- A list of predecessors (accumulated as new nodes point to it).
Every node carries two hashes:
| Hash | Captures | Computed when |
|---|---|---|
hash_fwd |
Value + forward-reachable subgraph (via successors) | Once, at creation |
hash_bwd |
Value + backward-reachable subgraph (via predecessors) | Recomputed on every insertion that affects the node |
Both are computed the same way — H(V.hash(value), sort(neighbor_hashes)) —
just over different neighbor sets (successors for forward, predecessors for
backward).
equiv(u, v) = (hash_fwd(u) = hash_fwd(v)) && (hash_bwd(u) = hash_bwd(v))This is O(1).
A node is only inserted if no existing node has the same (hash_fwd, hash_bwd)
pair with matching value and successor ids. This means structurally identical
nodes are never duplicated, which eliminates the diamond-vs-tree ambiguity that
would otherwise make simple Merkle hashing unsound.
When a new node v is added with edges v -> w1, ..., wk:
vbecomes a new predecessor of eachwi.hash_bwd(wi)must be recomputed (its predecessor set changed).- Changing
hash_bwd(wi)affects any node that haswias a predecessor — those arewi's successors (lower-id nodes thatwipoints to). - Propagation follows successor edges (forward in the DAG, toward leaves).
- Nodes are processed in descending id order, which is a valid topological order (predecessors always have higher ids, so they're processed first).
v (new) Propagation direction
| |
v v
w1 ──> x ──> y ──> z (toward leaves)
Each node is visited at most once per insertion. When we process node n, all
its predecessors (higher ids) are already finalized, so the hash is computed
from stable inputs. Propagation stops early if recomputation yields the same
hash.
63-bit FNV-1a. The neighbor hash list is sorted (multiset semantics) and the
list length is mixed in so that [], [0], and [0; 0] all hash differently.
Node values are hashed via the user-supplied V.hash function (Hashtbl.hash
works for any type with structural equality).
lib/
hash.mli / hash.ml Hash combiner (Hash.hash_node)
dag.mli / dag.ml Core DAG functor, VALUE sig, Int_value default
dag_viz.mli / dag_viz.ml Text table and Graphviz output (also a functor)
test/
test_dag.ml Correctness tests (assert-based)
bench/
bench_dag.ml Insertion throughput, propagation cost, query latency
To use the DAG with a custom value type, provide a module matching:
module type VALUE = sig
type t
val hash : t -> int (* Hashtbl.hash works for structural equality *)
val equal : t -> t -> bool
val to_string : t -> string
endDag.Int_value is provided as a convenience for integer-valued DAGs.
open Subgraph_iso
(* Integer DAG — using the built-in Int_value *)
module G = Dag.Make (Dag.Int_value)
module Viz = Dag_viz.Make (G)
(* Custom value type *)
module My_value = struct
type t = { label : string; weight : float }
let hash v = Hashtbl.hash v
let equal a b = a.label = b.label && a.weight = b.weight
let to_string v = Printf.sprintf "%s(%.1f)" v.label v.weight
end
module My_dag = Dag.Make (My_value)
module My_viz = Dag_viz.Make (My_dag)let g = G.create () in
(* Leaves first, then nodes that point to them *)
let c = G.add_node g ~value:3 ~successors:[] in
let b = G.add_node g ~value:2 ~successors:[c] in
let a = G.add_node g ~value:1 ~successors:[b] in
(* a -> b -> c *)(* Build a second identical chain *)
let f = G.add_node g ~value:3 ~successors:[] in
let e = G.add_node g ~value:2 ~successors:[f] in
let d = G.add_node g ~value:1 ~successors:[e] in
(* Corresponding nodes are equivalent *)
assert (G.equiv g a d); (* roots *)
assert (G.equiv g b e); (* middles *)
assert (G.equiv g c f); (* leaves *)G.value g a (* 1 *)
G.successors g a (* [b] *)
G.predecessors g a (* [] *)
G.hash_fwd g a (* some int *)
G.hash_bwd g a (* some int *)(* Text table to stdout *)
print_string (Viz.to_string g);
(* Graphviz DOT source *)
print_string (Viz.to_dot g);
(* Render directly to PNG (requires graphviz) *)
Viz.render_png g "my_dag.png";Measured on Apple Silicon, OCaml 5.4.0, native code:
| Operation | Complexity | 10k nodes |
|---|---|---|
add_node (wide fan-out) |
O(1) amortised | 9 ms total |
add_node (chain, worst case) |
O(n) per insert, O(n^2) total | 13 s total |
| Single-insert propagation (chain) | O(n) | 2.7 ms |
equiv query |
O(1) | ~100 ns |
The quadratic chain case is inherent: each insertion into a chain propagates backward hashes through all existing nodes. For the target scale (thousands of nodes, mixed topologies), this is acceptable.
- Wider hashes: swap
Hash.hash_nodeto returnint * int(128-bit) for near-zero collision probability. - Lazy propagation: defer backward hash recomputation until
equivis queried (trade insertion speed for query latency). - Persistence: the DAG is currently mutable. Wrap it in a functional interface if you need snapshot/rollback.