Skip to content

fplaunchpad/dag_node_equivalence

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

DAG Node Equivalence Checker

An OCaml prototype that determines whether two nodes in an append-only DAG are equivalent — meaning the subgraphs reachable from them (both forward via successors and backward via predecessors) are isomorphic, preserving node values.

Node values are polymorphic — the DAG is parameterised over a VALUE module via a functor.

Prerequisites

  • OCaml >= 5.0
  • dune >= 3.0
  • Graphviz (for PNG rendering; brew install graphviz on macOS)

Quick start

dune build
dune exec test/test_dag.exe       # correctness tests
dune exec bench/bench_dag.exe     # benchmarks

How it works

Data model

The DAG is append-only. Each node has:

  • A value of type V.t (chosen by the caller via the VALUE module).
  • A list of successors (edges to existing, lower-id nodes), fixed at creation.
  • A list of predecessors (accumulated as new nodes point to it).

Two Merkle hashes per node

Every node carries two hashes:

Hash Captures Computed when
hash_fwd Value + forward-reachable subgraph (via successors) Once, at creation
hash_bwd Value + backward-reachable subgraph (via predecessors) Recomputed on every insertion that affects the node

Both are computed the same way — H(V.hash(value), sort(neighbor_hashes)) — just over different neighbor sets (successors for forward, predecessors for backward).

Equivalence query

equiv(u, v) = (hash_fwd(u) = hash_fwd(v)) && (hash_bwd(u) = hash_bwd(v))

This is O(1).

Deduplication

A node is only inserted if no existing node has the same (hash_fwd, hash_bwd) pair with matching value and successor ids. This means structurally identical nodes are never duplicated, which eliminates the diamond-vs-tree ambiguity that would otherwise make simple Merkle hashing unsound.

Backward hash propagation

When a new node v is added with edges v -> w1, ..., wk:

  1. v becomes a new predecessor of each wi.
  2. hash_bwd(wi) must be recomputed (its predecessor set changed).
  3. Changing hash_bwd(wi) affects any node that has wi as a predecessor — those are wi's successors (lower-id nodes that wi points to).
  4. Propagation follows successor edges (forward in the DAG, toward leaves).
  5. Nodes are processed in descending id order, which is a valid topological order (predecessors always have higher ids, so they're processed first).
   v  (new)            Propagation direction
   |                          |
   v                          v
  w1 ──> x ──> y ──> z      (toward leaves)

Each node is visited at most once per insertion. When we process node n, all its predecessors (higher ids) are already finalized, so the hash is computed from stable inputs. Propagation stops early if recomputation yields the same hash.

Hash function

63-bit FNV-1a. The neighbor hash list is sorted (multiset semantics) and the list length is mixed in so that [], [0], and [0; 0] all hash differently. Node values are hashed via the user-supplied V.hash function (Hashtbl.hash works for any type with structural equality).

Project layout

lib/
  hash.mli / hash.ml       Hash combiner (Hash.hash_node)
  dag.mli  / dag.ml        Core DAG functor, VALUE sig, Int_value default
  dag_viz.mli / dag_viz.ml Text table and Graphviz output (also a functor)
test/
  test_dag.ml              Correctness tests (assert-based)
bench/
  bench_dag.ml             Insertion throughput, propagation cost, query latency

API overview

The VALUE signature

To use the DAG with a custom value type, provide a module matching:

module type VALUE = sig
  type t
  val hash : t -> int       (* Hashtbl.hash works for structural equality *)
  val equal : t -> t -> bool
  val to_string : t -> string
end

Dag.Int_value is provided as a convenience for integer-valued DAGs.

Instantiation

open Subgraph_iso

(* Integer DAG — using the built-in Int_value *)
module G = Dag.Make (Dag.Int_value)
module Viz = Dag_viz.Make (G)

(* Custom value type *)
module My_value = struct
  type t = { label : string; weight : float }
  let hash v = Hashtbl.hash v
  let equal a b = a.label = b.label && a.weight = b.weight
  let to_string v = Printf.sprintf "%s(%.1f)" v.label v.weight
end

module My_dag = Dag.Make (My_value)
module My_viz = Dag_viz.Make (My_dag)

Building a DAG

let g = G.create () in

(* Leaves first, then nodes that point to them *)
let c = G.add_node g ~value:3 ~successors:[] in
let b = G.add_node g ~value:2 ~successors:[c] in
let a = G.add_node g ~value:1 ~successors:[b] in
(* a -> b -> c *)

Querying equivalence

(* Build a second identical chain *)
let f = G.add_node g ~value:3 ~successors:[] in
let e = G.add_node g ~value:2 ~successors:[f] in
let d = G.add_node g ~value:1 ~successors:[e] in

(* Corresponding nodes are equivalent *)
assert (G.equiv g a d);   (* roots *)
assert (G.equiv g b e);   (* middles *)
assert (G.equiv g c f);   (* leaves *)

Inspecting a node

G.value g a              (* 1 *)
G.successors g a         (* [b] *)
G.predecessors g a       (* [] *)
G.hash_fwd g a           (* some int *)
G.hash_bwd g a           (* some int *)

Visualisation

(* Text table to stdout *)
print_string (Viz.to_string g);

(* Graphviz DOT source *)
print_string (Viz.to_dot g);

(* Render directly to PNG (requires graphviz) *)
Viz.render_png g "my_dag.png";

Performance characteristics

Measured on Apple Silicon, OCaml 5.4.0, native code:

Operation Complexity 10k nodes
add_node (wide fan-out) O(1) amortised 9 ms total
add_node (chain, worst case) O(n) per insert, O(n^2) total 13 s total
Single-insert propagation (chain) O(n) 2.7 ms
equiv query O(1) ~100 ns

The quadratic chain case is inherent: each insertion into a chain propagates backward hashes through all existing nodes. For the target scale (thousands of nodes, mixed topologies), this is acceptable.

Extending

  • Wider hashes: swap Hash.hash_node to return int * int (128-bit) for near-zero collision probability.
  • Lazy propagation: defer backward hash recomputation until equiv is queried (trade insertion speed for query latency).
  • Persistence: the DAG is currently mutable. Wrap it in a functional interface if you need snapshot/rollback.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors