Skip to content

Commit ee6d520

Browse files
Add a graph search with state & custom path state problem
1 parent 01dfc57 commit ee6d520

File tree

8 files changed

+793
-2
lines changed

8 files changed

+793
-2
lines changed

README.md

Lines changed: 34 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -319,6 +319,9 @@ If you wish to perform a path search such as the above, but without reporting pr
319319
use the `Qtil::GraphPathSearch` module instead, which provides an efficient search algorithm
320320
without producing a `@kind path-problem` query.
321321

322+
For a custom path problem with stateful flow tracking (see `GraphPathStateSearch` for more info),
323+
use `CustomPathStateProblem`.
324+
322325
### Inheritance
323326

324327
**Instance**: A module to make `instanceof` inheritance easier in CodeQL, by writing
@@ -433,6 +436,37 @@ This module takes a set of starting points, ending points, and edges in a graph,
433436

434437
For displaying the discovered paths to users, see the `CustomPathProblem` module above.
435438

439+
**GraphPathStateSearch**: An expansion of the above module that allows for tracking state from
440+
start to end of a path, with potential transformation of that state on each edge.
441+
442+
For example, this can be used to set a maximum search depth or to find cycles in a graph (such as
443+
recursive functions).
444+
445+
```
446+
class RecursiveFunctionSearch implements Qtil::GraphPathStateSearchSig<Function> {
447+
// Our state is, conceptually, just the function we started the search from. However, we must
448+
// distinguish end nodes based on whether at least one step was taken to reach them. If we don't,
449+
// then all functions will have a flow path (of zero length) to themselves.
450+
newtype State = TNoStepTaken(Function f) or TStepsTaken(Function f);
451+
452+
// Start nodes haven't yet taken a step, so the state is NoStepTaken
453+
predicate start(Function f, State state) { state = TNoStepTaken(f) }
454+
455+
// End nodes are functions that reach themselves after at least one step.
456+
predicate end(Function f, State state) { state = TStepsTaken(f) }
457+
458+
predicate edge(Function f0, State s0, Function f1, State s1) {
459+
// Connect each functions to the functions that they call.
460+
f0.calls(f1) and
461+
exists(Function initial |
462+
// Forward the function along the edge, noting that at least one step has been taken.
463+
(s0 = TNoStepTaken(initial) or s0 = TStepsTaken(initial)) and
464+
s1 = TStepsTaken(initial)
465+
)
466+
}
467+
}
468+
```
469+
436470
### Testing with Qnit
437471

438472
While codeql's `test run` subcommand is a great way to test queries, it can be better in some cases

src/qtil/graph/GraphPathSearch.qll

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -26,6 +26,8 @@ private import qtil.parameterization.Finalize
2626
* predicate end(Node n1) { ... }
2727
* }
2828
* ```
29+
*
30+
* To track state as well as flow, use `GraphPathStateSearchSig` instead.
2931
*/
3032
signature module GraphPathSearchSig<FiniteType Node> {
3133
/**
@@ -106,6 +108,8 @@ signature module GraphPathSearchSig<FiniteType Node> {
106108
* - `ReverseNode`: All forward nodes that reach end nodes.
107109
*
108110
* These classes may be useful in addition to the `hasPath` predicate.
111+
*
112+
* To track state as well as flow, use `GraphPathStateSearch` instead.
109113
*/
110114
module GraphPathSearch<FiniteType Node, GraphPathSearchSig<Node> Config> {
111115
/**
Lines changed: 284 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,284 @@
1+
/**
2+
* Like `GraphPathSearch`, this file defines a module for efficiently finding paths in a directional
3+
* graph using a performant pattern called forward-reverse pruning.
4+
*
5+
* Additionally, this module is designed to track state through the paths it is looking for. For
6+
* instance, we could use this graph to find recursive functions, which requires knowing how an end
7+
* node was reached from a start node (the state).
8+
*
9+
* Like `GraphPathSearch`, this module uses forward-reverse pruning, wihch is a pattern that is
10+
* useful for efficiently finding connections between nodes in a directional graph. In a first pass,
11+
* it finds nodes reachable from the starting point. In the second pass, it finds the subset of
12+
* those nodes that can be reached from the end point. Together, these create a path from start
13+
* points to end points.
14+
*
15+
* As with the other performance patterns in qtil, this module may be useful as is, or it may not
16+
* fit your needs exactly. CodeQL evaluation and performance is very complex. In that case, consider
17+
* this pattern as an example to create your own solution that fits your needs.
18+
*/
19+
20+
private import qtil.parameterization.SignatureTypes
21+
private import qtil.parameterization.Finalize
22+
23+
/**
24+
* Implement this signature to define a graph, and a search for paths within that graph tracking
25+
* some state, using the `GraphPathStateSearch` module.
26+
*
27+
* ```ql
28+
* module MyConfig implements GraphPathStateSearchSig<Node> {
29+
* class State extends ... { ... };
30+
* predicate start(Node n1) { ... }
31+
* predicate edge(Node n1, Node n2) { ... }
32+
* predicate end(Node n1) { ... }
33+
* }
34+
* ```
35+
*
36+
* To flow without state, use `GraphPathSearchSig` instead.
37+
*/
38+
signature module GraphPathStateSearchSig<FiniteType Node> {
39+
/**
40+
* The state to be tracked through the paths found by this module.
41+
*
42+
* For example, if searching for recursive functions, this class might be defined as:
43+
*
44+
* ```ql
45+
* class State = Function;
46+
* ```
47+
*
48+
* The `edges` predicate defined in this signature module decides how to forward this state, so
49+
* the state may change as the path is traversed.
50+
*/
51+
bindingset[this]
52+
class State;
53+
54+
/**
55+
* The nodes that begin the search of the graph, and the starting state for those nodes.
56+
*
57+
* For instance, if searching for recursive functions, this predicate might hold for a Function
58+
* and its state may be the Function itself.
59+
*
60+
* Ultimately, only paths from a start node to an end node will be found by this module.
61+
*
62+
* In most cases, this will ideally be a smaller set of nodes than the end nodes. However, if the
63+
* graph branches in one direction more than the other, a larger set which branches less may be
64+
* preferable.
65+
*
66+
* The design of this predicate has a great effect in how well this performance pattern will
67+
* ultimately perform.
68+
*/
69+
predicate start(Node n1, State s1);
70+
71+
/**
72+
* A directional edge from `n1` to `n2`, and the state that is forwarded from `n1` to `n2`.
73+
*
74+
* This module will search for paths from `start` to `end` by looking following the direction of
75+
* these edges.
76+
*
77+
* As an example state transformation, a maximum search depth could be tracked at each edge and
78+
* the new state would be the old state with the depth incremented by one. Alternatively, if
79+
* searching for recursive functions, the state could be the starting function, and this edge
80+
* relation would forward that function unchanged.
81+
*
82+
* The design of this predicate has a great effect in how well this performance pattern will
83+
* ultimately perform.
84+
*/
85+
bindingset[s1]
86+
bindingset[s2]
87+
predicate edge(Node n1, State s1, Node n2, State s2);
88+
89+
/**
90+
* The end nodes of the search, if reached with the given state.
91+
*
92+
* For instance, if searching for recursive functions, this predicate would likely hold when a
93+
* function node is reached with the state being same function declaration (indicating flow from
94+
* the start function to itself).
95+
*
96+
* Ultimately, only paths from a start node to an end node will be found by this module.
97+
*
98+
* The design of this predicate has a great effect in how well this performance pattern will
99+
* ultimately perform.
100+
*/
101+
bindingset[s1]
102+
predicate end(Node n1, State s1);
103+
}
104+
105+
/**
106+
* A module that implements an efficient search for a path that satisfies specified stateful
107+
* constraints within a custom directional graph from a set of start nodes to a set of end nodes.
108+
*
109+
* For example, this module can be used to detect loops in the graph (perhaps to find recursive
110+
* functions) by setting the "state" to be the start node, forwarding that state unchanged on each
111+
* edge, and considering a node to be an end node if it is reached with itself as the state.
112+
* Alternatively, the state could be used to track a maximum search depth, with a start state of
113+
* zero that is incremented at each edge, and where the edge relation does not hold beyond a certain
114+
* depth.
115+
*
116+
* To show discovered paths to users, see the module `CustomPathStateProblem` which uses this module
117+
* as * its underlying search implementation.
118+
*
119+
* This module uses a pattern called "forward reverse pruning" for efficiency. This pattern is
120+
* useful for reducing the search space when looking for paths in a directional graph. In a first
121+
* pass, it finds nodes reachable from the starting point. In the second pass, it finds the subset
122+
* of those nodes that can be reached from the end point. Together, these create a path from start
123+
* points to end points.
124+
*
125+
* To use this module, provide an implementation of the `GraphPathSearchSig` signature as follows:
126+
*
127+
* ```ql
128+
* module Config implements GraphPathSearchSig<Person> {
129+
* class State extends Something { ... };
130+
* predicate start(Person p, State s) { p.checkSomething() and s = p.getSomeStartValue() }
131+
* predicate edge(Person p1, State s1, Person p2, State s2) { p2 = p1.getAParent() and s2 = s1.next() }
132+
* predicate end(Person p, State s) { p.checkSomethingElse() and s.isValidEndState() }
133+
* }
134+
* ```
135+
*
136+
* The design of these predicate has a great effect in how well this performance pattern will
137+
* ultimately perform.
138+
*
139+
* The resulting predicate `hasPath` should be a much more efficient search of connected start nodes
140+
* to end nodes than a naive search (which in CodeQL could easily be evaluated as either a full
141+
* graph search, or a search over the cross product of all nodes).
142+
*
143+
* ```ql
144+
* from Person p1, State s1, Person p2, State s2
145+
* // Fast graph path detection thanks to forward-reverse pruning.
146+
* where GraphPathStateSearch<Person, Config>::hasPath(p1, s1, p2, p2)
147+
* select p1, s1, p2, p2
148+
* ```
149+
*
150+
* The resulting module also exposes two predicates:
151+
* - `ForwardNode`: All nodes reachable from the start nodes, with member predicate `getState()`.
152+
* - `ReverseNode`: All forward nodes that reach end nodes, with member predicate `getState()`.
153+
*
154+
* These classes may be useful in addition to the `hasPath` predicate.
155+
*
156+
* To track state as well as flow, use `GraphPathStateSearch` instead.
157+
*/
158+
module GraphPathStateSearch<FiniteType Node, GraphPathStateSearchSig<Node> Config> {
159+
/**
160+
* The set of all nodes reachable from the start nodes (inclusive).
161+
*
162+
* Includes the member predicate `getState()` which returns the state associated with this node at
163+
* this point in the search.
164+
*/
165+
class ForwardNode extends Final<Node>::Type {
166+
Config::State state;
167+
168+
ForwardNode() { forwardNode(this, state) }
169+
170+
/**
171+
* Get the state associated with this forward node at this point in the search.
172+
*/
173+
Config::State getState() { result = state }
174+
175+
string toString() { result = "ForwardNode" }
176+
}
177+
178+
/**
179+
* The performant predicate for looking forward one step at a time in the graph.
180+
*
181+
* In `GraphPathSearch`, this is fast because it is essentially a unary predicate. The same is
182+
* true here when the correct joins occur, such that (n, s) effectively act as a single value.
183+
*
184+
* For this reason, we use `pragma[only_bind_into]` to ensure the correct join order.
185+
*/
186+
private predicate forwardNode(Node n, Config::State s) {
187+
Config::start(pragma[only_bind_into](n), pragma[only_bind_into](s))
188+
or
189+
exists(Node n0, Config::State s0 |
190+
forwardNode(pragma[only_bind_into](n0), pragma[only_bind_into](s0)) and
191+
Config::edge(n0, s0, n, s)
192+
)
193+
}
194+
195+
/**
196+
* The set of all forward nodes that reach end nodes (inclusive).
197+
*
198+
* Includes the member predicate `getState()` which returns the state associated with this node at
199+
* this point in the search.
200+
*
201+
* These nodes are the nodes that exist along the path from start nodes to end nodes.
202+
*
203+
* Note: this is fast to compute because it is essentially a unary predicate.
204+
*/
205+
class ReverseNode extends ForwardNode {
206+
ReverseNode() {
207+
// 'state' field and getState() predicate are inherited from ForwardNode
208+
reverseNode(this, state)
209+
}
210+
211+
override string toString() { result = "ReverseNode" }
212+
}
213+
214+
private predicate reverseNode(Node n, Config::State s) {
215+
forwardNode(pragma[only_bind_into](n), pragma[only_bind_into](s)) and
216+
Config::end(n, s)
217+
or
218+
exists(Node n0, Config::State s0 |
219+
reverseNode(pragma[only_bind_into](n0), pragma[only_bind_into](s0)) and
220+
Config::edge(n, s, n0, s0)
221+
)
222+
}
223+
224+
/**
225+
* A start node, end node pair that are connected in the graph.
226+
*/
227+
predicate hasConnection(ReverseNode n1, ReverseNode n2) { hasConnection(n1, _, n2, _) }
228+
229+
/**
230+
* A start node, end node pair that are connected in the graph, and the states associated with
231+
* those nodes.
232+
*/
233+
predicate hasConnection(ReverseNode n1, Config::State s1, ReverseNode n2, Config::State s2) {
234+
Config::start(n1, s1) and
235+
Config::end(n2, s2) and
236+
(
237+
hasPath(n1, s1, n2, s2)
238+
or
239+
n1 = n2 and s1 = s2
240+
)
241+
}
242+
243+
/**
244+
* All relevant edges in the graph which participate in a connection from a start to an end node.
245+
*/
246+
predicate pathEdge(ReverseNode n1, ReverseNode n2) { pathEdge(n1, _, n2, _) }
247+
248+
/**
249+
* All relevant edges in the graph, plus state, which participate in a connection from a start to
250+
* an end node.
251+
*/
252+
predicate pathEdge(ReverseNode n1, Config::State s1, ReverseNode n2, Config::State s2) {
253+
Config::edge(n1, s1, n2, s2) and
254+
reverseNode(pragma[only_bind_into](n2), pragma[only_bind_into](s2))
255+
}
256+
257+
/**
258+
* A performant path search within a custom directed graph from a set of start nodes to a set of
259+
* end nodes.
260+
*
261+
* This predicate is the main entry point for the forward-reverse pruning pattern. The design of
262+
* the config predicates has a great effect in how well this performance pattern will ultimately
263+
* perform.
264+
*
265+
* Example:
266+
* ```ql
267+
* from Person p1, Person p2
268+
* where GraphPathSearch<Person, Config>::hasPath(p1, p2)
269+
* select p1, p2
270+
* ```
271+
*
272+
* Note: this is fast to compute because limits the search space to nodes found by the fast unary
273+
* searches done to find `ForwardNode` and `ReverseNode`.
274+
*/
275+
predicate hasPath(ReverseNode n1, Config::State s1, ReverseNode n2, Config::State s2) {
276+
Config::start(n1, s1) and
277+
Config::edge(n1, s1, n2, s2)
278+
or
279+
exists(ReverseNode nMid, Config::State sMid |
280+
hasPath(n1, s1, nMid, sMid) and
281+
Config::edge(pragma[only_bind_out](nMid), pragma[only_bind_out](sMid), n2, s2)
282+
)
283+
}
284+
}

0 commit comments

Comments
 (0)