fix: gate read-only Graph Drop and flush against on-disk writes#2672
Draft
shivamka1 wants to merge 7 commits into
Draft
fix: gate read-only Graph Drop and flush against on-disk writes#2672shivamka1 wants to merge 7 commits into
shivamka1 wants to merge 7 commits into
Conversation
Contributor
There was a problem hiding this comment.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Rust Benchmark'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 2.
| Benchmark suite | Current: 6e3c10a | Previous: 9823ef7 | Ratio |
|---|---|---|---|
lotr_graph/num_edges |
4 ns/iter (± 0) |
0 ns/iter (± 0) |
+∞ |
lotr_graph/num_nodes |
5 ns/iter (± 0) |
1 ns/iter (± 0) |
5 |
lotr_graph/graph_latest |
3 ns/iter (± 0) |
0 ns/iter (± 0) |
+∞ |
lotr_graph_materialise/materialize |
7800122 ns/iter (± 38894) |
1564816 ns/iter (± 35303) |
4.98 |
lotr_graph_window_100/num_nodes |
14 ns/iter (± 0) |
5 ns/iter (± 0) |
2.80 |
lotr_graph_window_100_materialise/materialize |
7817239 ns/iter (± 34090) |
1669150 ns/iter (± 10700) |
4.68 |
lotr_graph_window_10/has_node_existing |
136 ns/iter (± 9) |
62 ns/iter (± 11) |
2.19 |
lotr_graph_window_10_materialise/materialize |
3304261 ns/iter (± 16881) |
971980 ns/iter (± 4278) |
3.40 |
lotr_graph_subgraph_10pc_materialise/materialize |
2037283 ns/iter (± 22219) |
334634 ns/iter (± 1287) |
6.09 |
lotr_graph_subgraph_10pc_windowed/has_node_existing |
142 ns/iter (± 10) |
62 ns/iter (± 14) |
2.29 |
lotr_graph_subgraph_10pc_windowed_materialise/materialize |
1251820 ns/iter (± 9826) |
230399 ns/iter (± 2617) |
5.43 |
lotr_graph_window_50_layered/num_edges_temporal |
145011 ns/iter (± 2076) |
70121 ns/iter (± 7586) |
2.07 |
lotr_graph_window_50_layered/has_node_existing |
391 ns/iter (± 21) |
129 ns/iter (± 12) |
3.03 |
lotr_graph_window_50_layered/graph_latest |
83106 ns/iter (± 1880) |
36649 ns/iter (± 916) |
2.27 |
lotr_graph_window_50_layered_materialise/materialize |
28766145 ns/iter (± 68680) |
3488825 ns/iter (± 24948) |
8.25 |
lotr_graph_persistent_window_50_layered/num_edges_temporal |
582198 ns/iter (± 4592) |
192686 ns/iter (± 1569) |
3.02 |
lotr_graph_persistent_window_50_layered/has_node_existing |
426 ns/iter (± 384) |
174 ns/iter (± 83) |
2.45 |
lotr_graph_persistent_window_50_layered/graph_latest |
122169 ns/iter (± 1992) |
57549 ns/iter (± 4809) |
2.12 |
lotr_graph_persistent_window_50_layered_materialise/materialize |
49154804 ns/iter (± 297824) |
5298035 ns/iter (± 147912) |
9.28 |
This comment was automatically generated by workflow using github-action-benchmark.
…htory into fix/readonly-drop-corruption
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Read-only
Graph.load(path, read_only=True)handles were writing to thegraph directory on drop, corrupting a live writer's crash recovery:
Drop for GraphStoreappended a shutdown checkpoint to theshared WAL at LSN 256, clobbering the writer's live records. On next
reopen after a writer crash, recovery failed with
Expected checkpoint at given LSN.Drop for Storagerewrote.metaeven when the reader'ssnapshot was stale, rolling the cached counts backwards.
Fix: add a
read_only: boolfield toGraphStoreandStorage, set byload_read_only, and short-circuit bothDropimpls (plusflush())when the flag is true. Read-only handles are now byte-passive with
respect to the graph directory.
GraphStoregetsload_read_only+ Drop/flush guardsStoragegets a matching flag + Drop guard for the.metarefreshTemporalGraph::load_innerthreadsread_only=truethrough toLayer::load_read_onlyStorageError::ReadOnlyvariant, returned byflush()on aread-only handle
Storage::read_only()also sets the flag so it doesn'tdouble-refresh
.metaalongside the original writerBug #3 (reader
Graph.loadracing writer segment-file creation, ENOENTat
disk_layer_stats/mod.rs:301) is a separate, unrelated race in thereader-open path; it's documented and covered by an
xfailregressiontest in this PR so a future fix will trip the sentinel.
Test plan
New file:
python/tests/test_base_install/test_graphql/test_read_only_durability.pydir, open+drop a read-only handle, snapshot again; assert
byte-identical. Fails on
main, passes on this branch.xfail) — 4 reader threads + live writer; asserts nodisk_layer_stats/mod.rsENOENT. Still fails; xfail tracker forthe follow-up.
(
test_read_only_load.py,test_read_only_in_memory.py) passunchanged.