Gh 3902 new in-memory graph GraphMemIndexedSet #3903

Draft
arne-bdt wants to merge 3 commits into apache:main from arne-bdt:GH-3902-GraphMemIndexedSet

@arne-bdt
Contributor

@arne-bdt arne-bdt commented May 6, 2026

GitHub issue resolved #3902

Pull request Description:

GraphMemIndexedSet is based on the architecture of GraphMemRoaring, but replaces the RoaringBitmaps with simple index lists and a reverse index.
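To illustrate what "index lists and a reverse index" can mean, here is a minimal Java sketch. It assumes that each per-node index only has to hold a set of int triple ids (the role a RoaringBitmap plays in GraphMemRoaring), and that a dense int array plus a reverse map from id to array slot is enough to get O(1) add, contains and remove. `IndexListSketch` and `IndexListDemo` are hypothetical names, not classes from this PR, and the real GraphMemIndexedSet may organize its reverse index differently.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntConsumer;

final class IndexListDemo {
    public static void main(String[] args) {
        IndexListSketch list = new IndexListSketch();
        list.add(10);
        list.add(20);
        list.add(30);
        list.remove(10); // 30 is moved into the slot that 10 occupied
        StringBuilder sb = new StringBuilder();
        list.forEach(id -> sb.append(id).append(' '));
        System.out.println(sb.toString().trim()); // prints "30 20"
    }
}

/** Hypothetical sketch: a set of int ids in a dense array plus a reverse index id -> slot. */
final class IndexListSketch {
    private int[] ids = new int[4];
    private int size = 0;
    // Reverse index: where each id lives in `ids`, so removal needs no linear scan.
    // (Boxed HashMap for brevity; a real implementation would likely avoid boxing.)
    private final Map<Integer, Integer> slotOf = new HashMap<>();

    boolean add(int id) {
        if (slotOf.containsKey(id)) return false; // already present
        if (size == ids.length) ids = Arrays.copyOf(ids, size * 2);
        ids[size] = id;
        slotOf.put(id, size);
        size++;
        return true;
    }

    boolean remove(int id) {
        Integer slot = slotOf.remove(id);
        if (slot == null) return false; // not present
        int last = --size;
        if (slot != last) {
            int moved = ids[last];
            ids[slot] = moved;       // fill the hole with the last id
            slotOf.put(moved, slot); // keep the reverse index consistent
        }
        return true;
    }

    boolean contains(int id) { return slotOf.containsKey(id); }

    int size() { return size; }

    /** Iteration is a plain scan over the dense prefix of the array. */
    void forEach(IntConsumer action) {
        for (int i = 0; i < size; i++) action.accept(ids[i]);
    }
}
```

The swap-with-last removal keeps the array dense, which is what makes iteration a simple array scan; the trade-off is that iteration order no longer matches insertion order once elements have been removed.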

Benefits:

  • memory footprint is comparable to GraphMemFast for smaller graphs of up to 1M triples; for larger graphs (BSBM 25M and 50M) it is even smaller.
  • Graph#add speed is comparable to GraphMemFast (depending on the graph)
  • Graph#delete speed is faster than GraphMemFast, especially for large graphs
  • Graph#find / #stream:
    • S__, _P_, O__ --> slightly slower than GraphMemFast
    • SPO --> faster than GraphMemFast
    • ___ --> faster than GraphMemFast
    • SP_, S_O, _PO --> faster than GraphMemFast in most cases
  • Graph#contains:
    • SP_, S_O, _PO --> performance depends on insert order --> the only non-optimal behavior I discovered
    • other match patterns behave like #find
  • GraphMem#copy --> faster than GraphMemFast
  • supports the same indexing strategies as GraphMemRoaring

GraphMemRoaring could be deprecated, since it performs worse in all disciplines.


  • Tests are included.
  • Commits have been squashed to remove intermediate development commit messages.
  • Key commit messages start with the issue number (GH-xxxx)

By submitting this pull request, I acknowledge that I am making a contribution to the Apache Software Foundation under the terms and conditions of the Contributor's Agreement.


See the Apache Jena "Contributing" guide.

Minor bugs fixed:
- FastArrayBunch no longer holds references to orphaned elements after #tryRemove and #removeUnchecked
- FastHashMap#computeIfAbsent no longer leaves the map in an invalid state if the call to absentValueSupplier fails
- FastHashMap#compute now grows on insert, like all other insert operations of this map
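The first two fixes can be illustrated with a small, hypothetical Java sketch; it is not the actual FastArrayBunch or FastHashMap code. `ArrayBunchSketch.removeUnchecked` swaps the last element into the hole and then nulls the vacated slot, so the backing array no longer keeps the removed element reachable. `SafeMapSketch.computeIfAbsent` assumes a `Supplier`-style `absentValueSupplier` and calls it before modifying any state, so an exception thrown by the supplier leaves the map unchanged.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

final class BugfixDemo {
    public static void main(String[] args) {
        ArrayBunchSketch<String> bunch = new ArrayBunchSketch<>();
        bunch.add("a");
        bunch.add("b");
        bunch.add("c");
        bunch.removeUnchecked("a");
        System.out.println(bunch.size() + " " + bunch.slotIsCleared(2)); // prints "2 true"

        SafeMapSketch<String, Integer> map = new SafeMapSketch<>();
        try {
            map.computeIfAbsent("x", () -> { throw new IllegalStateException("boom"); });
        } catch (IllegalStateException expected) {
            // the failing supplier must not leave a half-inserted entry behind
        }
        System.out.println(map.size()); // prints "0"
    }
}

/** Hypothetical array-backed bunch (not Jena's FastArrayBunch). */
final class ArrayBunchSketch<E> {
    private Object[] elements = new Object[4];
    private int size = 0;

    void add(E e) {
        if (size == elements.length) elements = Arrays.copyOf(elements, size * 2);
        elements[size++] = e;
    }

    /** Removes e, assuming it is present; moves the last element into the hole. */
    void removeUnchecked(E e) {
        for (int i = 0; i < size; i++) {
            if (elements[i].equals(e)) {
                size--;
                elements[i] = elements[size];
                elements[size] = null; // the fix: drop the stale reference to the orphan
                return;
            }
        }
    }

    int size() { return size; }

    boolean slotIsCleared(int index) { return elements[index] == null; }
}

/** Hypothetical map wrapper (not Jena's FastHashMap). */
final class SafeMapSketch<K, V> {
    private final Map<K, V> delegate = new HashMap<>();

    V computeIfAbsent(K key, Supplier<V> absentValueSupplier) {
        V existing = delegate.get(key);
        if (existing != null) return existing;
        V value = absentValueSupplier.get(); // may throw; nothing has been modified yet
        delegate.put(key, value);            // state changes only after the supplier succeeded
        return value;
    }

    int size() { return delegate.size(); }
}
```

In an open-addressing map the same ordering matters even more: reserving a slot or bumping the size before calling the supplier is exactly what can leave an inconsistent entry behind when the supplier throws.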

Improvements:
- iterators and spliterators no longer need a Runnable for the concurrent-modification check; they check the size directly, which has been shown to be faster.
- FastHashBase#fillPositionsArray now iterates over the dense keys array, which should be faster in most cases.
- Removed FastHashSet#IndexedKey and the corresponding iterator and spliterator implementations
--> introduced FastHashBase#forEachKey and #forEachKeyParallel instead
- added JavaDoc
- added tests
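A minimal sketch of the iterator and forEachKey changes, assuming a structure whose keys are stored in a dense array (as the note about `FastHashBase#fillPositionsArray` suggests). The iterator captures the size at creation time and compares it on every `next()` instead of invoking a `Runnable`, and `forEachKey` / `forEachKeyParallel` are plain loops over the dense array. `DenseKeySetSketch` and its method signatures are hypothetical; they only mirror the names mentioned above, and the actual FastHashBase API may differ.

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Consumer;
import java.util.stream.IntStream;

final class DenseKeysDemo {
    public static void main(String[] args) {
        DenseKeySetSketch<String> set = new DenseKeySetSketch<>();
        set.add("s");
        set.add("p");
        set.add("o");
        set.forEachKey(k -> System.out.print(k + " ")); // prints "s p o "
        System.out.println();

        Iterator<String> it = set.iterator();
        it.next();
        set.add("x"); // changes the size while iterating
        try {
            it.next();
        } catch (ConcurrentModificationException e) {
            System.out.println("detected");
        }
    }
}

/** Hypothetical key set backed by a dense array (not Jena's FastHashBase). */
final class DenseKeySetSketch<K> implements Iterable<K> {
    private Object[] keys = new Object[4];
    private int size = 0;

    boolean add(K key) {
        // Linear duplicate check for brevity; a real hash set would use its hash table.
        for (int i = 0; i < size; i++) if (keys[i].equals(key)) return false;
        if (size == keys.length) keys = Arrays.copyOf(keys, size * 2);
        keys[size++] = key;
        return true;
    }

    @SuppressWarnings("unchecked")
    void forEachKey(Consumer<? super K> action) {
        for (int i = 0; i < size; i++) action.accept((K) keys[i]);
    }

    /** The action must be thread-safe; elements are visited in no particular order. */
    @SuppressWarnings("unchecked")
    void forEachKeyParallel(Consumer<? super K> action) {
        final Object[] snapshot = keys;
        IntStream.range(0, size).parallel().forEach(i -> action.accept((K) snapshot[i]));
    }

    @Override
    public Iterator<K> iterator() {
        final int expectedSize = size; // captured once, compared on every next()
        return new Iterator<K>() {
            private int pos = 0;

            @Override
            public boolean hasNext() { return pos < expectedSize; }

            @Override
            @SuppressWarnings("unchecked")
            public K next() {
                if (size != expectedSize) throw new ConcurrentModificationException();
                if (pos >= expectedSize) throw new NoSuchElementException();
                return (K) keys[pos++];
            }
        };
    }
}
```

A size comparison is cheap, but it only detects modifications that change the size; an add followed by a remove between two `next()` calls would go unnoticed by this sketch.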

Fixes in jena-benchmarks:
- updated pom.xml files to version "6.2.0-SNAPSHOT"
- fixed spliterator benchmarks to support new Sized parameter
@arne-bdt arne-bdt changed the title Gh 3902 graph mem indexed set Gh 3902 new in-memory graph GraphMemIndexedSet May 6, 2026

Development

Successfully merging this pull request may close these issues.

New GraphMemIndexedSet