Skip to content

Add gtars-node npm publish pipeline#255

Open
nsheff wants to merge 20 commits into
devfrom
feat/gtars-node-publish
Open

Add gtars-node npm publish pipeline#255
nsheff wants to merge 20 commits into
devfrom
feat/gtars-node-publish

Conversation

@nsheff
Copy link
Copy Markdown
Member

@nsheff nsheff commented Apr 13, 2026

Summary

  • Rename gtars-napi/gtars-node/ to match npm package name @databio/gtars-node
  • Add build-node and publish-node jobs to rust-publish.yml
  • Add build-node-bindings.yml CI workflow
  • Add gtars-node to workspace members
  • Add optionalDependencies, repository, license, files fields to package.json
  • Bump actions/checkout to v6 across all jobs
  • Add seqcol compare method and lazy load_sequence/load_collection for remote store support
  • Add pagination to list_collections
  • Add tests for getSequence, getCollectionMetadata, and compare

Published as @databio/gtars-node@0.0.2 on npm.

Test plan

  • Local build and 9/9 tests passing
  • CI build succeeds on linux-x64 and darwin-arm64
  • npm publish pipeline tested end-to-end
  • Packages live: @databio/gtars-node

nsheff added 20 commits April 13, 2026 14:45
- Rename directory gtars-napi/ → gtars-node/ to match npm package name
- Add build-node and publish-node jobs to rust-publish.yml
- Add build-node-bindings.yml CI workflow
- Set version to 0.0.1 for initial npm publish
- Configure napi triples for linux-x64-gnu and darwin-arm64 only
- Add optionalDependencies, repository, license, files to package.json
- Bump all actions/checkout to v6
- Remove unnecessary npm install -g npm@latest from wasm job
Cargo requires gtars-node/ to be either in workspace.members or
workspace.exclude since it lives inside the workspace root. Adding
to members since it uses workspace dependencies (anyhow).
- Add compare(digestA, digestB) method returning comparison as JSON string
- Fix list_collections to use new paginated API (page, page_size, filters)
- Fix stats() to match current StoreStats (was stats_extended)
- Remove total_disk_size from StoreStatsJs (no longer in StoreStats)
- Add serde_json dependency for comparison serialization
- Clean up unused mut warnings
napi prepublish generates index.js/index.d.ts but doesn't always copy
the .node binaries into the npm/ platform subdirectories. Copy them
explicitly after prepublish.
- Add load_sequence/load_collection calls for remote store support
- Add pagination params to list_collections (page, page_size)
- Bump version to 0.0.2
- Add README
OIDC provenance gets 404 on npm publish for this org. Use the
NPM_TOKEN secret instead. Add --ignore-scripts to main package
publish to prevent prepublishOnly from re-publishing platform
packages.
Adapts the streaming sequence decoder (originally on
feat/streaming-decoder) to the split store module structure on
feat/gtars-node-publish. The stream_sequence method is added to
ReadonlyRefgetStore (and inherited by RefgetStore via Deref), producing
refget sequence bytes incrementally without materializing the full
sequence in memory. Needed as the Rust backing for the upcoming
gtars-node streamSequence binding.
Exposes RefgetStore::stream_sequence through napi as a push-based
ThreadsafeFunction pipeline (streamSequenceInternal), wrapped on the JS
side by a thin wrapper.js that returns a node:stream Readable. Clients
can now .pipe(res) into an HTTP response without buffering the full
sequence, and backpressure is propagated to the Rust worker thread via
a blocking threadsafe-function call.

Also teaches stream_sequence to stream from the in-memory encoded byte
buffer when a Full SequenceRecord is present, so in-memory stores
(which lack a local_path/remote_source) still support streaming.

Adds streaming.test.mjs covering full/substring/error/backpressure
paths, and updates the README with an Express /sequence/:digest
example.
Previously, `open_remote_range` would silently accept any HTTP response
from the remote, including 200 OK with the full file body when the server
ignores the Range header (misbehaving CDNs, proxies, or object stores).
The StreamingDecoder would then wrap the wrong bytes with a
`leading_skip_bits` computed for the requested range, emitting silently
corrupted bases with no error surfaced to the caller.

Fix:
- Reject any response whose status is not 206 Partial Content with a
  clear error message.
- Add connect (30s) and read (60s) timeouts on the remote GET so hung
  remotes cannot stall streaming indefinitely.

Also adds a regression test that spins up a local HTTP server which
deliberately responds 200 OK with the full body, and asserts that
`open_remote_range` now returns an error instead of quietly returning
the wrong bytes.
The streaming decoder is documented as O(1) in sequence length, but two
code paths violated that:

1. Full-record fast path cloned the whole encoded byte range before
   streaming (readonly.rs). For an in-memory chromosome-sized record,
   peak memory doubled on every stream_sequence call.

2. The napi binding pre-called store.load_sequence before streaming,
   pulling the entire sequence into RAM even when the record was backed
   by a file or a remote HTTP range. For a remote-backed Node server
   streaming a chromosome, peak memory grew to ~2x the encoded sequence,
   completely defeating the streaming design.

Fixes:

- SequenceRecord::Full now holds Arc<Vec<u8>> instead of Vec<u8>, and
  stream_sequence hands out an ArcSliceReader that shares ownership of
  the underlying buffer. No byte range is copied. Added sequence_arc()
  accessor; sequence() still returns &[u8] via Deref.
- gtars-node stream_sequence_internal drops the load_sequence call and
  relies on ReadonlyRefgetStore::stream_sequence's existing file/remote
  fallback for Stub records.

Tests (test-driven):

- test_stream_sequence_bounded_memory_full_record uses peak_alloc to
  measure peak allocation delta around a 1 MB stream. Bounded at
  SEQ_LEN/4; the pre-fix code clones the full buffer (measured
  1.26 MB delta), the post-fix code allocates only reader frames.
- test_stream_sequence_no_preload_for_stub_record builds an on-disk
  store, forces the record to Stub, and asserts stream_sequence
  succeeds without populating the in-memory cache. Guards the napi
  fix structurally.
Proves the production path (store opened via open_local/open_remote,
records left as Stub, stream_sequence falling back to the on-disk file)
has O(1) memory use regardless of sequence size. Measured peak is ~8 KiB
(BufReader default) at both 1 MB and 16 MB sequence sizes.

The test drains into a fixed-size scratch buffer instead of read_to_end
to avoid the geometric Vec growth's transient realloc peaks masking the
actual streaming footprint.
Mammalian genomes >4 Gb (e.g. some plant assemblies) would silently
truncate when callers passed u32 bounds. The internal arithmetic was
already u64; this widens the public signature to match.

Adds test_stream_sequence_u64_bounds_compile_and_match to lock the
new signature.
Step 3 (#3): stream_sequence_internal, getSubstring, and
SequenceMetadata.length now use napi BigInt (napi6 feature). A
bigint_to_u64 helper validates at the boundary.

Step 4 (#5): worker thread now checks an AtomicBool abort flag each
iteration. AbortOnDrop guards are captured in each threadsafe function
closure so that stream.destroy() on the JS side flips the flag when
the tsfn is finalized. tsfn_chunk.call() non-Ok status also breaks
the loop, avoiding deadlock if the queue is closing.
Replace the monkey-patch pattern in wrapper.js with a proper
RefgetStore subclass. Static factories (openLocal, openRemote,
inMemory) rewrap via Object.setPrototypeOf. getSubstring coerces
number -> BigInt at the JS boundary. wrapper.d.ts now uses
export * from './index.js' instead of duplicating every type.
BufReader(64 KiB) now wraps the HTTP response in open_remote_range
so ureq doesn't issue one syscall per byte on the remote path.

StreamingDecoder::refill reads up to 6 bytes at a time when the
bit buffer has room, instead of always reading 1 byte. A
TrickleReader test confirms partial-read correctness.
Three new streaming tests:
- BigInt bounds (1n, 7n) produce same output as getSubstring
- stream.destroy() mid-stream doesn't hang the process
- factory instances pass instanceof RefgetStore check
Adds streaming sequence API (streamSequence), BigInt bounds,
thread-safe abort on stream.destroy(), subclass wrapper, and
buffered remote reads.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant