Problem
`create_files` (client-side, `src/client/workflow_spec.rs`) loops over every `FileSpec` that carries a user-supplied `identifier` and makes one POST per file via `create_input_file_entity_with_identifier` → `apis::ro_crate_entities_api::create_ro_crate_entity`.
For a workflow with N identifier-bearing input files, this is N sequential round-trips with no batching:
```rust
// src/client/workflow_spec.rs (excerpt)
for (file_spec, file_model) in files.iter().zip(file_models.iter()) {
    let Some(identifier) = file_spec.identifier.as_deref() else { continue; };
    // ...
    crate::client::ro_crate_utils::create_input_file_entity_with_identifier(
        config, workflow_id, &file_with_id, identifier,
    )?;
}
```
The single-file framing in the docs (one DOI on one reference genome) is fine, but parameterized `identifier` templates expanding to hundreds of files are an obvious use case, and because the POSTs are sequential, workflow-creation time grows linearly with the number of identifier-bearing files. Partial failures mid-loop also leave orphan rows that are only cleaned up by the workflow's `ON DELETE CASCADE` if and when the higher-level rollback path fires.
Why now
This shipped behind `feat/ro-crate-user-supplied-file-identifiers`. Today it works correctly for the docs' framing example, but the same code path will be the choke point as soon as anyone writes:
```yaml
files:
  - name: input_{i}
    path: data/input_{i}.csv
    identifier: "urn:dataset:run-2026:{i}"
    parameters:
      i: "1:1000"
```
Better to add the bulk endpoint before users hit it than after.
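To make the scaling concrete: the template in the YAML above expands into one `FileSpec` per value of `i`, and every expanded file carries an identifier, so the current loop makes one POST per file. The sketch below models that expansion. It assumes `"1:1000"` is an inclusive range and that `{i}` is substituted textually; the `FileSpec` here is a trimmed-down hypothetical stand-in, not the client's real type, and `parse_range` / `expand` are illustrative helpers.

```rust
/// Hypothetical stand-in for the client's FileSpec: only the fields this sketch needs.
#[derive(Debug, Clone, PartialEq)]
struct FileSpec {
    name: String,
    path: String,
    identifier: Option<String>,
}

/// Parse an inclusive "start:end" range (assumed semantics for this sketch).
fn parse_range(spec: &str) -> Option<std::ops::RangeInclusive<u64>> {
    let (start, end) = spec.split_once(':')?;
    let start: u64 = start.trim().parse().ok()?;
    let end: u64 = end.trim().parse().ok()?;
    if start > end {
        return None;
    }
    Some(start..=end)
}

/// Expand one templated entry into one FileSpec per value of `i`.
fn expand(template: &FileSpec, range: &str) -> Vec<FileSpec> {
    let Some(values) = parse_range(range) else { return Vec::new(); };
    // Textual substitution of the `{i}` placeholder.
    let sub = |s: &str, i: u64| s.replace("{i}", &i.to_string());
    values
        .map(|i| FileSpec {
            name: sub(&template.name, i),
            path: sub(&template.path, i),
            identifier: template.identifier.as_deref().map(|id| sub(id, i)),
        })
        .collect()
}

fn main() {
    let template = FileSpec {
        name: "input_{i}".into(),
        path: "data/input_{i}.csv".into(),
        identifier: Some("urn:dataset:run-2026:{i}".into()),
    };
    let files = expand(&template, "1:1000");
    // Every expanded file carries an identifier, so the current loop issues one POST each.
    let posts = files.iter().filter(|f| f.identifier.is_some()).count();
    println!("{} files -> {} POSTs", files.len(), posts); // prints "1000 files -> 1000 POSTs"
}
```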
Proposed solution
Mirror the existing `Files` API pattern.
Server (`src/server/api/ro_crate.rs`):
- New trait method `create_ro_crate_entities(body: RoCrateEntitiesModel, context)` returning the inserted rows.
- Implementation: a single transaction with one parameterized `INSERT` per row (or a multi-row `INSERT ... VALUES (...), (...), ...` if it fits within SQLite's bound-parameter limit; otherwise chunked).
- Same authorization as `create_ro_crate_entity` (`workflow_id`-scoped via `authorize_workflow!`).
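A sketch of the chunking rule, assuming every row binds the same number of columns: with `c` columns and a per-statement limit of `max_params` placeholders, each multi-row `INSERT` can carry at most `max_params / c` rows. SQLite's compile-time `SQLITE_MAX_VARIABLE_NUMBER` defaults to 999 before 3.32.0 and to 32766 from 3.32.0 on. The `chunked_inserts` helper, table name, and column names are hypothetical; the server would execute every generated statement inside the same transaction so the whole batch stays atomic.

```rust
/// Build one multi-row INSERT per chunk so that no statement binds more than
/// `max_params` placeholders. Table and column names are hypothetical.
fn chunked_inserts(table: &str, columns: &[&str], n_rows: usize, max_params: usize) -> Vec<String> {
    assert!(
        !columns.is_empty() && max_params >= columns.len(),
        "limit too small for a single row"
    );
    let rows_per_chunk = max_params / columns.len();
    // One "(?, ?, ...)" group per row.
    let row = format!("({})", vec!["?"; columns.len()].join(", "));
    let mut statements = Vec::new();
    let mut remaining = n_rows;
    while remaining > 0 {
        let take = remaining.min(rows_per_chunk);
        let values = vec![row.as_str(); take].join(", ");
        statements.push(format!(
            "INSERT INTO {} ({}) VALUES {}",
            table,
            columns.join(", "),
            values
        ));
        remaining -= take;
    }
    statements
}

fn main() {
    let columns = ["workflow_id", "file_id", "entity_id", "metadata"];
    for limit in [999, 32766] {
        let statements = chunked_inserts("ro_crate_entity", &columns, 1000, limit);
        println!("limit {limit}: {} statement(s)", statements.len());
    }
}
```

With four columns and the older 999-placeholder limit, 1,000 rows split into five statements (four of 249 rows and one of 4); with the 32766 limit they fit in one.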
OpenAPI / generated client:
- Refresh via `cd api && bash sync_openapi.sh all --promote` and regenerate clients per CLAUDE.md's add-endpoint checklist.
Client (`src/client/workflow_spec.rs` + `src/client/ro_crate_utils.rs`):
- Replace the per-file loop in `create_files` with a single bulk call. Collect the identifier-bearing `(file_with_id, identifier)` pairs into a `Vec<RoCrateEntityModel>` and POST once.
- Keep `create_input_file_entity_with_identifier` for callers that legitimately add one row at a time, or remove it if the loop was its only caller.
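A minimal sketch of the client-side change, assuming a generated bulk method that accepts the whole `Vec<RoCrateEntityModel>` in one request. The model structs and their fields, the `RoCrateApi` trait, and `create_identifier_entities` are hypothetical stand-ins for the generated client; the `CountingApi` test double shows how a client test could assert that exactly one POST is made.

```rust
/// Hypothetical stand-ins for the client models; the real field sets differ.
#[derive(Debug, Clone)]
struct FileSpec {
    identifier: Option<String>,
}

#[derive(Debug, Clone)]
struct FileModel {
    id: i64,
}

#[derive(Debug, Clone, PartialEq)]
struct RoCrateEntityModel {
    workflow_id: i64,
    file_id: i64,
    entity_id: String,
}

/// Hypothetical bulk API surface; the real method would be generated from the OpenAPI spec.
trait RoCrateApi {
    fn create_ro_crate_entities(
        &mut self,
        body: Vec<RoCrateEntityModel>,
    ) -> Result<Vec<RoCrateEntityModel>, String>;
}

/// Collect every identifier-bearing file into one request body and POST once.
fn create_identifier_entities<A: RoCrateApi>(
    api: &mut A,
    workflow_id: i64,
    files: &[FileSpec],
    file_models: &[FileModel],
) -> Result<Vec<RoCrateEntityModel>, String> {
    let body: Vec<RoCrateEntityModel> = files
        .iter()
        .zip(file_models.iter())
        .filter_map(|(spec, model)| {
            spec.identifier.as_deref().map(|id| RoCrateEntityModel {
                workflow_id,
                file_id: model.id,
                entity_id: id.to_string(),
            })
        })
        .collect();
    if body.is_empty() {
        return Ok(Vec::new()); // nothing to send: skip the request entirely
    }
    api.create_ro_crate_entities(body)
}

/// Test double that records how many POSTs were made.
struct CountingApi {
    posts: usize,
}

impl RoCrateApi for CountingApi {
    fn create_ro_crate_entities(
        &mut self,
        body: Vec<RoCrateEntityModel>,
    ) -> Result<Vec<RoCrateEntityModel>, String> {
        self.posts += 1;
        Ok(body)
    }
}

fn main() {
    // 12 files, of which the even-numbered ones carry an identifier.
    let files: Vec<FileSpec> = (1..=12)
        .map(|i| FileSpec { identifier: (i % 2 == 0).then(|| format!("urn:dataset:run-2026:{i}")) })
        .collect();
    let models: Vec<FileModel> = (1..=12).map(|i| FileModel { id: i }).collect();
    let mut api = CountingApi { posts: 0 };
    let created = create_identifier_entities(&mut api, 7, &files, &models).unwrap();
    println!("{} entities in {} POST(s)", created.len(), api.posts); // prints "6 entities in 1 POST(s)"
}
```

Skipping the call when no file carries an identifier keeps workflows without identifiers at zero extra requests.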
Tests:
- Server integration test that a bulk POST inserts all rows in one transaction and rolls back atomically on any per-row violation (e.g. a duplicate `(workflow_id, entity_id)`).
- Client test exercising a parameterized `identifier` template that expands to ≥10 files and verifies that exactly one POST is made.
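The server test should assert all-or-nothing behavior. The sketch below models that contract in memory, assuming uniqueness on `(workflow_id, entity_id)`: a batch is staged on a copy and committed only if every row is accepted, so a duplicate anywhere in the batch leaves the table unchanged. `EntityTable` is hypothetical; the real integration test would exercise the endpoint against the database, where a transaction provides the rollback.

```rust
use std::collections::HashSet;

/// In-memory stand-in for the ro_crate entity table. It only models the
/// all-or-nothing behavior the integration test should assert.
#[derive(Default)]
struct EntityTable {
    rows: HashSet<(i64, String)>,
}

impl EntityTable {
    /// Insert every row or none: work on a staged copy and commit only if all succeed.
    fn insert_all(&mut self, rows: &[(i64, String)]) -> Result<usize, String> {
        let mut staged = self.rows.clone();
        for (workflow_id, entity_id) in rows {
            if !staged.insert((*workflow_id, entity_id.clone())) {
                // Returning drops `staged`, discarding every row from this batch ("rollback").
                return Err(format!("duplicate ({workflow_id}, {entity_id})"));
            }
        }
        self.rows = staged; // "commit"
        Ok(rows.len())
    }
}

fn main() {
    let mut table = EntityTable::default();
    let ok: Vec<(i64, String)> = (1..=10).map(|i| (1, format!("urn:dataset:run-2026:{i}"))).collect();
    println!("{:?}", table.insert_all(&ok)); // prints "Ok(10)"

    // Second batch: two new rows, then one that collides with an existing entity_id.
    let bad: Vec<(i64, String)> = vec![
        (1, "urn:dataset:run-2026:11".to_string()),
        (1, "urn:dataset:run-2026:12".to_string()),
        (1, "urn:dataset:run-2026:3".to_string()),
    ];
    println!("{}", table.insert_all(&bad).is_err()); // prints "true"
    println!("{}", table.rows.len()); // prints "10": no row from the failed batch persisted
}
```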
Out of scope (separate follow-ups)
- A general bulk-update / bulk-upsert endpoint for `ro_crate` entities (the init-time entity rebuild in `create_ro_crate_entity_for_file` has its own N+1 pattern of `find → update`).
- Server-side batching of the init-time `create_entities_for_input_files` path, which uses single creates today but isn't on the latency-sensitive workflow-creation hot path.
References
- Surfaced during `/review-api` on the PR for `feat/ro-crate-user-supplied-file-identifiers`.
- Precedent: the `Files` API already exposes `create_files` (bulk) alongside `create_file` (single); see `src/server/api/files.rs:33`.