rf(schema): Make participants/sessions.tsv content available, add glob() and zip() functions#2359

Draft
effigies wants to merge 6 commits into bids-standard:master from effigies:rf/tables

Conversation

@effigies
Collaborator

@effigies effigies commented Mar 6, 2026

BEP36 proposes more extensive checks comparing the contents of /participants.tsv, /sessions.tsv and /phenotype/*.tsv to one another and other session directories. This PR introduces schema changes to facilitate this.

We require access to at least sessions.tsv contents, and we (@rwblair and I) propose dataset.sessions_tsv as an object mapping column headers onto column contents.

For consistency, and because the dataset.subjects context object is an awkward way of accessing the various lists of subject IDs, we propose to make dataset.participants_tsv along the same lines, with dataset.participants_tsv.participant_id rendering obsolete dataset.subjects.participant_id.
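As a rough illustration of the proposed shape (a sketch only, not the validator's implementation), a TSV file can be loaded into an object mapping column headers onto column contents as follows. The helper name `load_tsv_columns` and the sample rows are hypothetical.

```python
import csv
import io


def load_tsv_columns(text: str) -> dict[str, list[str]]:
    """Map each column header of a TSV document onto its list of values."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader, [])
    columns: dict[str, list[str]] = {name: [] for name in header}
    for row in reader:
        # Pair each value with its header; blank lines contribute nothing.
        for name, value in zip(header, row):
            columns[name].append(value)
    return columns


# Hypothetical participants.tsv content, for illustration only.
participants_text = "participant_id\tage\nsub-01\t34\nsub-02\t27\n"
participants_tsv = load_tsv_columns(participants_text)

# Stands in for dataset.participants_tsv.participant_id in the proposal.
participant_ids = participants_tsv["participant_id"]
```

With this shape, `participants_tsv["participant_id"]` plays the role that dataset.subjects.participant_id plays today.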

We also need to collate rows across columns. The natural operation in Python is zip(), so we propose that here. It is not yet used in any checks.
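To show what collating rows across columns means, here is a minimal Python sketch using the built-in zip(). The column names and values in `sessions_tsv` are assumptions made for illustration; the schema's own zip() may differ in detail.

```python
# Hypothetical column-oriented content of a sessions.tsv file.
sessions_tsv = {
    "participant_id": ["sub-01", "sub-01", "sub-02"],
    "session_id": ["ses-1", "ses-2", "ses-1"],
}

# zip() turns parallel columns back into per-row tuples.
rows = list(zip(sessions_tsv["participant_id"], sessions_tsv["session_id"]))

# Each row can then be compared against something row-shaped, such as
# the session directory that the row implies.
expected_dirs = [f"{sub}/{ses}" for sub, ses in rows]
```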

With dataset.subjects.sub_dirs being the only remaining content of dataset.subjects, a more general way of achieving the same goal was desired. A glob() function seems to fit the bill, which can be used to create lists of files in selectors/checks. This PR demonstrates the use of glob() and dataset.participants_tsv to obsolete dataset.subjects altogether.
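The idea of replacing dataset.subjects.sub_dirs with a glob can be sketched in Python as below. This is a toy stand-in under stated assumptions: the function matches one path component at a time against a flat list of paths, the file listing is invented, and the semantics of the schema's actual glob() are whatever this PR defines, not what this sketch does.

```python
from fnmatch import fnmatchcase


def glob(paths: list[str], pattern: str) -> list[str]:
    """Return paths whose components match the pattern component by component."""
    wanted = pattern.strip("/").split("/")
    matches = []
    for path in paths:
        parts = path.strip("/").split("/")
        # Require the same depth so that "*" never crosses a "/" boundary.
        if len(parts) == len(wanted) and all(
            fnmatchcase(part, want) for part, want in zip(parts, wanted)
        ):
            matches.append(path)
    return matches


# Hypothetical dataset listing and participants.tsv column.
tree = [
    "participants.tsv",
    "sourcedata",
    "sub-01",
    "sub-01/anat",
    "sub-01/anat/sub-01_T1w.nii.gz",
    "sub-02",
]
participant_ids = ["sub-01", "sub-03"]

sub_dirs = glob(tree, "sub-*")
missing_dirs = sorted(set(participant_ids) - set(sub_dirs))
unlisted_dirs = sorted(set(sub_dirs) - set(participant_ids))
```

Here `missing_dirs` holds participants listed in the table without a directory, and `unlisted_dirs` holds directories without a table row, which is the kind of comparison dataset.subjects was used for.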

Would be interested in people's thoughts on these changes. Open to any alternatives.

Additional thoughts on refactoring the validation context

I think the validation context is due a bit of a rethink after a couple of years, and getting rid of dataset.subjects is a good idea regardless of what happens to BEP36, but I'm not married to this approach. We should aim for consistency and get rid of subject.sessions (and maybe subject) as well, but that hasn't been thought through yet.

Other things in the context that are as-yet unused:

  • dataset.tree - The exists() and here-introduced glob() functions do what we might imagine this object should do. Unclear how it could be used without functions or at least comprehensions and predicates.
  • dataset.ignored - There may be checks to write using this, but until we try, it's unclear if this is the right interface.

Things that might be reconsidered:

  • Both the TypeScript and the nascent Python context implementations have a file object that represents the file. Perhaps collating the file path and similar general filesystem-level attributes into file.path and so on would make sense.
    • If so, would it be a good idea to move other fields underneath file? file.nifti_header or file.columns make sense. That said, I'm inclined to leave file to just contain the fields that do not need more than a stat to establish. The principle could be: dataset is populated at validation start, file is populated when reading the filesystem metadata, and nifti_header and so on require additional opens or computations to populate.
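The layering principle in the last bullet could look roughly like the following Python sketch. All class and attribute names here (`FileInfo`, `FileContext`, `opens`) are hypothetical and do not come from either context implementation; the point is only that stat-level fields are filled eagerly while `columns` costs an open on first access.

```python
import os
import tempfile
from dataclasses import dataclass
from functools import cached_property


@dataclass(frozen=True)
class FileInfo:
    """Fields that need nothing more than a stat() call."""
    path: str
    size: int


class FileContext:
    def __init__(self, path: str) -> None:
        # Populated when reading filesystem metadata; the file is not opened.
        self.file = FileInfo(path=path, size=os.stat(path).st_size)
        self.opens = 0  # counts how often the file is actually opened

    @cached_property
    def columns(self) -> dict[str, list[str]]:
        # Populated lazily: the first access opens and parses the file,
        # later accesses reuse the cached result.
        self.opens += 1
        with open(self.file.path, encoding="utf-8") as handle:
            lines = handle.read().splitlines()
        header = lines[0].split("\t") if lines else []
        rows = [line.split("\t") for line in lines[1:]]
        return {name: [row[i] for row in rows] for i, name in enumerate(header)}


with tempfile.TemporaryDirectory() as tmp:
    tsv_path = os.path.join(tmp, "participants.tsv")
    with open(tsv_path, "w", encoding="utf-8") as handle:
        handle.write("participant_id\tage\nsub-01\t34\n")
    context = FileContext(tsv_path)
    opens_before = context.opens
    first = context.columns
    second = context.columns
    opens_after = context.opens
```

Keeping `columns` outside `file` in this sketch mirrors the inclination stated above: `file` stays cheap, and anything needing an open is a separate, lazily computed field.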

cc @ericearl @surchs

@ericearl
Collaborator

I am always impressed by @effigies's and @rwblair's attention to detail. I read through your comment above and all of the edits, and while I may be biased I think the edits all look great and should enable the things we want to do! Nice work!

@rwblair
Member

rwblair commented Mar 18, 2026

Related validator PR bids-standard/bids-validator#366


3 participants