Skip to content

Changes to .janno and .ssf columns for Poseidon v3.0.0#357

Open
nevrome wants to merge 51 commits intoSchema_300_devfrom
poseidon300cols
Open

Changes to .janno and .ssf columns for Poseidon v3.0.0#357
nevrome wants to merge 51 commits intoSchema_300_devfrom
poseidon300cols

Conversation

@nevrome
Copy link
Member

@nevrome nevrome commented Aug 28, 2025

This PR is part of the preparation for Poseidon v3.0.0 as specified here: poseidon-framework/poseidon-schema#93

As documented in #351 I implemented the changes to .janno and .ssf.

While doing so I noticed two minor things I wanted to change in the new schema specification: poseidon-framework/poseidon-schema@31900ef & poseidon-framework/poseidon-schema@a256a62

Adding and modifying the columns in poseidon-hs also required (or: made desirable) some minor changes in the surrounding code -- but nothing spectacular.

@nevrome nevrome requested a review from stschiff August 28, 2025 13:06
@codecov
Copy link

codecov bot commented Aug 28, 2025

Codecov Report

❌ Patch coverage is 52.31144% with 196 lines in your changes missing coverage. Please review.
✅ Project coverage is 55.70%. Comparing base (fec6b1f) to head (3b7fc1b).
⚠️ Report is 1 commits behind head on Schema_300_dev.

Files with missing lines Patch % Lines
src/Poseidon/ColumnTypesJanno.hs 43.93% 31 Missing and 43 partials ⚠️
src/Poseidon/Janno.hs 60.92% 3 Missing and 56 partials ⚠️
src/Poseidon/SequencingSource.hs 45.16% 7 Missing and 27 partials ⚠️
src/Poseidon/ColumnTypesSSF.hs 37.14% 9 Missing and 13 partials ⚠️
src/Poseidon/ColumnTypesUtils.hs 78.57% 0 Missing and 3 partials ⚠️
src/Poseidon/PoseidonVersion.hs 33.33% 0 Missing and 2 partials ⚠️
src/Poseidon/CLI/Validate.hs 50.00% 0 Missing and 1 partial ⚠️
src/Poseidon/Package.hs 87.50% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@                Coverage Diff                 @@
##           Schema_300_dev     #357      +/-   ##
==================================================
- Coverage           56.89%   55.70%   -1.19%     
==================================================
  Files                  33       33              
  Lines                5043     5235     +192     
  Branches              548      631      +83     
==================================================
+ Hits                 2869     2916      +47     
- Misses               1626     1688      +62     
- Partials              548      631      +83     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@stschiff
Copy link
Member

stschiff commented Sep 5, 2025

Thanks, I will take a look.

@nevrome
Copy link
Member Author

nevrome commented Dec 8, 2025

This PR is pretty straight forward, actually. It just adds new columns and some special handling for the ones whose semantics change with Poseidon v3.0.0. This affects so many files because changes in the .janno columns change a lot of reference data for the golden tests.

The only interesting bit is the way _Note columns are sorted when writing .janno files.

There are still two open PRs in the schema repository that should be worked into this PR, as soon as they are merged:

When this is done, the review can proceed.

@nevrome nevrome changed the base branch from master to Schema_300_dev December 15, 2025 15:44
@nevrome nevrome marked this pull request as draft December 15, 2025 15:44
@nevrome nevrome marked this pull request as ready for review January 18, 2026 13:27
@nevrome
Copy link
Member Author

nevrome commented Jan 18, 2026

OK - I think this is ready for review. It seems overwhelming with all these files affected, but the actual changes are realtively minimal.

I suggest you start in src/Poseidon/ColumnTypesUtils.hs. That's where the new FromFieldVersioned lives that conceptually powers a lot of the new versioned .janno/.ssf parsing machinery. The consequences of this change are well visible then in src/Poseidon/ColumnTypesJanno.hs and beyond, mixed with the few individual column-wise changes of Poseidon v3.0.0.

Copy link
Member

@stschiff stschiff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, great. This is a very impressive piece of work. I've left some comments. May main question is about the JannoCoalescce and also validate, where I am confused that you force the input files to always be read assuming the latest version... but shouldn't we enable the user to say which version the file is in, or even be able to read it from the YAML if it's submitted as a package?

runJannocoalesce (JannoCoalesceOptions sourceSpec target outSpec fields overwrite sKey tKey maybeStrip) = do
JannoRows sourceRows <- case sourceSpec of
JannoSourceSingle sourceFile -> readJannoFile [] sourceFile
JannoSourceSingle sourceFile -> readJannoFile latestPoseidonVersion [] sourceFile
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, this I don't understand. Why would we want to read Janno Files with anything else than the particular version that this Janno File was stored in? Why force them all to be read with the latest only?

runValidate (ValidateOptions (ValPlanJanno path) mandatoryJannoCols _ noExitCode _) = do
logInfo $ "Validating: " ++ path
(JannoRows entries) <- readJannoFile mandatoryJannoCols path
(JannoRows entries) <- readJannoFile latestPoseidonVersion mandatoryJannoCols path
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here (as in JannoCoalesce). Why don't we want to read that according to the exact Poseidon Version of the package this file is in?

runValidate (ValidateOptions (ValPlanSSF path) _ mandatorySSFCols noExitCode _) = do
logInfo $ "Validating: " ++ path
(SeqSourceRows entries) <- readSeqSourceFile mandatorySSFCols path
(SeqSourceRows entries) <- readSeqSourceFile latestPoseidonVersion mandatorySSFCols path
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above.

Co-authored-by: Stephan Schiffels <stephan_schiffels@mac.com>
@nevrome
Copy link
Member Author

nevrome commented Feb 10, 2026

Very good observation about jannocoalesce and validate. At the time I thought that adding a new command line option to set the version in these cases was overkill. But you're right, of course. The user should have the possibility to set this.

For jannocoalesce we'll need two optional arguments then, both for input and output.

@stschiff
Copy link
Member

Very good observation about jannocoalesce and validate. At the time I thought that adding a new command line option to set the version in these cases was overkill. But you're right, of course. The user should have the possibility to set this.

For jannocoalesce we'll need two optional arguments then, both for input and output.

I think for the output we would always just offer the latest version. I think that's simpler, and we don't want to actually keep the writer-function flexible, just the parser alone will be tricky enough to maintain, right?

@stschiff
Copy link
Member

For jannocoalesce we read in base directories, so we can just read the input version from the package, right? No need to provide a command line argument. For validate for the option of reading in base directories it can also just be read. I think only with the trident validate --janno is there ambiguity which input version it is. So I think I would restrict the input option only to the --janno input and for all other cases just read the version from the YAMl, right?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants