Make extracted raw data location configurable, with default to work#360

Closed
zmaalick wants to merge 1 commit into main from 156_row_data_location_default_to_work

@zmaalick
Collaborator

@zmaalick zmaalick commented Feb 9, 2026

Closes #156.

PR creation checklist for the developer

  • Has <issue_number> above ☝️ been replaced with the issue number?
  • Has main been selected as the base branch?
  • Does the feature branch name follow the format <issue_number>_<short_description_of_feature>?
  • Does the text of the PR title exactly match with the text (not including the issue number) of the issue title?
  • Have appropriate reviewers been added to the PR (once it is ready for review)?
  • Has the PR been assigned to the developer(s)?
  • Have the same labels as on the issue (except for the good first issue label) been added to the PR?
  • Has the Climate Model Evaluation Workflow (CMEW) project been added to the PR?
  • Has the appropriate milestone been added to the PR?

Definition of Done for the developer

  • Does the change in this PR address the above issue / have all acceptance criteria been met?
  • Does the change in this PR follow the requirements in the wiki: Developer Guide (including copyrights)?
  • Have new tests related to the change been added?
  • Do all the GitHub workflow checks pass?
  • Do all the tests run locally and pass? (Note: the tests are not run by the GitHub workflow, see wiki: Run the tests locally)
  • Has the API documentation (e.g. docstrings in Python modules) related to the change been updated appropriately?
  • Has the user documentation (i.e. everything in the doc directory) related to the change been updated appropriately, including the Quick Start section?
  • Do the HTML pages render correctly? (See wiki: Build the documentation locally)

PR creation checklist for the reviewer

  • Has <issue_number> above ☝️ been replaced with the issue number?
  • Has main been selected as the base branch?
  • Does the feature branch name follow the format <issue_number>_<short_description_of_feature>?
  • Does the text of the PR title exactly match with the text (not including the issue number) of the issue title?
  • Have appropriate reviewers been added to the PR (once it is ready for review)?
  • Has the PR been assigned to the developer(s)?
  • Have the same labels as on the issue (except for the good first issue label) been added to the PR?
  • Has the Climate Model Evaluation Workflow (CMEW) project been added to the PR?
  • Has the appropriate milestone been added to the PR?

Definition of Done for the reviewer

  • Does the change in this PR address the above issue / have all acceptance criteria been met?
  • Does the change in this PR follow the requirements in the wiki: Developer Guide (including copyrights)?
  • Have new tests related to the change been added?
  • Do all the GitHub workflow checks pass?
  • Do all the tests run locally and pass? (Note: the tests are not run by the GitHub workflow, see wiki: Run the tests locally)
  • Has the API documentation (e.g. docstrings in Python modules) related to the change been updated appropriately?
  • Has the user documentation (i.e. everything in the doc directory) related to the change been updated appropriately, including the Quick Start section?
  • Do the HTML pages render correctly? (See wiki: Build the documentation locally)

@zmaalick zmaalick added this to the v0.2.0 (multiple model runs) milestone Feb 9, 2026
@zmaalick zmaalick self-assigned this Feb 9, 2026
@zmaalick zmaalick added the enhancement (New feature or request), standardise (Anything related to CDDS) and technical debt (Technical debt in CMEW) labels Feb 9, 2026
@NParsonsMO NParsonsMO self-requested a review February 9, 2026 10:08
@NParsonsMO NParsonsMO changed the title Make extracted raw data location configurable, with default to work Make extracted raw data location configurable, with default to work Feb 9, 2026
Collaborator

@NParsonsMO NParsonsMO left a comment

I don't think this is doing what it's expected to.

When I set
RAW_DATA_DIR="<my_scratch_dir>/raw_data_test"
I can see that the directory is created, but I watched it throughout the CMEW run (repeatedly refreshing) and nothing appeared there. When I went back through the logs, I noticed this line in the log (~/cylc-run/CMEW/run7/share/data/cdds/proc/GCModelDev/ESMVal/u-bv526/round-1/extract/log/cdds_extract_apm_2026-02-09T1202Z.log):

2026-02-09 12:02:23 cdds.extract.common.run_moo_cmd INFO: moo command: '['moo', 'select', '-i', '-d', '<my_home_dir>/cylc-run/CMEW/run7/share/data/cdds/proc/GCModelDev/ESMVal/u-bv526/round-1/extract/apm_1993-01-01T00:00:00Z_1993-12-01T00:00:00Z.dff', 'moose:/crum/u-bv526/apm.pp', '<my_home_dir>/cylc-run/CMEW/run7/share/data/cdds/cdds_data/GCModelDev/ESMVal/HadGEM3-GC31-LL/amip/r5i1p1f3/round-1/input/u-bv526/apm']'

So I think it is still extracting to the shared directory.
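The behaviour the reviewer expected can be sketched as follows, assuming (hypothetically) that the extract step receives RAW_DATA_DIR as a string and knows the task's work directory. `resolve_raw_data_dir` is an illustrative name, not part of CMEW or CDDS. The point is that whatever path it returns has to be the destination actually passed to extraction (the last argument of the `moo select` command in the log above), instead of the shared `cdds_data` directory.

```python
from pathlib import Path


def resolve_raw_data_dir(raw_data_dir: str, work_dir: str) -> Path:
    """Return the directory extracted raw data should be written to.

    Hypothetical helper: an empty (or whitespace-only) RAW_DATA_DIR falls
    back to the task's work directory; any other value is used as given.
    """
    value = raw_data_dir.strip()
    if not value:
        # Default: keep the extracted data in the task's work directory.
        return Path(work_dir)
    # User override, e.g. a scratch area with more space.
    return Path(value)
```

In a Cylc task the work directory could, for example, come from the `CYLC_TASK_WORK_DIR` environment variable that Cylc sets for each task; how CMEW actually passes RAW_DATA_DIR through to CDDS is an assumption here.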

MAX_PARALLEL_TASKS=4
MODEL_ID="HadGEM3-GC5E-LL"
NUMBER_OF_YEARS=1
RAW_DATA_DIR=""
Collaborator

Can you add some info to CMEW/meta/rose-meta.conf about the new variable?
e.g. it shouldn't be compulsory, the user might need help with when to specify it...

MAX_PARALLEL_TASKS=4
MODEL_ID="HadGEM3-GC5E-LL"
NUMBER_OF_YEARS=1
RAW_DATA_DIR=""
Collaborator

@NParsonsMO NParsonsMO Feb 9, 2026

Is it also worth adding some error checking?

e.g. when I accidentally set this to RAW_DATA_DIR="<my_scratch>/raw_data_test/" but failed to copy the initial "/", it created ~/cylc-run/CMEW/run5/work/1/standardise_model_data/<PATH_ABOVE>/proc.

Maybe we don't care.
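A minimal check along the lines suggested above might look like this. `validate_raw_data_dir` is a hypothetical helper; it assumes that an empty value means "use the default location" and that any non-empty value must be absolute after `~` expansion. A relative value such as `scratch/raw_data_test/` is rejected instead of being silently resolved against the task's working directory, which is what produced the nested `.../standardise_model_data/<PATH_ABOVE>/proc` directory.

```python
import os


def validate_raw_data_dir(raw_data_dir: str) -> str:
    """Validate a user-supplied RAW_DATA_DIR value (hypothetical helper).

    Returns "" when no override is given, otherwise the path with any
    leading "~" expanded. Raises ValueError for relative paths.
    """
    value = os.path.expanduser(raw_data_dir.strip())
    if value and not os.path.isabs(value):
        raise ValueError(
            f"RAW_DATA_DIR must be an absolute path, got {raw_data_dir!r}; "
            "check that the leading '/' has not been dropped."
        )
    return value
```

Failing early with a clear message is arguably cheaper than discovering misplaced data after a long extraction; a similar constraint could alternatively be expressed in CMEW/meta/rose-meta.conf so that it is flagged when the suite is configured.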

@zmaalick zmaalick closed this Feb 16, 2026
@zmaalick zmaalick deleted the 156_row_data_location_default_to_work branch February 16, 2026 08:16

Labels

enhancement: New feature or request
standardise: Anything related to CDDS
technical debt: Technical debt in CMEW

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make extracted raw data location configurable, with default to work

2 participants