Fixes for upstream #1

Open
Johann3141592 wants to merge 5 commits into clg-admin:main from Johann3141592:fixes-for-upstream

@Johann3141592

Description

Five bugfixes discovered while running the RDM + PRIM pipeline on a simple test model. It's possible that some fixes are not necessary and instead stem from my misunderstanding of how the codebase is expected to be run (e.g. is it a convention to always call models "Scenario1", or is it a genuine bug that the code has hardcoded references to that name? The same question applies to regions). The fixes were made with the help of Opencode & Deepseek v4 pro.

Motivation

I needed to make the code work with the simple model since that's my starting point for a dissertation project (extending RDM with a GSA).

Changes

  • Fix pyDOE import (case sensitivity on some filesystems)
  • Fix hardcoded region name in preprocessing -> now config-driven
  • Replace hardcoded scenario/period keys with dynamic lookup
  • Fix YEAR dtype in parquet output and auto-detect CSV delimiter
  • Deduplicate overlapping driver/outcome columns; drive PRIM outcome
    directions from YAML config instead of hardcoded dicts
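As a rough illustration of the driver/outcome deduplication (not the code in this PR): a minimal sketch that drops driver columns which also appear as outcomes, and removes repeated driver names. The function name `drop_overlapping_drivers` and the column names are hypothetical.

```python
def drop_overlapping_drivers(driver_cols, outcome_cols):
    """Keep drivers that are not also outcomes, preserving order and removing repeats."""
    outcomes = set(outcome_cols)
    seen = set()
    kept = []
    for col in driver_cols:
        # Skip columns already registered as outcomes or already kept as drivers.
        if col in outcomes or col in seen:
            continue
        seen.add(col)
        kept.append(col)
    return kept


# Hypothetical column names, for illustration only.
drivers = ["demand_growth", "fuel_price", "total_cost", "fuel_price"]
outcomes = ["total_cost", "emissions"]
unique_drivers = drop_overlapping_drivers(drivers, outcomes)
print(unique_drivers)
```

With this filter, a name such as `total_cost` is only registered once (as an outcome), so a table keyed by column name would not receive the same key twice.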

Note on PRIM_t3f2.yaml

The YAML includes model-specific outcome names for the simple model as an illustration of
the new outcome_directions schema. Upstream should substitute their own. It's also a choice I made to push the definition of risky/beneficial to the YAML; doing it in the Excel setup might be even more appropriate. It was just the model-specific hardcoding of them that caused errors, so I needed to fix it.
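To make the idea concrete, here is a minimal sketch of how an `outcome_directions` mapping could be turned into the two outcome groups once the YAML has been parsed into a dict. The exact schema in PRIM_t3f2.yaml may differ: the labels `"desirable"` and `"risk"`, the outcome names, and the function `split_outcomes` are all assumptions for illustration.

```python
def split_outcomes(config):
    """Split an outcome->direction mapping into (desirable, risk) name lists."""
    directions = config.get("outcome_directions", {})
    allowed = {"desirable", "risk"}  # assumed labels, not confirmed by the PR
    desirable, risk = [], []
    for name, direction in directions.items():
        if direction not in allowed:
            raise ValueError(f"Unknown direction {direction!r} for outcome {name!r}")
        (desirable if direction == "desirable" else risk).append(name)
    return desirable, risk


# Stand-in for the dict a YAML loader would return; names are hypothetical.
config = {
    "outcome_directions": {
        "total_cost": "risk",
        "renewable_share": "desirable",
    }
}
desirable, risk = split_outcomes(config)
print(desirable, risk)
```

Failing loudly on an unknown label means a typo in the config surfaces at load time rather than as a missing outcome later in the PRIM step.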

Testing

The full pipeline ran fine with my simple model and my respective RDM and PRIM setup files. I also tried to run the pipeline with the current upstream model (Botswana?) and config files: RDM went fine, but PRIM caused an error stemming from a line which I didn't edit. I didn't investigate this further; it may also be because I only included one model run, since my PC is not very capable.

Style

Did not run Black because it changed every file in the upstream codebase and
reformatting would thus obscure the actual bugfixes in the diff.

Replace hardcoded 'Scenario1' and period_list with dynamic key discovery from the actual data, and deduplicate DataFrame columns to prevent Series-vs-scalar errors in scaling functions.
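A minimal sketch of what dynamic key discovery can look like, assuming the results are held in a nested dict of the form scenario -> period -> data. That layout, the function name `discover_keys`, and the example keys are assumptions; the actual structures in the codebase may differ.

```python
def discover_keys(results):
    """Return the scenario names and the sorted period labels present in the data."""
    scenarios = list(results)
    # Collect every period label that occurs under any scenario.
    periods = sorted({p for by_period in results.values() for p in by_period})
    return scenarios, periods


# Hypothetical data: one scenario that is not named "Scenario1".
results = {
    "SimpleModel": {"2030-2040": [3.1, 2.9], "2020-2030": [1.2, 1.4]},
}
scenarios, periods = discover_keys(results)
print(scenarios, periods)
```

Reading the keys from the data avoids a `KeyError` when a model uses a scenario name or period list other than the ones previously hardcoded.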
- z_auxiliar_code: use pd.api.types.is_string_dtype() to detect
  string columns in parquet writer, ensuring YEAR stays int64
- t3f2_prim_files_creator: auto-detect CSV delimiter (; or ,)
  to handle both separator formats in experiment data files
- t3f3_prim_manager: skip driver columns that overlap with outcome
  columns to prevent duplicate key errors in dict_large_table
- t3f4_range_finder_mapping: replace hardcoded desirable/risk
  outcome dictionaries with YAML-driven config
- PRIM_t3f2.yaml: add outcome_directions mapping section
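For illustration, a minimal sketch of delimiter auto-detection by counting candidate separators in the header line. This is not necessarily how t3f2_prim_files_creator does it; the function name `detect_delimiter` and the column names are hypothetical.

```python
import csv
import io


def detect_delimiter(header_line, candidates=(";", ","), default=","):
    """Pick the candidate separator that occurs most often in the header line."""
    counts = {d: header_line.count(d) for d in candidates}
    best = max(candidates, key=lambda d: counts[d])
    # Fall back to the default when no candidate appears at all.
    return best if counts[best] > 0 else default


# Hypothetical experiment file content using ';' as separator.
text = "Fut_ID;YEAR;total_cost\n1;2030;12.5\n"
delim = detect_delimiter(text.splitlines()[0])
rows = list(csv.reader(io.StringIO(text), delimiter=delim))
print(delim, rows)
```

Counting in the header is simple but would be misled by quoted header fields that contain the other separator; the standard library's `csv.Sniffer` is an alternative for such cases.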