-
Notifications
You must be signed in to change notification settings - Fork 12
Logic to handle optional_extra in Vision Excel converter input #344
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Jerry Guo <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements logic to handle optional extra columns in the tabular converter, addressing issue #338. The feature allows specifying columns that should be included in extra_info if present but won't cause conversion failure if missing.
Key Changes:
- Added
allow_missingparameter throughout the column definition parsing chain to support optional columns - Implemented
optional_extrawrapper in column definitions to mark columns as optional - Enhanced error handling to gracefully skip missing optional columns while preserving required column behavior
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/power_grid_model_io/converters/tabular_converter.py | Core implementation of optional_extra logic with allow_missing parameter propagation and empty DataFrame handling for missing columns |
| tests/unit/converters/test_tabular_converter.py | Comprehensive test coverage for optional_extra feature including edge cases and integration tests |
| docs/converters/vision_converter.md | Documentation explaining optional_extra syntax, behavior, and use cases for Vision Excel exports |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Signed-off-by: Jerry Guo <[email protected]>
Copilot Co-authored-by: Copilot <[email protected]> Signed-off-by: Jerry Guo <[email protected]>
Signed-off-by: Jerry Guo <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 10 out of 13 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Signed-off-by: Jerry Guo <[email protected]>
Copilot Co-authored-by: Copilot <[email protected]> Signed-off-by: Jerry Guo <[email protected]>
Copilot Co-authored-by: Copilot <[email protected]> Signed-off-by: Jerry Guo <[email protected]>
Signed-off-by: Jerry Guo <[email protected]>
| extra: | ||
| - ID # Required - fails if missing | ||
| - Name # Required - fails if missing | ||
| - optional_extra: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens if an element is specified in both the required and optional?
extra:
- ID # Required - fails if missing
- optional_extra:
- ID # Optional - skipped if missingI believe that the default should be that required precedes optional, so maybe we need to add an explicit test case for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what was the behaviour previously when specifying
extra:
- ID
- ID? Did it append a separate column or did it combine the two?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only one would appear. In this case with the second ID it won't do anything since it already exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in that case, it may be possible to achieve the same result (namely that the required column is always leading) with less new code. Note that that does mean that the optional_extra needs to be parsed after the regular extra things to ensure this works:
extra:
- optional_extra:
- ID
- IDin addition, the following two should also be equivalent:
extra:
- optional_extra:
- ID
- optional_extra:
- GUIDextra:
- optional_extra:
- ID
- GUID| allow_missing: bool = False, | ||
| ) -> pd.DataFrame: | ||
| """Interpret the column definition and extract/convert/create the data as a pandas DataFrame. | ||
| Args: | ||
| data: TabularData: | ||
| table: str: | ||
| col_def: Any: | ||
| extra_info: Optional[ExtraInfo]: | ||
| allow_missing: bool: If True, missing columns will return empty DataFrame instead of raising KeyError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether this is the right solution. If I understand correctly, this parameter is needed due to the recursion, right? Would there be a world in which either no recursion is needed, or in which we can do without this allow_missing? It feels bugprone
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that is not what i meant. the other comment is about how we specify the allow_missing argument. This comment is about why we need it in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if i understand correctly, the only reason we need allow_missing is so that we can sanitize IF there are optional attributes in the result... That's a separate code path.
However, why can't we just gather all attributes and return empty dataframes by default, and then sanitize that a required attribute can't be an empty dataframe (unless of course there are no items in the first place). That way, we do not introduce a separate code path that is only triggered if there are optional attributes. That simplifies the logic a lot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd argue the other way, by adding a simple branch, which is commonly seen everywhere in this project, we avoid the specialized sanitization step which could, equally likely, introduce issue.
| allow_missing: bool = False, | ||
| ) -> pd.DataFrame: | ||
| """Interpret the column definition and extract/convert/create the data as a pandas DataFrame. | ||
| Args: | ||
| data: TabularData: | ||
| table: str: | ||
| col_def: Any: | ||
| extra_info: Optional[ExtraInfo]: | ||
| allow_missing: bool: If True, missing columns will return empty DataFrame instead of raising KeyError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add allow_missing to the tests (either by testing that it works as intended + that all sub-calls to the mocks are made correctly, and/or by raising an error if it is set but recursion level is not 0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| return self._parse_col_def_composite( | ||
| data=data, table=table, col_def=optional_cols, table_mask=table_mask, allow_missing=True | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only place where I see allow_missing=True. Why is that? Is it because it's only relevant when the underlying structure is a dict, hence then is only when the "new" extra optional parameters matter? Why isn't it relevant bellow when we have a list instead?
I don't think I've ever worked in this repo, so genuinely asking. I'd appreciate tech review over this :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now only optional_extra are fields we attribute the 'optional' to its existence.
Comments from Martijn Co-authored-by: Martijn Govers <[email protected]> Signed-off-by: Jerry Guo <[email protected]>
Comments from Martijn Co-authored-by: Martijn Govers <[email protected]> Signed-off-by: Jerry Guo <[email protected]>
Signed-off-by: Jerry Guo <[email protected]>
| return pd.DataFrame(index=table_data.index) | ||
| columns_str = " and ".join(f"'{col_name}'" for col_name in columns) | ||
| raise KeyError(f"Could not find column {columns_str} on table '{table}'") | ||
| raise KeyError(f"Could not find column {columns_str} on table '{table}'") from e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did some thinking on the original code here. It now understand why it did not have raise from e here:
- either it is a string that can be converted to a float (e.g.
inf; see the description in thetry:statement), which should be interpreted as a float - or is a column name which is absent which
- either is allowed (new code)
- or it is disallowed and should raise a
KeyError
However, I dislike that original design choice. Instead, I am of the opinion that it should be something like
if const_value := float_or_none(col_def):
return self._parse_col_def_const(..., const_value=const_value)
# now, we know for sure that it is a column name that is absent
if allow_missing:
return ...
raise KeyError(...)where float_or_none can indeed be implemented as
def float_or_none(value: str) -> float | None:
try:
return float(value)
except ValueError:
return NoneThe reason I do not want to raise in the except block is that the conversion to float is actually already the fall-back. In pseudo-code, the logic is actually something like
if the column header is a regular column name:
return column header as regular column name
elif the column header is a floating point value:
return the column header as a floating point value
elif this specific column is optional
return None
else:
raise a KeyError because it was mandatory but it is not foundthis also is an indication as to why the raise ... from e was disabled: the key-error is not the fall-back but the default - the conversion to float is actually the fall-back logic. But, again, there's a reason why raise ... from e is good practice: it shows the intention. In this original code, the intention was not clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your suggestion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i know. raise ... from e is good practice always. The only reason why it's not here is because the actual design is bad, not because the raise ... from e is not supposed to be used here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'll add this to the knowledge sharing session of tomorrow.
Signed-off-by: Jerry Guo <[email protected]>
Signed-off-by: Jerry Guo <[email protected]>
|



Closes #338
In this PR:
optional_extra