Member

@Jerry-Jinfeng-Guo Jerry-Jinfeng-Guo commented Nov 26, 2025

Closes #338

In this PR:

  • Logic to handle optional_extra
  • Documentation
  • Test

@Jerry-Jinfeng-Guo Jerry-Jinfeng-Guo self-assigned this Nov 26, 2025
@Jerry-Jinfeng-Guo Jerry-Jinfeng-Guo added the feature New feature or request label Nov 26, 2025
@Jerry-Jinfeng-Guo Jerry-Jinfeng-Guo changed the title logic to handle optional_extra Logic to handle optional_extra in Vision Excel converter input Nov 26, 2025
Contributor

Copilot AI left a comment

Pull request overview

This PR implements logic to handle optional extra columns in the tabular converter, addressing issue #338. The feature allows specifying columns that should be included in extra_info if present but won't cause conversion failure if missing.

Key Changes:

  • Added allow_missing parameter throughout the column definition parsing chain to support optional columns
  • Implemented optional_extra wrapper in column definitions to mark columns as optional
  • Enhanced error handling to gracefully skip missing optional columns while preserving required column behavior
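
As a rough illustration of the behaviour described above, here is a minimal sketch that does not use the converter's actual code. The table is a plain dict of column name to values standing in for a pandas DataFrame, and `collect_extra` is a hypothetical helper name, not part of power-grid-model-io.

```python
from typing import Any


def collect_extra(table: dict[str, list[Any]], extra: list[Any]) -> dict[str, list[Any]]:
    """Collect 'extra' columns from a table given as {column name: values}.

    Plain strings are treated as required columns. A {"optional_extra": [...]}
    entry lists columns that are skipped when they are absent.
    (Hypothetical helper, for illustration only.)
    """
    result: dict[str, list[Any]] = {}
    for entry in extra:
        if isinstance(entry, dict) and "optional_extra" in entry:
            for name in entry["optional_extra"]:
                if name in table:  # missing optional columns are skipped
                    result.setdefault(name, table[name])
        else:
            if entry not in table:  # missing required columns still fail
                raise KeyError(f"Could not find column '{entry}' on table")
            result.setdefault(entry, table[entry])
    return result


nodes = {"ID": [1, 2], "Name": ["a", "b"]}
extra_def = ["ID", "Name", {"optional_extra": ["GUID"]}]
print(collect_extra(nodes, extra_def))  # GUID is absent and optional, so it is left out
```

With `"GUID"` listed as a plain (required) entry instead, the same call would raise a `KeyError`.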

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File | Description
--- | ---
src/power_grid_model_io/converters/tabular_converter.py | Core implementation of optional_extra logic with allow_missing parameter propagation and empty DataFrame handling for missing columns
tests/unit/converters/test_tabular_converter.py | Comprehensive test coverage for optional_extra feature including edge cases and integration tests
docs/converters/vision_converter.md | Documentation explaining optional_extra syntax, behavior, and use cases for Vision Excel exports

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 13 changed files in this pull request and generated 2 comments.


Comment on lines +68 to +71
      extra:
        - ID            # Required - fails if missing
        - Name          # Required - fails if missing
        - optional_extra:
Member

what happens if an element is specified in both the required and optional?

      extra:
        - ID            # Required - fails if missing
        - optional_extra:
          - ID          # Optional - skipped if missing

I believe that the default should be that required precedes optional, so maybe we need to add an explicit test case for this.

Member Author

Member

what was the behaviour previously when specifying

extra:
  - ID
  - ID

? Did it append a separate column or did it combine the two?

Member Author

Only one would appear. In this case the second ID does nothing, since the column already exists.

Member

in that case, it may be possible to achieve the same result (namely that the required column always takes precedence) with less new code. Note that this requires optional_extra to be parsed after the regular extra entries, so that the following also works:

extra:
  - optional_extra:
    - ID
  - ID

in addition, the following two should also be equivalent:

extra:
  - optional_extra:
    - ID
  - optional_extra:
    - GUID

extra:
  - optional_extra:
    - ID
    - GUID
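
One way to get both properties (required wins over optional regardless of order, and split or merged optional_extra entries behave the same) is to normalize the definition before looking up any columns. The sketch below only illustrates that idea; `split_extra` is a hypothetical name and this is not the PR's implementation.

```python
from typing import Any


def split_extra(extra: list[Any]) -> tuple[list[str], list[str]]:
    """Split an 'extra' definition into (required, optional) column names.

    Hypothetical helper: the order of entries in `extra` does not matter,
    because optional names are only resolved after all required names are known.
    """
    required: list[str] = []
    optional: list[str] = []
    for entry in extra:
        if isinstance(entry, dict) and "optional_extra" in entry:
            optional.extend(entry["optional_extra"])
        else:
            required.append(entry)
    # Drop duplicates while preserving order; a required name wins over an optional one.
    required = list(dict.fromkeys(required))
    optional = [name for name in dict.fromkeys(optional) if name not in required]
    return required, optional


# Required takes precedence even when optional_extra is listed first.
print(split_extra([{"optional_extra": ["ID"]}, "ID"]))

# Two separate optional_extra entries are equivalent to one merged entry.
split_form = split_extra([{"optional_extra": ["ID"]}, {"optional_extra": ["GUID"]}])
merged_form = split_extra([{"optional_extra": ["ID", "GUID"]}])
print(split_form == merged_form)
```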

Comment on lines 399 to 408
    allow_missing: bool = False,
) -> pd.DataFrame:
    """Interpret the column definition and extract/convert/create the data as a pandas DataFrame.
    Args:
        data: TabularData:
        table: str:
        col_def: Any:
        extra_info: Optional[ExtraInfo]:
        allow_missing: bool: If True, missing columns will return empty DataFrame instead of raising KeyError
Member

I wonder whether this is the right solution. If I understand correctly, this parameter is needed due to the recursion, right? Would there be a world in which either no recursion is needed, or in which we can do without this allow_missing? It feels bug-prone.

Member Author

Member

that is not what i meant. the other comment is about how we specify the allow_missing argument. This comment is about why we need it in the first place.

Member

@mgovers mgovers Dec 3, 2025

if i understand correctly, the only reason we need allow_missing is so that we can sanitize IF there are optional attributes in the result... That's a separate code path.

However, why can't we just gather all attributes and return empty dataframes by default, and then check that a required attribute is not an empty dataframe (unless of course there are no items in the first place)? That way, we do not introduce a separate code path that is only triggered if there are optional attributes. That simplifies the logic a lot.
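
A minimal sketch of this gather-then-check alternative, under simplifying assumptions: tables are plain dicts rather than DataFrames, an absent column is represented by `None` rather than an empty DataFrame, and `gather`, `validate_required` and `parse_extra` are hypothetical names that do not exist in the converter.

```python
from typing import Any, Optional


def gather(table: dict[str, list[Any]], names: list[str]) -> dict[str, Optional[list[Any]]]:
    """Look up every column; absent columns map to None instead of raising."""
    return {name: table.get(name) for name in names}


def validate_required(
    gathered: dict[str, Optional[list[Any]]], required: list[str], table_name: str
) -> None:
    """Fail if any required column was not found.

    A column that exists but has zero rows is [] rather than None, so a table
    without any items still passes this check.
    """
    missing = [name for name in required if gathered.get(name) is None]
    if missing:
        columns_str = " and ".join(f"'{name}'" for name in missing)
        raise KeyError(f"Could not find column {columns_str} on table '{table_name}'")


def parse_extra(
    table: dict[str, list[Any]], required: list[str], optional: list[str], table_name: str
) -> dict[str, list[Any]]:
    """Single code path: gather everything, check the required part, drop the rest."""
    gathered = gather(table, [*required, *optional])
    validate_required(gathered, required, table_name)
    return {name: values for name, values in gathered.items() if values is not None}


print(parse_extra({"ID": [1, 2]}, required=["ID"], optional=["GUID"], table_name="Nodes"))
```

The lookup itself never needs to know whether a column is optional; only the check afterwards does.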

Member Author

I'd argue the other way: by adding a simple branch, a pattern commonly seen throughout this project, we avoid the specialized sanitization step, which is equally likely to introduce issues.

Comment on lines 399 to 408
    allow_missing: bool = False,
) -> pd.DataFrame:
    """Interpret the column definition and extract/convert/create the data as a pandas DataFrame.
    Args:
        data: TabularData:
        table: str:
        col_def: Any:
        extra_info: Optional[ExtraInfo]:
        allow_missing: bool: If True, missing columns will return empty DataFrame instead of raising KeyError
Member

please add allow_missing to the tests (either by testing that it works as intended + that all sub-calls to the mocks are made correctly, and/or by raising an error if it is set but recursion level is not 0)

Member Author

Comment on lines +426 to +428
    return self._parse_col_def_composite(
        data=data, table=table, col_def=optional_cols, table_mask=table_mask, allow_missing=True
    )
Member

This is the only place where I see allow_missing=True. Why is that? Is it because it's only relevant when the underlying structure is a dict, so that is the only case where the "new" optional extra parameters matter? Why isn't it relevant below, when we have a list instead?

I don't think I've ever worked in this repo, so genuinely asking. I'd appreciate tech review over this :)

Member Author

For now, optional_extra fields are the only ones whose presence we treat as optional.

Jerry-Jinfeng-Guo and others added 3 commits December 1, 2025 21:09
Comments from Martijn

Co-authored-by: Martijn Govers <[email protected]>
Signed-off-by: Jerry Guo <[email protected]>
Comments from Martijn

Co-authored-by: Martijn Govers <[email protected]>
Signed-off-by: Jerry Guo <[email protected]>
+         return pd.DataFrame(index=table_data.index)
      columns_str = " and ".join(f"'{col_name}'" for col_name in columns)
-     raise KeyError(f"Could not find column {columns_str} on table '{table}'")
+     raise KeyError(f"Could not find column {columns_str} on table '{table}'") from e
Member

@mgovers mgovers Dec 3, 2025

I did some thinking on the original code here. I now understand why it did not have raise from e here:

  • either it is a string that can be converted to a float (e.g. inf; see the description in the try: statement), which should be interpreted as a float
  • or it is a column name that is absent, which
    • either is allowed (new code)
    • or is disallowed and should raise a KeyError

However, I dislike that original design choice. Instead, I am of the opinion that it should be something like

if (const_value := float_or_none(col_def)) is not None:
    return self._parse_col_def_const(..., const_value=const_value)

# now, we know for sure that it is a column name that is absent
if allow_missing:
    return ...

raise KeyError(...)

where float_or_none can indeed be implemented as

def float_or_none(value: str) -> float | None:
    try:
        return float(value)
    except ValueError:
        return None

The reason I do not want to raise in the except block is that the conversion to float is actually already the fall-back. In pseudo-code, the logic is actually something like

if the column header is a regular column name:
    return column header as regular column name
elif the column header is a floating point value:
    return the column header as a floating point value
elif this specific column is optional:
    return None
else:
    raise a KeyError because it was mandatory but it is not found

this also is an indication as to why the raise ... from e was disabled: the key-error is not the fall-back but the default - the conversion to float is actually the fall-back logic. But, again, there's a reason why raise ... from e is good practice: it shows the intention. In this original code, the intention was not clear.
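
The pseudo-code above can be turned into a small runnable sketch. It is only an illustration under stated assumptions: the table is a plain dict instead of a DataFrame, a skipped optional column is returned as `None`, and `resolve_column` is a hypothetical name. The explicit `is not None` check is there so that a constant such as "0" is still treated as a constant.

```python
from typing import Any, Optional


def float_or_none(value: str) -> Optional[float]:
    """Return the value as a float, or None if it is not a valid float literal."""
    try:
        return float(value)
    except ValueError:
        return None


def resolve_column(
    table: dict[str, list[Any]], col_def: str, allow_missing: bool = False
) -> Optional[list[Any]]:
    """Resolve a column definition in the order described above (hypothetical helper)."""
    # 1. A regular column name.
    if col_def in table:
        return table[col_def]
    # 2. A floating point constant, repeated for every row.
    n_rows = len(next(iter(table.values()), []))
    const_value = float_or_none(col_def)
    if const_value is not None:  # "0" parses to 0.0, which is falsy but still a constant
        return [const_value] * n_rows
    # 3. An absent column that is allowed to be missing.
    if allow_missing:
        return None
    # 4. An absent column that is mandatory.
    raise KeyError(f"Could not find column '{col_def}' on table")


table = {"u_rated": [400.0, 10500.0]}
print(resolve_column(table, "u_rated"))
print(resolve_column(table, "inf"))
print(resolve_column(table, "GUID", allow_missing=True))
```

Because the `KeyError` is raised outside any `except` block, there is no active exception to chain from, so the question of `raise ... from e` does not come up in this structure.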

Member Author

Your suggestion?

Member

@mgovers mgovers Dec 3, 2025

i know. raise ... from e is good practice always. The only reason why it's not here is because the actual design is bad, not because the raise ... from e is not supposed to be used here.

Member

i'll add this to the knowledge sharing session of tomorrow.

sonarqubecloud bot commented Dec 3, 2025

Labels

feature New feature or request

Development

Successfully merging this pull request may close these issues.

[FEATURE] Graceful Handling of Missing Extra Columns in Vision Excel Files

5 participants