Support `create_index: bool` in `to_dataframe()` to skip creating MultiIndex
#10979
base: main
Pull request overview
This pull request adds a create_index parameter to both Dataset.to_dataframe() and DataArray.to_dataframe() methods, allowing users to bypass the potentially expensive MultiIndex creation and use a simple RangeIndex instead. This is a performance optimization feature that addresses issue #10912.
- Adds an optional `create_index: bool = True` parameter to maintain backward compatibility.
- When `create_index=False`, uses `pd.RangeIndex` instead of constructing a MultiIndex from the coordinates.
- Preserves data integrity and ordering while avoiding the MultiIndex overhead.
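As a rough, pandas-only illustration (not xarray's implementation), the two settings can be pictured as the same flattened data carrying two different indexes. The names `x`, `y`, and `t` below are hypothetical stand-ins for a small 2-D dataset, and the sketch assumes rows are flattened in C order in both cases.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a small dataset: one variable "t" on dims ("x", "y").
x = [10, 20]
y = ["a", "b", "c"]
t = np.arange(6).reshape(len(x), len(y))

# Roughly the default (create_index=True): a MultiIndex built from the
# Cartesian product of the dimension coordinates.
multi = pd.MultiIndex.from_product([x, y], names=["x", "y"])
df_multi = pd.DataFrame({"t": t.ravel()}, index=multi)

# Roughly create_index=False: same values in the same (C-order) row order,
# but labelled by a cheap RangeIndex of positions 0..n-1.
df_flat = pd.DataFrame({"t": t.ravel()}, index=pd.RangeIndex(t.size))
```

With xarray itself, the corresponding comparison is `ds.to_dataframe()` versus `ds.to_dataframe(create_index=False)`.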
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| xarray/core/dataset.py | Adds create_index parameter to to_dataframe() and _to_dataframe() methods, implementing RangeIndex creation when parameter is False |
| xarray/core/dataarray.py | Adds create_index parameter to to_dataframe() method and passes it through to Dataset's _to_dataframe() |
| xarray/tests/test_dataset.py | Adds comprehensive tests for the new create_index parameter on Dataset, including default behavior, RangeIndex creation, data integrity, and interaction with dim_order |
| xarray/tests/test_dataarray.py | Adds comprehensive tests for the new create_index parameter on DataArray, including default behavior, RangeIndex creation, data integrity, and interaction with additional coordinates |
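The tests above cover the interaction between `create_index=False` and `dim_order`. Without a MultiIndex, row order is the only record of how dimensions were laid out, so such a test essentially checks that ordering. The sketch below uses NumPy and pandas only, under the assumption that each variable is transposed to `dim_order` and then flattened in C order; the helper `flatten_in_dim_order` and the variable `t` are hypothetical.

```python
import numpy as np
import pandas as pd


def flatten_in_dim_order(arr, dims, dim_order):
    """Hypothetical helper: transpose `arr` (axes named by `dims`) to
    `dim_order`, flatten in C order, and label rows with a RangeIndex."""
    axes = [dims.index(d) for d in dim_order]
    flat = np.transpose(arr, axes).reshape(-1)
    return pd.DataFrame({"t": flat}, index=pd.RangeIndex(flat.size))


t = np.arange(6).reshape(2, 3)  # dims ("x", "y")

# "x" outermost: rows walk along "y" first.
df_xy = flatten_in_dim_order(t, ["x", "y"], ["x", "y"])
# "y" outermost: rows walk along "x" first.
df_yx = flatten_in_dim_order(t, ["x", "y"], ["y", "x"])
```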
Comments suppressed due to low confidence (1)
xarray/core/dataset.py:7277
- The docstring states "The DataFrame is indexed by the Cartesian product of this dataset's indices", but this is no longer true when `create_index=False`. Consider updating the first paragraph of the docstring to clarify this behavior, e.g., "When create_index=True (default), the DataFrame is indexed by the Cartesian product of this dataset's indices. When create_index=False, a simple RangeIndex is used instead."
Non-index variables in this dataset form the columns of the
DataFrame. The DataFrame is indexed by the Cartesian product of
this dataset's indices.
dcherian left a comment
Looks great! Can you add a note to whats-new.rst, please?
benbovy left a comment
Thanks @dhruvak001. I just left one small comment on the docstrings.
create_index : bool, default: True
    If True (default), create a MultiIndex from the Cartesian product
    of this dataset's indices. If False, use a RangeIndex instead.
    This can be useful to avoid the potentially expensive MultiIndex
    creation.
Suggested change:
create_index : bool, default: True
    If True (default), create a :py:class:`pandas.MultiIndex` from the Cartesian product
    of this dataset's indices. If False, use a :py:class:`pandas.RangeIndex` instead.
    This can be useful to avoid the potentially expensive MultiIndex
    creation.
This avoids any confusion with `xarray.indexes.RangeIndex` (a floating-point range index) and `xarray.indexes.PandasMultiIndex`.
Can you make this change for …
Addresses #10912: Support `create_index: bool` in `to_dataframe()` to skip creating MultiIndex.
This PR adds a new create_index parameter to both Dataset.to_dataframe() and DataArray.to_dataframe() methods, allowing users to skip the potentially expensive MultiIndex creation and use a simple RangeIndex instead.
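The "potentially expensive" MultiIndex creation can be made concrete with pandas alone: a MultiIndex over a Cartesian product materializes one integer codes array per level, each as long as the frame, whereas a RangeIndex only needs its start, stop, and step. The sketch below compares the two index objects directly for an illustrative 1000 × 1000 grid; it does not call xarray's `to_dataframe`, and the exact byte counts will vary by pandas version.

```python
import numpy as np
import pandas as pd

# Illustrative sizes only: a 1000 x 1000 grid gives one million rows.
x = np.arange(1000)
y = np.arange(1000)
n_rows = x.size * y.size

# Default-style index: builds per-level codes arrays of length n_rows.
multi = pd.MultiIndex.from_product([x, y], names=["x", "y"])

# create_index=False-style index: described by start/stop/step only.
flat = pd.RangeIndex(n_rows)

print(multi.nbytes, flat.nbytes)
```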