
@dhruvak001
Contributor

@dhruvak001 dhruvak001 commented Dec 3, 2025

Support create_index: bool in to_dataframe() to skip creating MultiIndex.

This PR adds a new create_index parameter to both Dataset.to_dataframe() and DataArray.to_dataframe() methods, allowing users to skip the potentially expensive MultiIndex creation and use a simple RangeIndex instead.
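To give a rough sense of why MultiIndex creation can be expensive, here is a pandas-only sketch, not xarray's code. It assumes a hypothetical 1000 x 1000 grid and compares a pandas.MultiIndex built with pandas.MultiIndex.from_product (roughly the kind of index the default path produces) against a pandas.RangeIndex of the same length.

```python
import numpy as np
import pandas as pd

# Coordinates for a hypothetical 1000 x 1000 grid (sizes are assumptions).
x = np.arange(1000)
y = np.linspace(0.0, 1.0, 1000)

# Cartesian-product index: one entry per grid point, backed by a level array
# and a full-length integer code array for every dimension.
multi = pd.MultiIndex.from_product([x, y], names=["x", "y"])

# Simple positional index: pandas stores only start, stop and step.
flat = pd.RangeIndex(len(multi))

print(len(multi), len(flat))      # 1000000 1000000
print(multi.nbytes, flat.nbytes)  # the MultiIndex is several MB, the RangeIndex a few hundred bytes
```

The codes arrays grow with the number of rows times the number of dimensions, while the RangeIndex stays constant-size, which is the overhead create_index=False is meant to skip.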

Copilot AI review requested due to automatic review settings December 3, 2025 21:50
Copilot finished reviewing on behalf of dhruvak001 December 3, 2025 21:53

Copilot AI left a comment


Pull request overview

This pull request adds a create_index parameter to both Dataset.to_dataframe() and DataArray.to_dataframe() methods, allowing users to bypass the potentially expensive MultiIndex creation and use a simple RangeIndex instead. This is a performance optimization feature that addresses issue #10912.

  • Adds optional create_index: bool = True parameter to maintain backward compatibility
  • When create_index=False, uses pd.RangeIndex instead of constructing a MultiIndex from coordinates
  • Preserves data integrity and ordering while avoiding the MultiIndex overhead
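The behavior described in these bullets can be sketched in plain pandas. grid_to_frame below is a hypothetical helper written for illustration only, not part of xarray and not the PR's implementation; it assumes a single data variable whose axes follow the order of the coordinates, and shows that both index choices keep the same row order.

```python
import numpy as np
import pandas as pd


def grid_to_frame(data, coords, name="var", create_index=True):
    """Hypothetical helper, not xarray's code: flatten an N-D array to a DataFrame.

    data   : N-D array whose axes follow the order of ``coords``
    coords : dict mapping dimension name -> 1-D coordinate values
    """
    values = np.asarray(data).ravel()  # C order: last dimension varies fastest
    if create_index:
        # Cartesian product of the coordinates; its rows also have the last
        # dimension varying fastest, so they line up with ``values``.
        index = pd.MultiIndex.from_product(list(coords.values()), names=list(coords))
    else:
        # Plain 0..n-1 positions; no coordinate labels are materialized.
        index = pd.RangeIndex(values.size)
    return pd.DataFrame({name: values}, index=index)


data = np.arange(6).reshape(2, 3)
coords = {"x": [10, 20], "y": ["a", "b", "c"]}

indexed = grid_to_frame(data, coords)
flat = grid_to_frame(data, coords, create_index=False)

print(indexed.loc[(20, "b"), "var"])  # 4, i.e. data[1, 1]
print(flat["var"].tolist())           # [0, 1, 2, 3, 4, 5]
```

Dropping the MultiIndex afterwards with reset_index(drop=True) gives the same frame as the RangeIndex path, but only after paying for the MultiIndex first.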

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

  • xarray/core/dataset.py: Adds create_index parameter to to_dataframe() and _to_dataframe() methods, implementing RangeIndex creation when parameter is False
  • xarray/core/dataarray.py: Adds create_index parameter to to_dataframe() method and passes it through to Dataset's _to_dataframe()
  • xarray/tests/test_dataset.py: Adds comprehensive tests for the new create_index parameter on Dataset, including default behavior, RangeIndex creation, data integrity, and interaction with dim_order
  • xarray/tests/test_dataarray.py: Adds comprehensive tests for the new create_index parameter on DataArray, including default behavior, RangeIndex creation, data integrity, and interaction with additional coordinates
Comments suppressed due to low confidence (1)

xarray/core/dataset.py:7277

  • The docstring states "The DataFrame is indexed by the Cartesian product of this dataset's indices" but this is no longer true when create_index=False. Consider updating the first paragraph of the docstring to clarify this behavior, e.g., "When create_index=True (default), the DataFrame is indexed by the Cartesian product of this dataset's indices. When create_index=False, a simple RangeIndex is used instead."
        Non-index variables in this dataset form the columns of the
        DataFrame. The DataFrame is indexed by the Cartesian product of
        this dataset's indices.


Contributor

@dcherian dcherian left a comment


Looks great. Can you add a note to whats-new.rst, please?

@dcherian dcherian changed the title "Support create_index: bool in to_dataframe() to skip creating MultiIndex" to "Support create_index: bool in to_dataframe() to skip creating MultiIndex" Dec 4, 2025
Member

@benbovy benbovy left a comment


Thanks @dhruvak001. I just left one small comment on the docstrings.

Comment on lines +7293 to +7297
create_index : bool, default: True
If True (default), create a MultiIndex from the Cartesian product
of this dataset's indices. If False, use a RangeIndex instead.
This can be useful to avoid the potentially expensive MultiIndex
creation.
Member


Suggested change

Current:
create_index : bool, default: True
If True (default), create a MultiIndex from the Cartesian product
of this dataset's indices. If False, use a RangeIndex instead.
This can be useful to avoid the potentially expensive MultiIndex
creation.

Suggested:
create_index : bool, default: True
If True (default), create a :py:class:`pandas.MultiIndex` from the Cartesian product
of this dataset's indices. If False, use a :py:class:`pandas.RangeIndex` instead.
This can be useful to avoid the potentially expensive MultiIndex
creation.

This avoids any confusion with xarray.indexes.RangeIndex (a float range) and xarray.indexes.PandasMultiIndex.
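For context on the distinction: pandas.RangeIndex is an integer range defined only by start, stop and step, which is what makes it cheap as a default row index. The short pandas-only illustration below shows that behavior; it does not exercise xarray.indexes.RangeIndex.

```python
import pandas as pd

# pandas.RangeIndex: integer positions described by start/stop/step only.
idx = pd.RangeIndex(start=0, stop=10, step=2)

print(list(idx))                      # [0, 2, 4, 6, 8]
print(idx.start, idx.stop, idx.step)  # 0 10 2
print(idx.dtype)                      # int64
```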

@dcherian
Contributor

dcherian commented Dec 4, 2025

Can you make this change for to_dask_dataframe too, please?



Development

Successfully merging this pull request may close these issues.

Support create_index: bool in to_dataframe to skip creating MultiIndex

4 participants