Conversation

@matiaslindgren
Contributor

@matiaslindgren commented Oct 24, 2025

The bug reported in #61675 was fixed in 3.x by #58043. This PR adds a test based on the sample code in the issue description.

@matiaslindgren
Contributor Author

@rhshadrach this is the test from #62781

Member

@rhshadrach left a comment

Thanks for the PR! Some minor requests.

# GH 61675
cat_data = pd.Categorical(
    [15, 16, 17, 18],
    categories=pd.Series(list(range(3, 24)), dtype="Int64"),
Member

Can you reduce the size here? Even 20 elements can make stepping through with a debugger harder. Shoot for 3, or 5 if necessary.
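
For illustration, a reduced fixture along the lines requested could look like the sketch below. The values and the name `small_categories` are hypothetical and are not taken from the merged test; the only things kept from the original snippet are `pd.Categorical` and the `Int64` categories.

```python
import pandas as pd

# Hypothetical reduced fixture: two values drawn from three nullable-integer
# categories, instead of four values drawn from twenty-one.
small_categories = pd.Series([1, 2, 3], dtype="Int64")
cat_data = pd.Categorical([1, 2], categories=small_categories)

# The categories keep the nullable Int64 dtype used in the original snippet.
print(cat_data.categories.dtype)
print(len(cat_data.categories))
```

With only three categories, the whole object fits on one line in a debugger while still exercising the same dtype combination.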

Comment on lines 593 to 597
df_joined_1 = (
    df1.reset_index(level="hr")
    .merge(df2.reset_index(level="hr"), on="hr")
    .set_index("hr")
)
Member

There are three method calls, so it isn't clear what you're testing here. Can you move what you need into the setup and have just a single method call here? Part of this may be dropping the last .set_index altogether; ideally we check the output directly from the function we want to test (merge here, I think) and do not modify the result.
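
The structure being asked for (setup first, one call under test, unmodified result compared to an expected frame) might be sketched as follows. The test name and the data are hypothetical and use plain integer columns rather than the categorical data from the PR; `pandas.testing.assert_frame_equal` is the public counterpart of the `tm.assert_frame_equal` helper used in the pandas test suite.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def test_merge_on_column():
    # Setup: everything the call needs is prepared up front.
    left = pd.DataFrame({"hr": [1, 2], "a": [10, 20]})
    right = pd.DataFrame({"hr": [1, 2], "b": [30, 40]})

    # A single method call under test; the result is not modified afterwards.
    result = left.merge(right, on="hr")

    # The expected frame is written out explicitly and compared directly.
    expected = pd.DataFrame({"hr": [1, 2], "a": [10, 20], "b": [30, 40]})
    assert_frame_equal(result, expected)


test_merge_on_column()
```

If this test fails, the only candidate is the `merge` call, which is the point of keeping a single call between setup and assertion.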

Contributor Author

Actually, I realized this test is redundant, sorry for the noise. I wanted to include the examples from the original GH issue unchanged, but since DataFrame.merge was given as a suggested workaround for the bug we are testing, I think we can skip it and just test the join.


tm.assert_frame_equal(df_joined_1, expected1)

df_joined_2 = df1.join(df2)
Member

Can you separate this out to a second test?


tm.assert_frame_equal(df_joined_2, expected2)

assert df_joined_1.equals(df_joined_2)
Member

I don't think this is necessary. assert_frame_equal is checking the full state of the objects already.
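
As a rough illustration of that point, the sketch below shows `assert_frame_equal` rejecting frames that hold the same values but differ in dtype or index name, using its default settings. The frames and the `frames_match` helper are hypothetical, and the public `pandas.testing.assert_frame_equal` stands in for the `tm` helper used inside the pandas test suite.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

expected = pd.DataFrame({"a": [1, 2]}, index=pd.Index([0, 1], name="hr"))

# Same values, but float64 instead of int64.
wrong_dtype = expected.astype("float64")

# Same values and dtype, but a different index name.
wrong_name = expected.rename_axis("hour")


def frames_match(left, right):
    # Hypothetical helper: turn the assertion into a bool for demonstration.
    try:
        assert_frame_equal(left, right)
    except AssertionError:
        return False
    return True


copy_ok = frames_match(expected.copy(), expected)
dtype_ok = frames_match(wrong_dtype, expected)
name_ok = frames_match(wrong_name, expected)
print(copy_ok, dtype_ok, name_ok)
```

Once both results have been compared against their expected frames this way, an additional `.equals` check between the two results adds no coverage, and on failure it would only report `False` rather than describing what differs.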

@matiaslindgren
Contributor Author

Thanks for the review, I cleaned up the test.

Member

@rhshadrach left a comment

lgtm

@rhshadrach added the "Reshaping" (Concat, Merge/Join, Stack/Unstack, Explode) and "Needs Tests" (Unit test(s) needed to prevent regressions) labels Nov 1, 2025
@rhshadrach added this to the 3.0 milestone Nov 1, 2025
@rhshadrach changed the title from "TST: add test for issue #61675" to "TST: Add test DataFrame.join with CategoricalIndex" Nov 1, 2025
@rhshadrach merged commit 54ab806 into pandas-dev:main Nov 1, 2025
51 of 52 checks passed
@rhshadrach
Member

Thanks @matiaslindgren! The patch that fixed the bug has already been backported to 2.3.x, so we should be all set on the issue.

Labels

Needs Tests: Unit test(s) needed to prevent regressions
Reshaping: Concat, Merge/Join, Stack/Unstack, Explode

Development

Successfully merging this pull request may close these issues.

BUG: DataFrame.join(other) raises InvalidIndexError if column index is CategoricalIndex

2 participants