-
-
Notifications
You must be signed in to change notification settings - Fork 19.2k
TST: Add test DataFrame.join with CategoricalIndex #62812
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@rhshadrach this is the test from #62781 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! Some minor requests.
| # GH 61675 | ||
| cat_data = pd.Categorical( | ||
| [15, 16, 17, 18], | ||
| categories=pd.Series(list(range(3, 24)), dtype="Int64"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you reduce the size here; even 20 elements can make stepping through with a debugger harder. Shoot for 3 or 5 if necessary.
| df_joined_1 = ( | ||
| df1.reset_index(level="hr") | ||
| .merge(df2.reset_index(level="hr"), on="hr") | ||
| .set_index("hr") | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are three method calls, it isn't clear what you're testing here. Can you move what you need to into the setup and have just a single method call here? Part of this may be dropping the last .set_index altogether - ideally we check the output directly from the function we want to test (merge here, I think) and do not modify the result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I realized this test is redundant, sorry for the noise. I wanted to include the examples from the original GH issue unchanged, but since DataFrame.merge was a given as a suggested workaround for the bug we are testing, I think we can skip it and just test the join.
|
|
||
| tm.assert_frame_equal(df_joined_1, expected1) | ||
|
|
||
| df_joined_2 = df1.join(df2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you separate this out to a second test.
|
|
||
| tm.assert_frame_equal(df_joined_2, expected2) | ||
|
|
||
| assert df_joined_1.equals(df_joined_2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is necessary. assert_frame_equal is checking the full state of the objects already.
|
Thanks for the review, I cleaned up the test. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
|
Thanks @matiaslindgren! The patch that fixed the bug has already been backported to 2.3.x, so we should be all set on the issue. |
doc/source/whatsnew/vX.X.X.rstfile if fixing a bug or adding a new feature.The bug reported in #61675 was fixed in 3.x by #58043. This PR adds a test based on the sample code in the issue description.