-
-
Notifications
You must be signed in to change notification settings - Fork 19.1k
BUG: Raise TypeError for mismatched signed/unsigned dtypes in IntervalIndex.from_arrays #62376
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…lIndex.from_arrays
@jbrockmendel please review these changes when you get a chance. |
pandas/core/indexes/interval.py
Outdated
copy: bool = False, | ||
dtype: Dtype | None = None, | ||
) -> IntervalIndex: | ||
# Check for mismatched signed/unsigned integer dtypes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think it would make more sense to do this on the IntervalArray.from_arrays method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I will update changes in IntervalArray.from_arrays
@jbrockmendel I’ve updated this as per your suggestions. |
pandas/core/arrays/interval.py
Outdated
) -> Self: | ||
# Check for mismatched signed/unsigned integer dtypes | ||
left_dtype = getattr(left, "dtype", None) | ||
right_dtype = getattr(right, "dtype", None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think putting this after the _maybe_convert_platform_interval calls would be more robust. e.g. if one is a list and the other is uint64?
Also is it just int vs uint we care about, or also e.g. int32 vs int64?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are checking int vs unit .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason not to move this to after the _maybe_convert_platform_interval calls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, i have updated, also added whatsnew note, please let me know if this needs improvement.
pandas/core/arrays/interval.py
Outdated
|
||
# Check for mismatched signed/unsigned integer dtypes | ||
left_dtype = getattr(left, "dtype", None) | ||
right_dtype = getattr(right, "dtype", None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the getattr should be unnecessary. the attribute should always be there now that this is moved to after
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, got it. i have updated the code and removed none check for dtype.
@jbrockmendel could you please check if this CI failure is related to the changes in this PR?
|
c756fbc
to
5b67c6e
Compare
I'm pretty sure that's affecting all PRs and is unrelated to this. |
pandas/core/arrays/interval.py
Outdated
if ( | ||
left_dtype.kind in "iu" | ||
and right_dtype.kind in "iu" | ||
and left_dtype.kind != right_dtype.kind |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't we just compare if left.dtype != right.dtype
at this point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, this will be more clear . i will update with this .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel i guess we should not use if left.dtype != right.dtype
as this will only restrict to int64 vs uint64
and cause failure in other cases . should i revert the changes to previous one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what cases? im pretty sure we always want matching dtypes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
=========================== short test summary info ============================
FAILED pandas/tests/indexes/categorical/test_astype.py::TestAstype::test_astype - TypeError: Left and right arrays must have matching dtypes. Got float64 and int64.
FAILED pandas/tests/indexes/interval/test_constructors.py::TestFromArrays::test_mixed_float_int[int64-float64] - TypeError: Left and right arrays must have matching dtypes. Got int64 and float64.
FAILED pandas/tests/indexes/interval/test_constructors.py::TestFromArrays::test_mixed_float_int[float64-int64] - TypeError: Left and right arrays must have matching dtypes. Got float64 and int64.
FAILED
got these these check fails, after applying changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it is casting mixed int/float to float/float in ensure_simple_new_inputs. So putting this check after that should do the trick
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, that make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just to confirm before applying changes , i should move these check after ensure_simple_new_inputs
, or should I use a strict if left.dtype != right.dtype
check after that step?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think so yes
dtype=dtype, | ||
) | ||
|
||
# Check for mismatched signed/unsigned integer dtypes after casting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
im confused as to why this affects from_arrays, since that goes through simple_new and not __new__
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure but could it be as in simple_new- IntervalMixin.__new__
is called not IntervalArray.__new__
.
This PR adds a check in I
ntervalIndex.from_arrays
to raise aTypeError
when integer arrays have mismatched signedness. Includes a unit test to verify the behavior.Please let me know if my approach or fix needs any improvements . I’m open to feedback and happy to make changes based on suggestions.
Thankyou !