Commit d772523

Merge branch 'main' into fix-issue-58471
2 parents 0fa3118 + 865ae1d commit d772523

17 files changed, +357 -110 lines changed

README.md

Lines changed: 1 addition & 1 deletion
@@ -179,7 +179,7 @@ If you are simply looking to start working with the pandas codebase, navigate to
 
 You can also triage issues which may include reproducing bug reports, or asking for vital information such as version numbers or reproduction instructions. If you would like to start triaging issues, one easy way to get started is to [subscribe to pandas on CodeTriage](https://www.codetriage.com/pandas-dev/pandas).
 
-Or maybe through using pandas you have an idea of your own or are looking for something in the documentation and thinking ‘this can be improved’...you can do something about it!
+Or maybe through using pandas you have an idea of your own or are looking for something in the documentation and thinking ‘this can be improved’... you can do something about it!
 
 Feel free to ask questions on the [mailing list](https://groups.google.com/forum/?fromgroups#!forum/pydata) or on [Slack](https://pandas.pydata.org/docs/dev/development/community.html?highlight=slack#community-slack).
 

doc/source/user_guide/groupby.rst

Lines changed: 1 addition & 1 deletion
@@ -137,7 +137,7 @@ We could naturally group by either the ``A`` or ``B`` columns, or both:
 
 ``df.groupby('A')`` is just syntactic sugar for ``df.groupby(df['A'])``.
 
-The above GroupBy will split the DataFrame on its index (rows). To split by columns, first do
+DataFrame groupby always operates along axis 0 (rows). To split by columns, first do
 a transpose:
 
 .. ipython::
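
The reworded sentence in this hunk says that DataFrame groupby works along rows and that grouping columns requires a transpose first. A minimal sketch of that idea follows; the toy frame, its column names (`a1`, `a2`, `b1`) and the first-letter grouping rule are assumptions made up for illustration and are not part of the commit.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({"a1": [1, 2], "a2": [3, 4], "b1": [5, 6]})

# Transpose so the columns become rows, group those rows by the first
# letter of their label, aggregate, then transpose back.
by_prefix = df.T.groupby(lambda label: label[0]).sum().T
```

The second transpose restores the original orientation, so `by_prefix` has one column per group of original columns.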

doc/source/whatsnew/v3.0.0.rst

Lines changed: 3 additions & 0 deletions
@@ -739,6 +739,7 @@ Other Deprecations
 - Deprecated option "future.no_silent_downcasting", as it is no longer used. In a future version accessing this option will raise (:issue:`59502`)
 - Deprecated silent casting of non-datetime 'other' to datetime in :meth:`Series.combine_first` (:issue:`62931`)
 - Deprecated slicing on a :class:`Series` or :class:`DataFrame` with a :class:`DatetimeIndex` using a ``datetime.date`` object, explicitly cast to :class:`Timestamp` instead (:issue:`35830`)
+- Deprecated support for the Dataframe Interchange Protocol (:issue:`56732`)
 - Deprecated the 'inplace' keyword from :meth:`Resampler.interpolate`, as passing ``True`` raises ``AttributeError`` (:issue:`58690`)
 
 .. ---------------------------------------------------------------------------
@@ -961,6 +962,7 @@ Categorical
 ^^^^^^^^^^^
 - Bug in :class:`Categorical` where constructing from a pandas :class:`Series` or :class:`Index` with ``dtype='object'`` did not preserve the categories' dtype as ``object``; now the ``categories.dtype`` is preserved as ``object`` for these cases, while numpy arrays and Python sequences with ``dtype='object'`` continue to infer the most specific dtype (for example, ``str`` if all elements are strings) (:issue:`61778`)
 - Bug in :func:`Series.apply` where ``nan`` was ignored for :class:`CategoricalDtype` (:issue:`59938`)
+- Bug in :func:`bdate_range` raising ``ValueError`` with frequency ``freq="cbh"`` (:issue:`62849`)
 - Bug in :func:`testing.assert_index_equal` raising ``TypeError`` instead of ``AssertionError`` for incomparable ``CategoricalIndex`` when ``check_categorical=True`` and ``exact=False`` (:issue:`61935`)
 - Bug in :meth:`Categorical.astype` where ``copy=False`` would still trigger a copy of the codes (:issue:`62000`)
 - Bug in :meth:`DataFrame.pivot` and :meth:`DataFrame.set_index` raising an ``ArrowNotImplementedError`` for columns with pyarrow dictionary dtype (:issue:`53051`)
@@ -1180,6 +1182,7 @@ Groupby/resample/rolling
 - Bug in :meth:`Rolling.apply` for ``method="table"`` where column order was not being respected due to the columns getting sorted by default. (:issue:`59666`)
 - Bug in :meth:`Rolling.apply` where the applied function could be called on fewer than ``min_period`` periods if ``method="table"``. (:issue:`58868`)
 - Bug in :meth:`Series.resample` could raise when the date range ended shortly before a non-existent time. (:issue:`58380`)
+- Bug in :meth:`Series.resample` raising error when resampling non-nanosecond resolutions out of bounds for nanosecond precision (:issue:`57427`)
 
 Reshaping
 ^^^^^^^^^

pandas/_libs/tslibs/offsets.pyx

Lines changed: 13 additions & 4 deletions
@@ -5688,18 +5688,27 @@ def shift_month(stamp: datetime, months: int, day_opt: object = None) -> datetim
     cdef:
         int year, month, day
         int days_in_month, dy
+        npy_datetimestruct dts
+
+    if isinstance(stamp, _Timestamp):
+        creso = (<_Timestamp>stamp)._creso
+        val = (<_Timestamp>stamp)._value
+        pandas_datetime_to_datetimestruct(val, creso, &dts)
+    else:
+        # Plain datetime/date
+        pydate_to_dtstruct(stamp, &dts)
 
-    dy = (stamp.month + months) // 12
-    month = (stamp.month + months) % 12
+    dy = (dts.month + months) // 12
+    month = (dts.month + months) % 12
 
     if month == 0:
         month = 12
         dy -= 1
-    year = stamp.year + dy
+    year = dts.year + dy
 
     if day_opt is None:
         days_in_month = get_days_in_month(year, month)
-        day = min(stamp.day, days_in_month)
+        day = min(dts.day, days_in_month)
     elif day_opt == "start":
         day = 1
     elif day_opt == "end":
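
The month arithmetic in this hunk can be followed more easily in plain Python. The sketch below is not the Cython implementation: it reads fields from a standard `datetime` instead of an `npy_datetimestruct`, the name `shift_month_sketch` is hypothetical, and the body of the `"end"` branch (which the hunk cuts off) is an assumption that the last day of the target month is intended.

```python
import calendar
from datetime import datetime
from typing import Optional


def shift_month_sketch(
    stamp: datetime, months: int, day_opt: Optional[str] = None
) -> datetime:
    """Pure-Python sketch of the year/month/day arithmetic in the hunk."""
    dy = (stamp.month + months) // 12
    month = (stamp.month + months) % 12

    # A remainder of 0 stands for December, which belongs to the prior year step.
    if month == 0:
        month = 12
        dy -= 1
    year = stamp.year + dy

    days_in_month = calendar.monthrange(year, month)[1]
    if day_opt is None:
        # Clamp the day so that e.g. Jan 31 shifted by one month stays valid.
        day = min(stamp.day, days_in_month)
    elif day_opt == "start":
        day = 1
    elif day_opt == "end":
        day = days_in_month  # assumption: the hunk is truncated at this branch
    else:
        raise ValueError(f"unsupported day_opt in this sketch: {day_opt!r}")
    return stamp.replace(year=year, month=month, day=day)
```

For instance, shifting 31 January 2020 forward by one month clamps the day to 29 February 2020, and shifting a January date back by one month lands in December of the previous year.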

pandas/conftest.py

Lines changed: 2 additions & 0 deletions
@@ -135,12 +135,14 @@ def pytest_collection_modifyitems(items, config) -> None:
     # Warnings from doctests that can be ignored; place reason in comment above.
     # Each entry specifies (path, message) - see the ignore_doctest_warning function
     ignored_doctest_warnings = [
+        ("api.interchange.from_dataframe", ".*Interchange Protocol is deprecated"),
         ("is_int64_dtype", "is_int64_dtype is deprecated"),
         ("is_interval_dtype", "is_interval_dtype is deprecated"),
         ("is_period_dtype", "is_period_dtype is deprecated"),
         ("is_datetime64tz_dtype", "is_datetime64tz_dtype is deprecated"),
         ("is_categorical_dtype", "is_categorical_dtype is deprecated"),
         ("is_sparse", "is_sparse is deprecated"),
+        ("DataFrame.__dataframe__", "Interchange Protocol is deprecated"),
         ("DataFrameGroupBy.fillna", "DataFrameGroupBy.fillna is deprecated"),
         ("DataFrameGroupBy.corrwith", "DataFrameGroupBy.corrwith is deprecated"),
         ("NDFrame.replace", "Series.replace without 'value'"),

pandas/core/frame.py

Lines changed: 17 additions & 2 deletions
@@ -916,6 +916,14 @@ def __dataframe__(
         """
         Return the dataframe interchange object implementing the interchange protocol.
 
+        .. deprecated:: 3.0.0
+
+            The Dataframe Interchange Protocol is deprecated.
+            For dataframe-agnostic code, you may want to look into:
+
+            - `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_
+            - `Narwhals <https://github.com/narwhals-dev/narwhals>`_
+
         .. note::
 
             For new development, we highly recommend using the Arrow C Data Interface
@@ -970,7 +978,14 @@ def __dataframe__(
         These methods (``column_names``, ``select_columns_by_name``) should work
         for any dataframe library which implements the interchange protocol.
         """
-
+        warnings.warn(
+            "The Dataframe Interchange Protocol is deprecated.\n"
+            "For dataframe-agnostic code, you may want to look into:\n"
+            "- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n"
+            "- Narwhals: https://github.com/narwhals-dev/narwhals\n",
+            Pandas4Warning,
+            stacklevel=find_stack_level(),
+        )
         from pandas.core.interchange.dataframe import PandasDataFrameXchg
 
         return PandasDataFrameXchg(self, allow_copy=allow_copy)
@@ -9430,7 +9445,7 @@ def groupby(
             index. If a dict or Series is passed, the Series or dict VALUES
             will be used to determine the groups (the Series' values are first
             aligned; see ``.align()`` method). If a list or ndarray of length
-            equal to the selected axis is passed (see the `groupby user guide
+            equal to the number of rows is passed (see the `groupby user guide
             <https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#splitting-an-object-into-groups>`_),
             the values are used as-is to determine the groups. A label or list
             of labels may be passed to group by the columns in ``self``.
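
The `__dataframe__` hunk above follows a common deprecation pattern: emit a warning at the top of the method, then carry on with the old behavior. The sketch below illustrates that pattern with the standard library only. `HypotheticalDeprecationWarning` and `legacy_export` are invented names standing in for the warning class and method in the diff, and `stacklevel=2` is a simplification of the `find_stack_level()` call used there.

```python
import warnings


class HypotheticalDeprecationWarning(DeprecationWarning):
    """Stand-in for the warning class used in the hunk (an assumption)."""


def legacy_export(data: list) -> list:
    # Warn first, then do the old work, mirroring the order in the hunk.
    warnings.warn(
        "legacy_export is deprecated; use a supported interface instead.",
        HypotheticalDeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return list(data)


# Callers can record the warning to inspect it, for example in a test.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = legacy_export([1, 2, 3])
```

Because the function still returns its normal result, existing callers keep working while the recorded warning tells them to migrate.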

pandas/core/indexes/datetimes.py

Lines changed: 4 additions & 2 deletions
@@ -1133,12 +1133,14 @@ def bdate_range(
         msg = "freq must be specified for bdate_range; use date_range instead"
         raise TypeError(msg)
 
-    if isinstance(freq, str) and freq.startswith("C"):
+    if isinstance(freq, str) and freq.upper().startswith("C"):
+        msg = f"invalid custom frequency string: {freq}"
+        if freq == "CBH":
+            raise ValueError(f"{msg}, did you mean cbh?")
         try:
             weekmask = weekmask or "Mon Tue Wed Thu Fri"
             freq = prefix_mapping[freq](holidays=holidays, weekmask=weekmask)
         except (KeyError, TypeError) as err:
-            msg = f"invalid custom frequency string: {freq}"
             raise ValueError(msg) from err
     elif holidays or weekmask:
         msg = (
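
The control flow of the changed `bdate_range` branch can be isolated in a few lines. This is only a sketch of the string handling: `PREFIX_MAPPING` is a hypothetical two-entry stand-in for the real `prefix_mapping` (its values are plain labels, not offset classes), and `resolve_custom_freq` is an invented helper name.

```python
# Hypothetical stand-in for the real mapping; values are labels, not offsets.
PREFIX_MAPPING = {"C": "CustomBusinessDay", "cbh": "CustomBusinessHour"}


def resolve_custom_freq(freq: str) -> str:
    # Case-insensitive prefix check, so lowercase "cbh" takes the custom path.
    if not freq.upper().startswith("C"):
        raise ValueError(f"not a custom frequency in this sketch: {freq}")

    # Build the message up front so both failure paths can reuse it.
    msg = f"invalid custom frequency string: {freq}"
    if freq == "CBH":
        raise ValueError(f"{msg}, did you mean cbh?")
    try:
        return PREFIX_MAPPING[freq]
    except KeyError as err:
        raise ValueError(msg) from err
```

Calling `resolve_custom_freq("cbh")` resolves through the mapping, while the uppercase spelling `"CBH"` fails early with the "did you mean cbh?" hint.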

pandas/core/interchange/from_dataframe.py

Lines changed: 23 additions & 1 deletion
@@ -6,13 +6,16 @@
     Any,
     overload,
 )
+import warnings
 
 import numpy as np
 
 from pandas._config import using_string_dtype
 
 from pandas.compat._optional import import_optional_dependency
+from pandas.errors import Pandas4Warning
 from pandas.util._decorators import set_module
+from pandas.util._exceptions import find_stack_level
 
 import pandas as pd
 from pandas.core.interchange.dataframe_protocol import (
@@ -47,6 +50,9 @@ def from_dataframe(df, allow_copy: bool = True) -> pd.DataFrame:
     From pandas 3.0 onwards, `from_dataframe` uses the PyCapsule Interface,
     only falling back to the interchange protocol if that fails.
 
+    From pandas 4.0 onwards, that fallback will no longer be available and only
+    the PyCapsule Interface will be used.
+
     .. warning::
 
         Due to severe implementation issues, we recommend only considering using the
@@ -99,7 +105,14 @@ def from_dataframe(df, allow_copy: bool = True) -> pd.DataFrame:
         pa = import_optional_dependency("pyarrow", min_version="14.0.0")
     except ImportError:
         # fallback to _from_dataframe
-        pass
+        warnings.warn(
+            "Conversion using Arrow PyCapsule Interface failed due to "
+            "missing PyArrow>=14 dependency, falling back to (deprecated) "
+            "interchange protocol. We recommend that you install "
+            "PyArrow>=14.0.0.",
+            UserWarning,
+            stacklevel=find_stack_level(),
+        )
     else:
         try:
             return pa.table(df).to_pandas(zero_copy_only=not allow_copy)
@@ -109,6 +122,15 @@ def from_dataframe(df, allow_copy: bool = True) -> pd.DataFrame:
     if not hasattr(df, "__dataframe__"):
         raise ValueError("`df` does not support __dataframe__")
 
+    warnings.warn(
+        "The Dataframe Interchange Protocol is deprecated.\n"
+        "For dataframe-agnostic code, you may want to look into:\n"
+        "- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n"
+        "- Narwhals: https://github.com/narwhals-dev/narwhals\n",
+        Pandas4Warning,
+        stacklevel=find_stack_level(),
+    )
+
     return _from_dataframe(
         df.__dataframe__(allow_copy=allow_copy), allow_copy=allow_copy
     )
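
The `from_dataframe` hunks above replace a silent `pass` with a warning when the optional dependency is missing, and add a second warning on the deprecated fallback path. A simplified sketch of that shape, using only the standard library, might look like the following. The module name `hypothetical_fast_backend`, its `convert` function, and the `list(obj)` fallback are all assumptions made for illustration; the real code also guards the preferred path with its own `try`, which is omitted here.

```python
import importlib
import warnings


def convert(obj, preferred_module: str = "hypothetical_fast_backend"):
    """Try the preferred path; warn and fall back if its dependency is missing."""
    try:
        backend = importlib.import_module(preferred_module)
    except ImportError:
        # Previously a silent `pass`; now the user is told why the slow path runs.
        warnings.warn(
            f"{preferred_module} is not installed, falling back to the "
            "(deprecated) slow path.",
            UserWarning,
            stacklevel=2,
        )
    else:
        return backend.convert(obj)  # hypothetical API of the optional module

    # Fallback path: flag the deprecation, then do the conversion anyway.
    warnings.warn("The slow path is deprecated.", DeprecationWarning, stacklevel=2)
    return list(obj)
```

With the optional module absent, a caller sees two warnings in order (missing dependency, then deprecation) and still gets a converted result.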

pandas/tests/frame/methods/test_join.py

Lines changed: 24 additions & 0 deletions
@@ -575,3 +575,27 @@ def test_frame_join_tzaware(self):
 
         tm.assert_index_equal(result.index, expected)
         assert result.index.tz.key == "US/Central"
+
+    def test_frame_join_categorical_index(self):
+        # GH 61675
+        cat_data = pd.Categorical(
+            [3, 4],
+            categories=pd.Series([2, 3, 4, 5], dtype="Int64"),
+            ordered=True,
+        )
+        values1 = "a b".split()
+        values2 = "foo bar".split()
+        df1 = DataFrame({"hr": cat_data, "values1": values1}).set_index("hr")
+        df2 = DataFrame({"hr": cat_data, "values2": values2}).set_index("hr")
+        df1.columns = pd.CategoricalIndex([4], dtype=cat_data.dtype, name="other_hr")
+        df2.columns = pd.CategoricalIndex([3], dtype=cat_data.dtype, name="other_hr")
+
+        df_joined = df1.join(df2)
+        expected = DataFrame(
+            {"hr": cat_data, "values1": values1, "values2": values2}
+        ).set_index("hr")
+        expected.columns = pd.CategoricalIndex(
+            [4, 3], dtype=cat_data.dtype, name="other_hr"
+        )
+
+        tm.assert_frame_equal(df_joined, expected)

pandas/tests/indexes/datetimes/test_date_range.py

Lines changed: 34 additions & 1 deletion
@@ -1216,7 +1216,7 @@ def test_cdaterange_holidays_weekmask_requires_freqstr(self):
         )
 
     @pytest.mark.parametrize(
-        "freq", [freq for freq in prefix_mapping if freq.startswith("C")]
+        "freq", [freq for freq in prefix_mapping if freq.upper().startswith("C")]
     )
     def test_all_custom_freq(self, freq):
         # should not raise
@@ -1280,6 +1280,39 @@ def test_data_range_custombusinessday_partial_time(self, unit):
         )
         tm.assert_index_equal(result, expected)
 
+    def test_cdaterange_cbh(self):
+        # GH#62849
+        result = bdate_range(
+            "2009-03-13",
+            "2009-03-15",
+            freq="cbh",
+            weekmask="Mon Wed Fri",
+            holidays=["2009-03-14"],
+        )
+        expected = DatetimeIndex(
+            [
+                "2009-03-13 09:00:00",
+                "2009-03-13 10:00:00",
+                "2009-03-13 11:00:00",
+                "2009-03-13 12:00:00",
+                "2009-03-13 13:00:00",
+                "2009-03-13 14:00:00",
+                "2009-03-13 15:00:00",
+                "2009-03-13 16:00:00",
+            ],
+            dtype="datetime64[ns]",
+            freq="cbh",
+        )
+        tm.assert_index_equal(result, expected)
+
+    def test_cdaterange_deprecated_error_CBH(self):
+        # GH#62849
+        msg = "invalid custom frequency string: CBH, did you mean cbh?"
+        with pytest.raises(ValueError, match=msg):
+            bdate_range(
+                START, END, freq="CBH", weekmask="Mon Wed Fri", holidays=["2009-03-14"]
+            )
+
 
 class TestDateRangeNonNano:
     def test_date_range_reso_validation(self):
