Commit 1d9e25e

Merge branch 'main' into docs/excel-header-bold-styler
2 parents 03ea90f + 82fa271 commit 1d9e25e

10 files changed: +123 -30 lines changed
doc/source/whatsnew/v3.0.0.rst

Lines changed: 6 additions & 0 deletions
@@ -737,6 +737,7 @@ Other Deprecations
 - Deprecated allowing ``fill_value`` that cannot be held in the original dtype (excepting NA values for integer and bool dtypes) in :meth:`Series.shift` and :meth:`DataFrame.shift` (:issue:`53802`)
 - Deprecated backward-compatibility behavior for :meth:`DataFrame.select_dtypes` matching "str" dtype when ``np.object_`` is specified (:issue:`61916`)
 - Deprecated option "future.no_silent_downcasting", as it is no longer used. In a future version accessing this option will raise (:issue:`59502`)
+- Deprecated silent casting of non-datetime 'other' to datetime in :meth:`Series.combine_first` (:issue:`62931`)
 - Deprecated slicing on a :class:`Series` or :class:`DataFrame` with a :class:`DatetimeIndex` using a ``datetime.date`` object, explicitly cast to :class:`Timestamp` instead (:issue:`35830`)
 - Deprecated the 'inplace' keyword from :meth:`Resampler.interpolate`, as passing ``True`` raises ``AttributeError`` (:issue:`58690`)

@@ -974,13 +975,15 @@ Datetimelike
 - Bug in :class:`Timestamp` constructor failing to raise when given a ``np.datetime64`` object with non-standard unit (:issue:`25611`)
 - Bug in :func:`date_range` where the last valid timestamp would sometimes not be produced (:issue:`56134`)
 - Bug in :func:`date_range` where using a negative frequency value would not include all points between the start and end values (:issue:`56147`)
+- Bug in :func:`infer_freq` with a :class:`Series` with :class:`ArrowDtype` timestamp dtype incorrectly raising ``TypeError`` (:issue:`58403`)
 - Bug in :func:`to_datetime` where passing an ``lxml.etree._ElementUnicodeResult`` together with ``format`` raised ``TypeError``. Now subclasses of ``str`` are handled. (:issue:`60933`)
 - Bug in :func:`tseries.api.guess_datetime_format` would fail to infer time format when "%Y" == "%H%M" (:issue:`57452`)
 - Bug in :func:`tseries.frequencies.to_offset` would fail to parse frequency strings starting with "LWOM" (:issue:`59218`)
 - Bug in :meth:`DataFrame.fillna` raising an ``AssertionError`` instead of ``OutOfBoundsDatetime`` when filling a ``datetime64[ns]`` column with an out-of-bounds timestamp. Now correctly raises ``OutOfBoundsDatetime``. (:issue:`61208`)
 - Bug in :meth:`DataFrame.min` and :meth:`DataFrame.max` casting ``datetime64`` and ``timedelta64`` columns to ``float64`` and losing precision (:issue:`60850`)
 - Bug in :meth:`Dataframe.agg` with df with missing values resulting in IndexError (:issue:`58810`)
 - Bug in :meth:`DateOffset.rollback` (and subclass methods) with ``normalize=True`` rolling back one offset too long (:issue:`32616`)
+- Bug in :meth:`DatetimeIndex.asof` with a string key giving incorrect results (:issue:`50946`)
 - Bug in :meth:`DatetimeIndex.is_year_start` and :meth:`DatetimeIndex.is_quarter_start` does not raise on Custom business days frequencies bigger then "1C" (:issue:`58664`)
 - Bug in :meth:`DatetimeIndex.is_year_start` and :meth:`DatetimeIndex.is_quarter_start` returning ``False`` on double-digit frequencies (:issue:`58523`)
 - Bug in :meth:`DatetimeIndex.union` and :meth:`DatetimeIndex.intersection` when ``unit`` was non-nanosecond (:issue:`59036`)
@@ -1183,10 +1186,13 @@ Reshaping
 - Bug in :func:`concat` with mixed integer and bool dtypes incorrectly casting the bools to integers (:issue:`45101`)
 - Bug in :func:`qcut` where values at the quantile boundaries could be incorrectly assigned (:issue:`59355`)
 - Bug in :meth:`DataFrame.combine_first` not preserving the column order (:issue:`60427`)
+- Bug in :meth:`DataFrame.combine_first` with non-unique columns incorrectly raising (:issue:`29135`)
+- Bug in :meth:`DataFrame.combine` with non-unique columns incorrectly raising (:issue:`51340`)
 - Bug in :meth:`DataFrame.explode` producing incorrect result for :class:`pyarrow.large_list` type (:issue:`61091`)
 - Bug in :meth:`DataFrame.join` inconsistently setting result index name (:issue:`55815`)
 - Bug in :meth:`DataFrame.join` when a :class:`DataFrame` with a :class:`MultiIndex` would raise an ``AssertionError`` when :attr:`MultiIndex.names` contained ``None``. (:issue:`58721`)
 - Bug in :meth:`DataFrame.merge` where merging on a column containing only ``NaN`` values resulted in an out-of-bounds array access (:issue:`59421`)
+- Bug in :meth:`Series.combine_first` incorrectly replacing ``None`` entries with ``NaN`` (:issue:`58977`)
 - Bug in :meth:`DataFrame.unstack` producing incorrect results when ``sort=False`` (:issue:`54987`, :issue:`55516`)
 - Bug in :meth:`DataFrame.unstack` raising an error with indexes containing ``NaN`` with ``sort=False`` (:issue:`61221`)
 - Bug in :meth:`DataFrame.merge` when merging two :class:`DataFrame` on ``intc`` or ``uintc`` types on Windows (:issue:`60091`, :issue:`58713`)

pandas/core/frame.py

Lines changed: 19 additions & 21 deletions
@@ -9038,16 +9038,6 @@ def combine(
         0  0 -5.0
         1  0  4.0

-        However, if the same element in both dataframes is None, that None
-        is preserved
-
-        >>> df1 = pd.DataFrame({"A": [0, 0], "B": [None, 4]})
-        >>> df2 = pd.DataFrame({"A": [1, 1], "B": [None, 3]})
-        >>> df1.combine(df2, take_smaller, fill_value=-5)
-           A    B
-        0  0 -5.0
-        1  0  3.0
-
         Example that demonstrates the use of `overwrite` and behavior when
         the axis differ between the dataframes.
@@ -9106,11 +9096,14 @@ def combine(

         # preserve column order
         new_columns = self.columns.union(other_columns, sort=False)
+        this = this.reindex(new_columns, axis=1)
+        other = other.reindex(new_columns, axis=1)
+
         do_fill = fill_value is not None
         result = {}
-        for col in new_columns:
-            series = this[col]
-            other_series = other[col]
+        for i in range(this.shape[1]):
+            series = this.iloc[:, i]
+            other_series = other.iloc[:, i]

             this_dtype = series.dtype
             other_dtype = other_series.dtype
@@ -9121,7 +9114,7 @@ def combine(
             # don't overwrite columns unnecessarily
             # DO propagate if this column is not in the intersection
             if not overwrite and other_mask.all():
-                result[col] = this[col].copy()
+                result[i] = series.copy()
                 continue

             if do_fill:
@@ -9130,7 +9123,7 @@ def combine(
                 series[this_mask] = fill_value
                 other_series[other_mask] = fill_value

-            if col not in self.columns:
+            if new_columns[i] not in self.columns:
                 # If self DataFrame does not have col in other DataFrame,
                 # try to promote series, which is all NaN, as other_dtype.
                 new_dtype = other_dtype
@@ -9155,10 +9148,10 @@ def combine(
                     arr, new_dtype
                 )

-            result[col] = arr
+            result[i] = arr

-        # convert_objects just in case
-        frame_result = self._constructor(result, index=new_index, columns=new_columns)
+        frame_result = self._constructor(result, index=new_index)
+        frame_result.columns = new_columns
         return frame_result.__finalize__(self, method="combine")

     def combine_first(self, other: DataFrame) -> DataFrame:
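
The switch from label-based to positional iteration in the hunks above matters because a duplicated label selects more than one column. A minimal sketch of that difference, using a small hypothetical frame (the data and the `* 10` transform are illustrative assumptions, not part of the commit):

```python
import pandas as pd

# Hypothetical frame with a duplicated column label.
df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "A"])

# Label-based lookup returns every matching column, i.e. a DataFrame.
by_label = df["A"]

# Positional lookup always returns exactly one column as a Series.
by_position = [df.iloc[:, i] for i in range(df.shape[1])]

# Key the intermediate result by position, then restore the labels afterwards,
# mirroring the pattern the diff adopts for the result dict.
result = pd.DataFrame(
    {i: col * 10 for i, col in enumerate(by_position)}, index=df.index
)
result.columns = df.columns
```

Keying the intermediate dict by integer position avoids collisions between equal labels; the labels are reattached only after the frame is built.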
@@ -9222,9 +9215,14 @@ def combiner(x: Series, y: Series):
         combined = self.combine(other, combiner, overwrite=False)

         dtypes = {
+            # Check for isinstance(..., (np.dtype, ExtensionDtype))
+            # to prevent raising on non-unique columns see GH#29135.
+            # Note we will just not-cast in these cases.
             col: find_common_type([self.dtypes[col], other.dtypes[col]])
             for col in self.columns.intersection(other.columns)
-            if combined.dtypes[col] != self.dtypes[col]
+            if isinstance(combined.dtypes[col], (np.dtype, ExtensionDtype))
+            and isinstance(self.dtypes[col], (np.dtype, ExtensionDtype))
+            and combined.dtypes[col] != self.dtypes[col]
         }

         if dtypes:
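
The `isinstance` guard added above can be illustrated with a small sketch. With a duplicated label, `df.dtypes[col]` yields a Series of dtypes rather than a single dtype, so comparing it with `!=` inside an `if` would be ambiguous. The frame and the helper name `is_single_dtype` below are hypothetical, chosen only for illustration:

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionDtype

# Hypothetical frame in which the label "Q" appears twice.
df = pd.DataFrame([[5, 6, 7.0]], columns=["P", "Q", "Q"])

unique_lookup = df.dtypes["P"]     # a single dtype object
duplicate_lookup = df.dtypes["Q"]  # a Series holding two dtypes


def is_single_dtype(obj) -> bool:
    """Return True only when the lookup produced one dtype (illustrative helper)."""
    return isinstance(obj, (np.dtype, ExtensionDtype))
```

Columns for which the check fails are simply left uncast, which matches the comment in the diff ("we will just not-cast in these cases").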
@@ -13820,8 +13818,8 @@ def quantile(
         0.1  1    1
         0.5  3  100

-        Specifying `numeric_only=False` will also compute the quantile of
-        datetime and timedelta data.
+        Specifying `numeric_only=False` will compute the quantiles for all
+        columns.

         >>> df = pd.DataFrame(
         ...     {

pandas/core/indexes/base.py

Lines changed: 2 additions & 2 deletions
@@ -4168,7 +4168,7 @@ def reindex(
         limit : int, optional
             Maximum number of consecutive labels in ``target`` to match for
             inexact matches.
-        tolerance : int or float, optional
+        tolerance : int, float, or list-like, optional
             Maximum distance between original and new labels for inexact
             matches. The values of the index at the matching locations must
             satisfy the equation ``abs(index[indexer] - target) <= tolerance``.
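
The docstring change above documents that `tolerance` may be list-like, with one entry per target label. A minimal sketch of that usage, assuming a small integer index and illustrative target values that are not taken from the commit:

```python
import pandas as pd

idx = pd.Index([1, 2, 3])
target = [1.1, 2.6]

# One tolerance per target label: 1.1 is within 0.2 of 1, but 2.6 is 0.4 away from 3.
_, strict = idx.reindex(target, method="nearest", tolerance=[0.2, 0.2])

# Loosening only the second tolerance lets 2.6 match the label 3.
_, loose = idx.reindex(target, method="nearest", tolerance=[0.2, 0.5])
```

The second element returned by `Index.reindex` is the indexer; positions that fail `abs(index[indexer] - target) <= tolerance` are reported as `-1`.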
@@ -5675,7 +5675,7 @@ def asof(self, label):
                 return self._na_value
         else:
             if isinstance(loc, slice):
-                loc = loc.indices(len(self))[-1]
+                return self[loc][-1]

         return self[loc]
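
Why the old line was wrong can be seen with plain Python: `slice.indices(n)` returns a `(start, stop, step)` tuple, so indexing it with `[-1]` yields the step rather than the position of the last element in the slice. A minimal sketch with a hypothetical list standing in for the index:

```python
# Stand-in for an index of ten labels (illustrative data only).
labels = list(range(10))
loc = slice(2, 6)

# slice.indices normalises the slice to a (start, stop, step) tuple.
start, stop, step = loc.indices(len(labels))

# Old approach: the last tuple element is the step, not a position.
old_pick = loc.indices(len(labels))[-1]

# New approach: slice first, then take the last element of the result.
new_pick = labels[loc][-1]
```

Here `old_pick` is the step of the slice, while `new_pick` is the last label the slice actually covers, which is the value `asof` is meant to return.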

pandas/core/series.py

Lines changed: 12 additions & 6 deletions
@@ -87,7 +87,6 @@
 )
 from pandas.core.dtypes.dtypes import (
     ExtensionDtype,
-    SparseDtype,
 )
 from pandas.core.dtypes.generic import (
     ABCDataFrame,
@@ -3112,8 +3111,8 @@ def combine(

         Combine the Series and `other` using `func` to perform elementwise
         selection for combined Series.
-        `fill_value` is assumed when value is missing at some index
-        from one of the two objects being combined.
+        `fill_value` is assumed when value is not present at some index
+        from one of the two Series being combined.

         Parameters
         ----------
@@ -3254,9 +3253,6 @@ def combine_first(self, other) -> Series:
         if self.dtype == other.dtype:
             if self.index.equals(other.index):
                 return self.mask(self.isna(), other)
-            elif self._can_hold_na and not isinstance(self.dtype, SparseDtype):
-                this, other = self.align(other, join="outer")
-                return this.mask(this.isna(), other)

         new_index = self.index.union(other.index)
@@ -3271,6 +3267,16 @@ def combine_first(self, other) -> Series:
         if this.dtype.kind == "M" and other.dtype.kind != "M":
             # TODO: try to match resos?
             other = to_datetime(other)
+            warnings.warn(
+                # GH#62931
+                "Silently casting non-datetime 'other' to datetime in "
+                "Series.combine_first is deprecated and will be removed "
+                "in a future version. Explicitly cast before calling "
+                "combine_first instead.",
+                Pandas4Warning,
+                stacklevel=find_stack_level(),
+            )
+
         combined = concat([this, other])
         combined = combined.reindex(new_index)
         return combined.__finalize__(self, method="combine_first")
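
The deprecation message above asks callers to cast explicitly before calling `combine_first`. A minimal sketch of what that could look like in user code, assuming illustrative date strings that do not come from the commit:

```python
import pandas as pd

# A datetime Series with a gap, and a non-datetime Series that can fill it.
s0 = pd.to_datetime(pd.Series(["2010-01-01", None]))
s1 = pd.Series([None, "2011-01-01"])

# Cast `other` explicitly instead of relying on the deprecated silent cast.
other = pd.to_datetime(s1)
result = s0.combine_first(other)
```

Because both operands already share a datetime dtype, the call does not reach the branch that emits the new warning.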

pandas/tests/frame/methods/test_combine.py

Lines changed: 16 additions & 0 deletions
@@ -45,3 +45,19 @@ def test_combine_generic(self, float_frame):
         )
         tm.assert_frame_equal(chunk, exp)
         tm.assert_frame_equal(chunk2, exp)
+
+    def test_combine_nonunique_columns(self):
+        # GH#51340
+
+        df = pd.DataFrame({"A": range(5), "B": range(5)})
+        df.columns = ["A", "A"]
+
+        other = df.copy()
+        df.iloc[1, :] = None
+
+        def combiner(a, b):
+            return b
+
+        result = df.combine(other, combiner)
+        expected = other.astype("float64")
+        tm.assert_frame_equal(result, expected)

pandas/tests/frame/methods/test_combine_first.py

Lines changed: 12 additions & 0 deletions
@@ -413,6 +413,18 @@ def test_combine_first_preserve_EA_precision(self, wide_val, dtype):
         expected = DataFrame({"A": [wide_val, 5, wide_val]}, dtype=dtype)
         tm.assert_frame_equal(result, expected)

+    def test_combine_first_non_unique_columns(self):
+        # GH#29135
+        df1 = DataFrame([[1, np.nan], [3, 4]], columns=["P", "Q"], index=["A", "B"])
+        df2 = DataFrame(
+            [[5, 6, 7], [8, 9, np.nan]], columns=["P", "Q", "Q"], index=["A", "B"]
+        )
+        result = df1.combine_first(df2)
+        expected = DataFrame(
+            [[1, 6.0, 7.0], [3, 4.0, 4.0]], index=["A", "B"], columns=["P", "Q", "Q"]
+        )
+        tm.assert_frame_equal(result, expected)
+

     @pytest.mark.parametrize(
         "scalar1, scalar2",

pandas/tests/indexes/datetimes/methods/test_asof.py

Lines changed: 16 additions & 0 deletions
@@ -1,6 +1,7 @@
 from datetime import timedelta

 from pandas import (
+    DatetimeIndex,
     Index,
     Timestamp,
     date_range,
@@ -28,3 +29,18 @@ def test_asof(self):

         dt = index[0].to_pydatetime()
         assert isinstance(index.asof(dt), Timestamp)
+
+    def test_asof_datetime_string(self):
+        # GH#50946
+
+        dti = date_range("2021-08-05", "2021-08-10", freq="1D")
+
+        key = "2021-08-09"
+        res = dti.asof(key)
+        exp = dti[4]
+        assert res == exp
+
+        # add a non-midnight time caused a bug
+        dti2 = DatetimeIndex(list(dti) + ["2021-08-11 00:00:01"])
+        res = dti2.asof(key)
+        assert res == exp

pandas/tests/series/methods/test_combine_first.py

Lines changed: 17 additions & 1 deletion
@@ -2,6 +2,8 @@

 import numpy as np

+from pandas.errors import Pandas4Warning
+
 import pandas as pd
 from pandas import (
     Period,
@@ -75,9 +77,14 @@ def test_combine_first_dt64(self, unit):
         xp = to_datetime(Series(["2010", "2011"])).dt.as_unit(unit)
         tm.assert_series_equal(rs, xp)

+    def test_combine_first_dt64_casting_deprecation(self, unit):
+        # GH#62931
         s0 = to_datetime(Series(["2010", np.nan])).dt.as_unit(unit)
         s1 = Series([np.nan, "2011"])
-        rs = s0.combine_first(s1)
+
+        msg = "Silently casting non-datetime 'other' to datetime"
+        with tm.assert_produces_warning(Pandas4Warning, match=msg):
+            rs = s0.combine_first(s1)

         xp = Series([datetime(2010, 1, 1), "2011"], dtype=f"datetime64[{unit}]")

@@ -144,3 +151,12 @@ def test_combine_mixed_timezone(self):
             ),
         )
         tm.assert_series_equal(result, expected)
+
+    def test_combine_first_none_not_nan(self):
+        # GH#58977
+        s1 = Series([None, None, None], index=["a", "b", "c"])
+        s2 = Series([None, None, None], index=["b", "c", "d"])
+
+        result = s1.combine_first(s2)
+        expected = Series([None] * 4, index=["a", "b", "c", "d"])
+        tm.assert_series_equal(result, expected)

pandas/tests/tseries/frequencies/test_inference.py

Lines changed: 14 additions & 0 deletions
@@ -13,6 +13,7 @@
 from pandas._libs.tslibs.offsets import _get_offset
 from pandas._libs.tslibs.period import INVALID_FREQ_ERR_MSG
 from pandas.compat import is_platform_windows
+import pandas.util._test_decorators as td

 from pandas import (
     DatetimeIndex,
@@ -542,3 +543,16 @@ def test_infer_freq_non_nano_tzaware(tz_aware_fixture):

     res = frequencies.infer_freq(dta)
     assert res == "B"
+
+
+@td.skip_if_no("pyarrow")
+def test_infer_freq_pyarrow():
+    # GH#58403
+    data = ["2022-01-01T10:00:00", "2022-01-01T10:00:30", "2022-01-01T10:01:00"]
+    pd_series = Series(data).astype("timestamp[s][pyarrow]")
+    pd_index = Index(data).astype("timestamp[s][pyarrow]")
+
+    assert frequencies.infer_freq(pd_index.values) == "30s"
+    assert frequencies.infer_freq(pd_series.values) == "30s"
+    assert frequencies.infer_freq(pd_index) == "30s"
+    assert frequencies.infer_freq(pd_series) == "30s"

pandas/tseries/frequencies.py

Lines changed: 9 additions & 0 deletions
@@ -37,6 +37,7 @@

 from pandas.core.dtypes.common import is_numeric_dtype
 from pandas.core.dtypes.dtypes import (
+    ArrowDtype,
     DatetimeTZDtype,
     PeriodDtype,
 )
@@ -132,6 +133,14 @@ def infer_freq(

     if isinstance(index, ABCSeries):
         values = index._values
+
+        if isinstance(index.dtype, ArrowDtype):
+            import pyarrow as pa
+
+            if pa.types.is_timestamp(values.dtype.pyarrow_dtype):
+                # GH#58403
+                values = values._to_datetimearray()
+
         if not (
             lib.is_np_dtype(values.dtype, "mM")
             or isinstance(values.dtype, DatetimeTZDtype)
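
The hunk above converts Arrow-backed timestamps to a datetime array so that the existing inference path applies. As a rough sketch of the result the new test expects, the same three timestamps can be passed through `pd.infer_freq` using NumPy-backed datetimes (an assumption made here so the snippet does not require pyarrow):

```python
import pandas as pd

# Same three timestamps as the new test, 30 seconds apart.
data = ["2022-01-01T10:00:00", "2022-01-01T10:00:30", "2022-01-01T10:01:00"]

# NumPy-backed stand-ins for the Arrow-backed objects in the test.
index = pd.DatetimeIndex(data)
series = pd.Series(index)

freq_from_index = pd.infer_freq(index)
freq_from_series = pd.infer_freq(series)
```

Both calls should report a 30-second frequency; the exact spelling of the alias (upper or lower case) depends on the pandas version.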
