Comments
I hope not - as a pandas user it had never occurred to me to pass … |
The OP of #53426 demonstrates different floating point results with … |
Edit: The example below fails because it uses `…`. |
I think #52042 means the DataFrameGroupBy cases here will be emitting FutureWarnings anyway, so affected users will need to update their code regardless. |
I was slightly incorrect in #53425 (comment); I've updated that comment (see the crossed-out line) and added this to the OP as another example of not-great current behavior. |
I tried this:

```
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"a": [1,1,1,2,2], "b": [4,5,6,4,pd.NA]}, dtype="int32[pyarrow]")
>>> df.agg(np.sum, axis=0)
a     7
b    19
dtype: int64
```

Seems to work after removing calls to `…`. I think I'm probably OK with it being deprecated, though. |
Ack! My original example used |
After this deprecation has been enforced:
Just a heads up that this needs to be fixed before v3.0. Implementing v2 of #54747 (I will push it today) will fix this, but you should be aware of it in case something blocks #54747. |
@topper-123 - how did you reach this conclusion? If I enforce the deprecation here as well as the one in #53325, then I get:
Am I doing something wrong? |
Currently, various code paths detect whether a user passes, e.g., Python's builtin `sum` or `np.sum` and replace the operation with the pandas equivalent. I think we shouldn't alias a potentially valid UDF to the pandas version; this prevents users from using the NumPy version of the method if that is indeed what they want. This can lead to some surprising behavior: for example, NumPy's `ddof` defaults to 0 whereas pandas' defaults to 1. We also only do the switch when no args/kwargs are provided. Thus:

It can also lead to different floating point behavior:
However, using the Python builtin functions on DataFrames can give some surprising behavior:
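For instance, with a small hypothetical DataFrame (the data below is chosen only for illustration), Python's builtin `max` iterates over a DataFrame's column labels but over a Series' values, so the same function gives a label in one case and a number in the other:

```python
import pandas as pd

# Hypothetical data chosen only for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [0, 1]})

# Iterating over a DataFrame yields its column labels, so the builtin
# max compares labels and returns the largest one.
print(max(df))  # b

# Iterating over a Series yields its values, so the builtin max
# returns the largest value in that column.
print(max(df["a"]))  # 2

# Passing the method name as a string uses the pandas reduction,
# computed column by column.
print(df.agg("max").tolist())  # [2, 1]
```

The string form is the migration the proposal suggests: `"max"` always means the pandas method, while `max` keeps its plain Python meaning.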
If we remove the special-casing of these builtins, then apply/agg/transform will also exhibit this behavior. In particular, whether a DataFrame is broken up into Series internally to compute the operation will determine whether we get a result like `2` or `b` above.

However, I think this is still okay to do, as the change for users is straightforward: use `"max"` instead of `max`.

I've opened #53426 to see what impact this would have on our test suite.
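Returning to the `ddof` point above, here is a minimal sketch on hypothetical data. It uses plain NumPy and pandas calls, not the apply/agg/transform aliasing code itself, and shows why swapping `np.std` for the pandas method would change the result:

```python
import numpy as np
import pandas as pd

# Hypothetical values chosen only for illustration.
s = pd.Series([1.0, 2.0, 3.0, 4.0])

# NumPy's std defaults to ddof=0 (population standard deviation).
numpy_std = np.std(s.to_numpy())

# pandas' Series.std defaults to ddof=1 (sample standard deviation).
pandas_std = s.std()

print(numpy_std)   # ~1.1180, i.e. sqrt(5/4)
print(pandas_std)  # ~1.2910, i.e. sqrt(5/3)

# Passing ddof explicitly makes the two agree.
print(np.isclose(s.std(ddof=0), numpy_std))  # True
```

Under the proposal, users who want pandas' semantics would pass `"std"`, while those who want NumPy's would keep passing `np.std`.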
cc @topper-123 @mroeschke @jbrockmendel