Add this suggestion to a batch that can be applied as a single commit. This suggestion is invalid because no changes were made to the code. Suggestions cannot be applied while the pull request is closed. Suggestions cannot be applied while viewing a subset of changes. Only one suggestion per line can be applied in a batch. Add this suggestion to a batch that can be applied as a single commit. Applying suggestions on deleted lines is not supported. You must change the existing code in this line in order to create a valid suggestion. Outdated suggestions cannot be applied. This suggestion has been applied or marked resolved. Suggestions cannot be applied from pending reviews. Suggestions cannot be applied on multi-line comments. Suggestions cannot be applied while the pull request is queued to merge. Suggestion cannot be applied right now. Please check back later.
df.agg
call passes different things to a custom function depending on whether a unused kwarg is supplied or not #39169Built on top of #57671; the diff should get better once that's merged. Still plan on splitting part of this up as a precursor (and perhaps multiple).
For the closed issues above, tests here still likely need to be added.
The goal here is to make groupby.agg more consistently handle UDFs. Currently:
My opinion is that we should treat all UDFs as reducers, regardless of what they return. Some alternatives:
For 1, we will sometimes guess wrong, and transforming isn't something we should be doing in a method called
agg
anyways. For 2, we are restricting what I think are valid use cases for aggregation, e.g.gb.agg(np.array)
orgb.agg(list)
.In implementing this, I ran into two issues:
_aggregate_frame
fails if non-scalars are returned by the UDF, and also passes all of the selected columns as a DataFrame to the UDF. This is called when there is a single grouping and args or kwargs are provided, or when there is a single grouping and passing the UDF each column individually fails with a ValueError("No objects to concatenate"). This does not seem possible to fix, would be hard to deprecate (could add a new argument or use afuture
option?), and is bad enough behavior that it seems to me we should just rip the band aid here for 3.0.Resampler.apply
is an alias forResample.agg
, and we do not want to impactResampler.apply
with these changes. For this, I kept the old paths through groupby specifically for resampler and plan to properly deprecate the current method and implement apply (by calling groupby's apply) as part of 3.x development. (Ref: BUG: resample apply is actually aggregate #38463)