Closed
@jbrockmendel

Description

The 'level' argument in MultiIndex methods can be ambiguous in at least two ways:

  1. If an index's level names are [1, 0], then level=0 may refer to either level, depending on whether it is interpreted as a position or as a label. See "API: unclear what integer level name references: name or position?" #21677.

  2. If an index's level names are ["foo", "bar", ("foo", "bar")], then level=("foo", "bar") may refer to either the first two levels or the last level. See "Index.droplevel when names are tuples" #21120.

One option for addressing the first case is to add separate keywords, e.g. level_num or level_name (#10461).

The new idea I'd like to discuss is having a pd.Label class to treat something ambiguous as a label. So in the first case, passing level=pd.Label(0) would indicate that you want the second level. In the second case, level=pd.Label(("foo", "bar")) would indicate that this should be interpreted as a single label and not a sequence of labels.

Following an appropriate deprecation cycle, the default would be to interpret ambiguous cases as positional/sequence unless pd.Label is explicitly used.
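
To make the proposed rule concrete, here is a minimal pure-Python sketch of how a level specification could be resolved once positional/sequence is the default. It does not use pandas: pd.Label does not exist today, so the Label class and the resolve_levels helper below are hypothetical names invented for illustration, and the exact rules (bare int is a position, bare tuple or list is a sequence, Label is always a name lookup) are my assumptions about how the proposal would behave.

```python
from dataclasses import dataclass
from typing import Any, Hashable, List, Sequence


@dataclass(frozen=True)
class Label:
    """Hypothetical wrapper (not part of pandas): treat `value` as one label."""

    value: Hashable


def resolve_levels(names: Sequence[Hashable], level: Any) -> List[int]:
    """Return the positions of the levels selected by `level`.

    Assumed rules, sketching the proposal:
    - Label(x) is always looked up by name, as a single label.
    - A bare tuple or list is a sequence of level specifications.
    - A bare int is a position (negative values count from the end).
    - Anything else (e.g. a str) is looked up by name.
    """
    names = list(names)
    if isinstance(level, Label):
        return [names.index(level.value)]
    if isinstance(level, (tuple, list)):
        positions: List[int] = []
        for item in level:
            positions.extend(resolve_levels(names, item))
        return positions
    if isinstance(level, int):
        if not -len(names) <= level < len(names):
            raise IndexError(f"level {level} is out of range")
        return [level % len(names)]
    return [names.index(level)]


# Case 1: integer names that collide with positions.
first_bare = resolve_levels([1, 0], 0)          # position 0
first_label = resolve_levels([1, 0], Label(0))  # the level named 0

# Case 2: a tuple name that collides with a sequence of names.
tuple_names = ["foo", "bar", ("foo", "bar")]
second_bare = resolve_levels(tuple_names, ("foo", "bar"))
second_label = resolve_levels(tuple_names, Label(("foo", "bar")))
```

With these rules, the bare forms select position 0 and the first two levels respectively, while the Label forms select the second level and the last level.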

pd.Label has a couple of other potential uses:

A) If df.index (or ser.index.levels[0]) contains tuples, then df.loc[a, b] may either be selecting the single key (a, b) from the index, or be selecting a from the index and b from the columns. df.loc[pd.Label((a, b))] would disambiguate.

B) In __getitem__/__setitem__, pd.Label could indicate that .loc semantics should be used, with no fallback to .iloc (not that useful, since the user could just use .loc directly).
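
Use A) could work roughly as follows. Again this is only a sketch in plain Python, not pandas code: Label and split_loc_key are hypothetical names, and the rule that a bare 2-tuple always means (rows, columns) is an assumption about the post-deprecation behavior.

```python
from dataclasses import dataclass
from typing import Any, Hashable, Tuple


@dataclass(frozen=True)
class Label:
    """Hypothetical wrapper (not part of pandas): treat `value` as one label."""

    value: Hashable


def split_loc_key(key: Any) -> Tuple[Any, Any]:
    """Split a 2-D .loc-style key into (row_key, column_key).

    Assumed rules:
    - Label(x) is a single row key; all columns are selected.
    - A bare 2-tuple is read as (row_key, column_key).
    - Any other non-tuple key is a row key; all columns are selected.
    """
    if isinstance(key, Label):
        return key.value, slice(None)
    if isinstance(key, tuple):
        if len(key) != 2:
            raise IndexError("expected a (row_key, column_key) pair")
        return key[0], key[1]
    return key, slice(None)


# df.loc[a, b]: a from the index, b from the columns.
bare = split_loc_key(("a", "b"))

# df.loc[pd.Label((a, b))]: the single index key (a, b), every column.
wrapped = split_loc_key(Label(("a", "b")))
```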