A MultiIndex (hierarchical index) labels rows or columns with multiple levels. It is useful after a multi-key groupby, for panel data, or for stacked reports. Access entries with tuple keys and .loc.
Creating MultiIndex
import pandas as pd
arrays = [['A', 'A', 'B'], ['x', 'y', 'x']]
idx = pd.MultiIndex.from_arrays(arrays, names=['grp', 'sub'])
df = pd.DataFrame({'val': [1, 2, 3]}, index=idx)
print(df)
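from_arrays is one of several constructors. As a minimal sketch, the same index can also be built from row tuples with from_tuples, while from_product builds the full cross product of the level values. Note that the cross product includes ('B', 'y'), a pair the index above does not have. The variable names are illustrative only.

```python
import pandas as pd

# Parallel arrays, one per level (same as the example above).
idx_arrays = pd.MultiIndex.from_arrays(
    [['A', 'A', 'B'], ['x', 'y', 'x']], names=['grp', 'sub']
)

# One tuple per row: produces an identical index.
idx_tuples = pd.MultiIndex.from_tuples(
    [('A', 'x'), ('A', 'y'), ('B', 'x')], names=['grp', 'sub']
)

# Every combination of the level values: four entries, not three.
idx_product = pd.MultiIndex.from_product(
    [['A', 'B'], ['x', 'y']], names=['grp', 'sub']
)
```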
Selection
print(df.loc['A']) # all rows where grp='A'
print(df.loc[('B', 'x')]) # single row, returned as a Series
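Both lookups above key on the outer level first. As a sketch of selecting by the inner level instead, the block below rebuilds the same frame and uses pd.IndexSlice to take every grp with sub equal to 'x'. Slicing like this assumes the index is sorted, which it is here.

```python
import pandas as pd

arrays = [['A', 'A', 'B'], ['x', 'y', 'x']]
idx = pd.MultiIndex.from_arrays(arrays, names=['grp', 'sub'])
df = pd.DataFrame({'val': [1, 2, 3]}, index=idx)

# Outer key only: the 'grp' level is dropped from the result.
rows_a = df.loc['A']

# Full tuple key: the single matching row comes back as a Series.
row_bx = df.loc[('B', 'x')]

# Inner level only: any 'grp', 'sub' equal to 'x'. Both levels are kept.
sub_x = df.loc[pd.IndexSlice[:, 'x'], :]
```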
Flattening
df.reset_index() returns a new DataFrame with the MultiIndex levels converted to ordinary columns. It is usually the simplest step before a merge or plot when the hierarchy is no longer needed.
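A minimal sketch of flattening before a merge follows. The labels table and its contents are hypothetical, invented only to give the merge something to join against.

```python
import pandas as pd

idx = pd.MultiIndex.from_arrays(
    [['A', 'A', 'B'], ['x', 'y', 'x']], names=['grp', 'sub']
)
df = pd.DataFrame({'val': [1, 2, 3]}, index=idx)

# reset_index returns a new frame; the level names become column names.
flat = df.reset_index()

# Hypothetical lookup table keyed on 'grp' (illustration only).
labels = pd.DataFrame({'grp': ['A', 'B'], 'label': ['alpha', 'beta']})

# With 'grp' as a plain column, an ordinary column merge works.
merged = flat.merge(labels, on='grp', how='left')
```

The original df is unchanged, so the hierarchical version stays available if it is needed later.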
Important interview questions and answers
- Q: MultiIndex columns?
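A: Possible after pivot. Use droplevel(axis=1) to drop a column level, or join the level values into flat names; reset_index only moves row index levels into columns.
- Q: xs method?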
A: Cross-section: df.xs('A', level='grp') selects the rows where level grp equals 'A' and drops that level from the result.
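The two answers above can be sketched together on the small frame from earlier. Here unstack stands in for a pivot as a quick way to get two-level columns; that substitution is an assumption made for brevity, not the only way such columns arise.

```python
import pandas as pd

idx = pd.MultiIndex.from_arrays(
    [['A', 'A', 'B'], ['x', 'y', 'x']], names=['grp', 'sub']
)
df = pd.DataFrame({'val': [1, 2, 3]}, index=idx)

# Cross-section on the outer level: 'grp' is dropped from the result.
only_a = df.xs('A', level='grp')

# Cross-section on the inner level works the same way.
only_x = df.xs('x', level='sub')

# Moving 'sub' into the columns gives two-level columns:
# ('val', 'x') and ('val', 'y').
wide = df.unstack('sub')

# Drop the outer column level to get flat labels 'x' and 'y'.
wide_flat = wide.droplevel(0, axis=1)
```

The ('B', 'y') cell in wide is missing because that pair does not exist in the original index.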
Self-check
- Create a DataFrame with two-level row index.
- Select all rows for one top-level key.
Tip: reset_index() flattens MultiIndex before merge or sklearn export.
Interview prep
- MultiIndex?
Hierarchical row/column labels—common after groupby/pivot.
- reset_index?
Flattens levels to columns for simpler downstream ops.