Create columns from existing data: arithmetic, string ops, conditional logic with np.where, or vectorized functions. Prefer vectorized operations over apply when possible for speed.
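The string and vectorized-function cases mentioned above can be sketched as follows. This is a minimal illustration, not a definitive recipe: the sample data and the column names name, name_clean and log_price are assumptions made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({"name": [" alice ", "BOB"], "price": [10.0, 20.0]})

# String operation: the .str accessor works on the whole column at once
df["name_clean"] = df["name"].str.strip().str.title()

# Vectorized function: a NumPy ufunc applied to the whole column
df["log_price"] = np.log(df["price"])

print(df)
```

Both new columns are computed without a Python-level loop or a row-wise apply.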
Arithmetic and assign
import pandas as pd
import numpy as np
df = pd.DataFrame({'price': [10, 20], 'qty': [2, 3]})
df = df.assign(revenue=df['price'] * df['qty'])
print(df)
Conditional columns
df['tier'] = np.where(df['revenue'] >= 50, 'high', 'low')
Patterns to avoid
- Python loops over rows — slow; use vectorization
- apply on every cell when ufuncs suffice
- Chained assignment without loc
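As a rough sketch, each pattern above has a vectorized or loc-based alternative. The sqrt_price column and the qty > 2 condition are assumptions chosen only to make the example concrete.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"price": [10, 20], "qty": [2, 3]})

# Instead of a Python loop over rows, multiply whole columns
df["revenue"] = df["price"] * df["qty"]

# Instead of apply on every cell, call the ufunc on the column
df["sqrt_price"] = np.sqrt(df["price"])

# Instead of chained assignment such as df[df["qty"] > 2]["price"] = 0,
# select the rows and the column in a single .loc call
df.loc[df["qty"] > 2, "price"] = 0

print(df)
```

The single loc call writes to the original DataFrame, whereas the chained form may write to a temporary copy and leave df unchanged.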
Important interview questions and answers
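- Q: assign vs direct assignment?
A: assign returns a new DataFrame and allows method chaining, which gives cleaner pipelines; direct assignment (df['col'] = ...) modifies the existing DataFrame in place.
- Q: np.where vs apply?
A: np.where is vectorized and faster for simple if/else on columns.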
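The two answers above can be combined in one chained pipeline. This is a sketch under the assumption that the same price and qty data is used; the lambda form lets each assign step refer to columns created earlier in the chain.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"price": [10, 20], "qty": [2, 3]})

result = (
    df
    # The lambda receives the DataFrame as it exists at this step
    .assign(revenue=lambda d: d["price"] * d["qty"])
    # revenue is available here because the previous assign added it
    .assign(tier=lambda d: np.where(d["revenue"] >= 50, "high", "low"))
)

print(result)
```

Because assign returns a new DataFrame, df itself keeps only its original price and qty columns.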
Self-check
- Add a revenue column from price × qty.
- Create a tier column with np.where.
Tip: Prefer assign and np.where over row-wise apply for speed.
Interview prep
- What is np.where?
A vectorized if/else for column creation; faster than apply.
- Why vectorize?
Vectorized operations delegate the work to NumPy's compiled C loops, avoiding Python-level row iteration.