Text columns expose the .str accessor for vectorized string operations (lower/upper, strip, contains, split, and extract), so you can avoid slow Python loops.
Common .str methods
- s.str.lower(), s.str.upper(), s.str.strip()
- s.str.contains('pattern'): boolean mask
- s.str.replace('old', 'new')
- s.str.split(',', expand=True): split into columns
- s.str.extract(r'(\d+)'): regex capture groups
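To see what each of these methods returns, here is a minimal sketch on a small made-up Series. The sample values and variable names are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample data: a name and a quantity packed into one string
s = pd.Series(["  Apple,12 ", "banana,7", "Cherry,30"])

# Normalize whitespace and case first
tidy = s.str.strip().str.lower()

# Boolean mask: True where the substring "an" occurs
has_an = tidy.str.contains("an")

# Literal replacement (regex=False makes the intent explicit)
swapped = tidy.str.replace(",", ";", regex=False)

# expand=True returns a DataFrame with one column per piece
parts = tidy.str.split(",", expand=True)

# extract returns a DataFrame with one column per capture group
digits = tidy.str.extract(r"(\d+)")

print(has_an, swapped, parts, digits, sep="\n")
```

Note that split and extract return strings, so the quantity column would still need a conversion such as astype(int) before doing arithmetic on it.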
Example
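import pandas as pd

emails = pd.Series([' Ana@Mail.com ', 'bob@test.org'])
# Strip surrounding whitespace, then lowercase
clean = emails.str.strip().str.lower()
# Take the part after '@' from the cleaned values, not the raw ones
domains = clean.str.split('@').str[1]
print(clean, domains, sep='\n')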
Nullable string dtype
StringDtype ('string') represents missing strings as pd.NA; prefer it over object for text columns in new code.
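As a rough illustration of how a missing value behaves under the 'string' dtype, the sketch below uses a small made-up Series; the names and values are assumptions.

```python
import pandas as pd

# Hypothetical data with one missing entry, stored with the nullable string dtype
names = pd.Series([" Ana ", None, "BOB"], dtype="string")

# .str methods skip the missing entry and keep the string dtype
cleaned = names.str.strip().str.lower()

# contains returns a nullable boolean mask; the missing entry stays missing
mask = cleaned.str.contains("a")

print(cleaned)
print(mask)
```

Because the mask can itself contain a missing value, it is usually worth filling it (for example with mask.fillna(False)) before using it to filter rows.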
Important interview questions and answers
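- Q: Why .str?
A: It dispatches vectorized string operations across the whole Series; missing values (NaN) pass through instead of raising errors.
- Q: contains regex?
A: str.contains treats the pattern as a regular expression by default (regex=True); write patterns as raw strings, and pass regex=False for a literal match.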
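The difference between regex and literal matching in str.contains can be sketched as follows. The sample values are made up, and the na=False argument is shown as one way to get a plain boolean mask when values are missing.

```python
import pandas as pd

# Hypothetical values: one contains a literal dot, one does not, one is missing
files = pd.Series(["a.b", "axb", None])

# Default is regex matching, so "." matches any character
as_regex = files.str.contains(r"a.b")

# regex=False treats the pattern as plain text
as_literal = files.str.contains("a.b", regex=False)

# Escaping the dot keeps regex mode but matches it literally;
# na=False maps missing values to False in the result
escaped = files.str.contains(r"a\.b", na=False)

print(as_regex, as_literal, escaped, sep="\n")
```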
Self-check
- Clean emails to lowercase stripped form.
- Extract domain after @ with str.split.
Tip: Chain .str.strip().str.lower() on messy CSV text columns early in cleaning.
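A minimal sketch of that tip, assuming a small in-memory CSV in place of a real file; the column names and values are hypothetical.

```python
import io
import pandas as pd

# Stand-in for a messy CSV file: stray spaces and inconsistent case
raw = io.StringIO("email,city\n Ana@Mail.com , Lisbon \nbob@test.org,PORTO\n")
df = pd.read_csv(raw)

# Normalize every text column right after loading
text_cols = ["email", "city"]
for col in text_cols:
    df[col] = df[col].str.strip().str.lower()

print(df)
```

Cleaning immediately after loading means later steps such as grouping, joining, or deduplicating compare normalized values.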
Interview prep
- .str accessor?
Vectorized string ops; NaN propagates safely.
- contains regex?
str.contains matches patterns as regex by default (regex=True); pass regex=False for a literal match.