Modern NumPy uses np.random.default_rng(seed) to generate reproducible pseudo-random samples for simulation, train/test splits, and data augmentation.
Generator API
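import numpy as np

rng = np.random.default_rng(42)     # seeded Generator instance
ints = rng.integers(0, 10, size=5)  # 5 integers in [0, 10); high is exclusive
norm = rng.normal(0, 1, size=5)     # 5 draws from a Gaussian with mean 0, std 1
print(ints, norm)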
Common distributions
- integers(low, high, size): discrete uniform
- normal(loc, scale, size): Gaussian
- uniform(low, high, size): continuous uniform
- choice(a, size, replace): sample from array
- shuffle: in-place permutation
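A minimal sketch of the methods listed above, one call each. The seed, the variable names, and the sample values (dice, heights, colors) are illustrative assumptions, not part of any fixed recipe.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for illustration

dice = rng.integers(1, 7, size=10)          # integers in [1, 7): high is exclusive
heights = rng.normal(170.0, 10.0, size=10)  # Gaussian with loc=170, scale=10
probs = rng.uniform(0.0, 1.0, size=10)      # continuous uniform on [0, 1)

colors = np.array(["red", "green", "blue"])
picks = rng.choice(colors, size=5, replace=True)  # sampling with replacement

deck = np.arange(10)
rng.shuffle(deck)  # reorders deck in place and returns None

print(dice, picks, deck)
```

Note that shuffle modifies its argument, so keep a copy first if the original order is still needed.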
Reproducibility
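Same seed → same sequence, provided the same calls are made in the same order on the same NumPy version. Create the generator once per experiment and record the seed in the notebook so that others can rerun the analysis and get the same numbers.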
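The seed behaviour can be checked directly. This is a small sketch in which SEED and the variable names are hypothetical: two generators built from the same seed are compared, then one built from a different seed.

```python
import numpy as np

SEED = 42  # hypothetical experiment seed; record it alongside the results

rng_a = np.random.default_rng(SEED)
rng_b = np.random.default_rng(SEED)

a = rng_a.normal(0, 1, size=5)
b = rng_b.normal(0, 1, size=5)
same = np.array_equal(a, b)  # identical seeds give identical draws

rng_c = np.random.default_rng(SEED + 1)
c = rng_c.normal(0, 1, size=5)
differs = not np.array_equal(a, c)  # a different seed gives a different stream

print(same, differs)
```

Passing the generator object into functions, rather than reseeding inside them, keeps a single stream per experiment.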
Important interview questions and answers
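- Q: Why not legacy np.random.rand?
A: The Generator API uses a newer default bit generator (PCG64) with better statistical properties than the legacy Mersenne Twister, and it replaces hidden global state with an explicit generator object.
- Q: choice with replace=False?
A: It samples without replacement, so no element is drawn twice, much like shuffling the array and taking the first few items.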
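To illustrate the replace=False answer, the sketch below draws four distinct items from ten and then shows that requesting more items than exist raises ValueError. The seed and the variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(7)  # illustrative seed
items = np.arange(10)

sample = rng.choice(items, size=4, replace=False)  # 4 distinct elements
unique_count = len(np.unique(sample))

# With replace=False, asking for more items than exist raises ValueError.
try:
    rng.choice(items, size=11, replace=False)
    too_many_failed = False
except ValueError:
    too_many_failed = True

print(sample, unique_count, too_many_failed)
```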
Self-check
- Generate 5 standard normal samples with seed 0.
- What method samples integers in a range?
Tip: Use default_rng(seed) and document seeds in notebooks.
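As one way to apply the tip to a train/test split, the sketch below shuffles row indices with a seeded generator and slices them. The seed, n_samples, and n_test are hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(123)  # illustrative seed; note it in the notebook

n_samples = 10  # hypothetical dataset size
indices = rng.permutation(n_samples)  # shuffled copy of 0..n_samples-1

n_test = 3  # hypothetical test-set size
test_idx = indices[:n_test]
train_idx = indices[n_test:]

print(train_idx, test_idx)
```

Rerunning with the same seed yields the same split, which keeps evaluation numbers comparable between runs.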
Interview prep
- default_rng?
It returns a Generator, the modern API that is preferred over the legacy global random state.
- Seed purpose?
Same seed → same sequence for reproducible tests.