Introduction to Data Science

Last reviewed May 28, 2026 Content v20260528

Track mode: server_script (means: server runner)
Reading: ~3 min
Level: beginner

This lesson

An orientation to the Data Science track—workflow, ethics, Python playground practice, and links to NumPy/Pandas next.

You need a clear map of the Data Science lifecycle so exploration, leakage, and stakeholder communication do not feel like ad hoc guessing.

You will apply this material in analytics teams, product experimentation, research labs, and ML-adjacent engineering at data-driven companies.

Read the narrative, run Python in the playground (stdlib snippets now; install Jupyter, pandas, and scikit-learn locally for full notebooks), and complete MCQs to lock in vocabulary. Also read the interview prep blocks; write one measurable question for a dataset you care about.

Take this lesson after the /python/intro basics, and ideally some /sql/intro, but before deep NumPy/Pandas specialization.

How this Data Science track works

Python playground — lessons use execution_profile: server_script. Run snippets in the playground; heavier stacks (Jupyter, pandas, scikit-learn) install locally with pip.
Workflow-first — questions, data quality, exploration, cleaning, modeling concepts, ethics, and communication—before deep dives on NumPy, Pandas, and SciPy.
Prerequisites — finish Python basics and skim SQL for warehouse queries. Statistics intuition helps but is taught from scratch here.
Pair with — AI and Generative AI for product context after you understand the data science lifecycle.

Playground code uses Python stdlib where possible. Install pandas, jupyter, and matplotlib locally for full notebook workflows.

Install on your device (macOS, Linux, Windows)

Install Python 3.11+ locally for notebooks and frameworks; the on-site playground uses the dev runner when enabled.

macOS

brew install python@3.12, or install from python.org (the "Add to PATH" checkbox applies to the Windows installer, not macOS).
Create a project folder: mkdir ~/python-practice && cd ~/python-practice.
python3 -m venv .venv && source .venv/bin/activate
pip install --upgrade pip

Linux

Debian/Ubuntu: sudo apt update && sudo apt install -y python3 python3-pip python3-venv
Fedora: sudo dnf install -y python3 python3-pip
python3 -m venv .venv && source .venv/bin/activate
pip install --upgrade pip

Windows

Install from python.org and enable Add python.exe to PATH.
Or: winget install Python.Python.3.12
PowerShell: py -3 -m venv .venv; .\.venv\Scripts\Activate.ps1
pip install --upgrade pip

Verify: python3 --version (or py --version on Windows) shows 3.11+.
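
The same check can be done from inside Python, which is handy in the playground or a notebook where you cannot see the shell. This is a minimal sketch; the helper name meets_minimum is made up for illustration.

```python
import sys

# Minimum version this lesson asks for (3.11+).
REQUIRED = (3, 11)


def meets_minimum(version_info=sys.version_info, required=REQUIRED):
    """Return True when the (major, minor) version is at least `required`."""
    return tuple(version_info[:2]) >= required


# Show the running interpreter version and whether it is new enough.
print("Python", sys.version.split()[0])
print("OK" if meets_minimum() else "Please upgrade to Python 3.11 or newer")
```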

Run code on this site (Backend & language playgrounds)

Clone or open this project locally; copy .env.example to .env.
Ensure LEARNING_RUNNER_ENABLED=true and LEARNING_RUNNER_URL=http://127.0.0.1:9999/v1/execute.
Terminal 1: php artisan serve (or composer run dev for Laravel + Vite + runner together).
Terminal 2: npm run runner — keep it running while you click Run on server.
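
If Run on server does nothing, a quick sanity check of the two settings above can help. The sketch below parses .env-style text with the standard library only; parse_env and runner_problems are hypothetical helpers, not part of this project, and the parser handles only simple KEY=VALUE lines.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def runner_problems(settings):
    """Return a list of human-readable problems with the runner settings."""
    problems = []
    if settings.get("LEARNING_RUNNER_ENABLED", "").lower() != "true":
        problems.append("LEARNING_RUNNER_ENABLED is not true")
    if not settings.get("LEARNING_RUNNER_URL", "").startswith("http"):
        problems.append("LEARNING_RUNNER_URL is missing or not an http URL")
    return problems


# Sample text using the values from the steps above; in practice you would
# read your own .env file, for example with open(".env").read().
sample = """
# runner settings
LEARNING_RUNNER_ENABLED=true
LEARNING_RUNNER_URL=http://127.0.0.1:9999/v1/execute
"""

print(runner_problems(parse_env(sample)) or "runner settings look fine")
```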

Starter stack: pip install jupyter pandas matplotlib seaborn scikit-learn
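
After installing, you can confirm which packages are importable without actually loading them. A minimal sketch, assuming the usual import names: scikit-learn is imported as sklearn, and jupyter_core is used here as a stand-in for the jupyter metapackage. The helper missing_packages is made up for illustration.

```python
from importlib.util import find_spec

# pip name -> import name (assumed; the two differ for some packages).
STARTER_STACK = {
    "jupyter": "jupyter_core",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
}


def missing_packages(import_names):
    """Return pip names whose import name cannot be found in this environment."""
    return [pip for pip, module in import_names.items() if find_spec(module) is None]


missing = missing_packages(STARTER_STACK)
if missing:
    print("Not installed here:", ", ".join(missing))
else:
    print("Starter stack is available")
```

In the on-site playground most of these will likely be reported as missing, which is expected: the playground leans on the standard library.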

Data science turns raw data into decisions: ask questions, collect data, clean and explore, model uncertainty, and communicate findings. This track teaches the workflow and thinking—using Python in the playground and pointing to NumPy, Pandas, and local Jupyter for deeper tooling.
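
To make that loop concrete, here is a tiny end-to-end sketch using only the standard library. The numbers are invented for illustration, and the standard error is just one simple way to express uncertainty.

```python
from statistics import mean, stdev

# Question: how many sign-ups do we get on a typical day?
# Collect: invented daily counts, with None marking a missing reading.
raw = [120, 135, None, 128, 140, None, 131]

# Clean: drop the missing readings (one simple strategy among several).
clean = [x for x in raw if x is not None]

# Explore: centre and spread of the usable values.
avg = mean(clean)
spread = stdev(clean)

# Model uncertainty: standard error of the mean.
stderr = spread / len(clean) ** 0.5

# Communicate: one sentence a stakeholder can read.
print(f"{len(clean)} of {len(raw)} days had usable data")
print(f"Typical daily sign-ups: {avg:.1f} +/- {stderr:.1f} (one standard error)")
```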

Prerequisites and how this track works

Finish Python basics (variables, functions, lists, dicts). Skim SQL if you will query warehouses. Lessons run Python with execution_profile: server_script; install jupyter and pandas locally for notebook workflows.

What you will learn

Framing business questions and data types
Exploration, quality, missing data, and outliers
Cleaning, train/test splits, and modeling concepts
Metrics, cross-validation, bias, and ethics
Visualization, storytelling, reproducibility, SQL in pipelines
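
As a preview of the missing-data and outlier topics in this list, the sketch below counts missing values and flags outliers with the common 1.5 x IQR rule. The data is invented, and the IQR rule is one convention among several, not necessarily the one later lessons use.

```python
from statistics import quantiles

# Invented measurements; None marks a missing value.
values = [12.0, 14.5, None, 13.2, 15.1, 98.0, 13.8, None, 14.0]

# Missing data: separate what is present from what is not.
present = [v for v in values if v is not None]
missing_count = len(values) - len(present)

# Outliers: anything beyond 1.5 * IQR from the quartiles.
q1, _median, q3 = quantiles(present, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in present if v < low or v > high]

print(f"missing: {missing_count} of {len(values)}")
print(f"outliers outside [{low:.2f}, {high:.2f}]: {outliers}")
```

Flagging is not the same as deleting: whether 98.0 is an error or a real event is a question for whoever knows the data.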

First run

print("Data Science track")
print("Next: NumPy, Pandas, and local Jupyter for full stacks")

Data science vs software engineering

Engineers ship features; data scientists validate hypotheses with evidence. Both write Python—but DS emphasizes distributions, leakage, and stakeholder communication.
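
Leakage is easiest to see in a small example. The sketch below splits synthetic data into train and test sets, then computes scaling statistics from the training set only; computing them on all the data would let information from the test set leak into training. The 80/20 split and the seed are arbitrary choices for illustration.

```python
import random
from statistics import mean, stdev

# Synthetic data with a fixed seed so the run is repeatable.
rng = random.Random(42)
data = [rng.gauss(50, 10) for _ in range(100)]

# Shuffle, then hold out the last 20% as the test set.
rng.shuffle(data)
cut = int(len(data) * 0.8)
train, test = data[:cut], data[cut:]

# Fit the scaling statistics on the training set ONLY.
mu, sigma = mean(train), stdev(train)


def scale(x):
    """Standardize a value using training-set statistics."""
    return (x - mu) / sigma


# Apply the same training-derived transform to both sets.
train_scaled = [scale(x) for x in train]
test_scaled = [scale(x) for x in test]

print(f"train={len(train)} test={len(test)}")
print(f"scaled test mean: {mean(test_scaled):.3f}")
```

The scaled test mean will generally not be exactly zero, and that is the point: the test set is treated as unseen data.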

Important interview questions and answers

Q: Is this the same as ML?
A: Data science includes exploration and communication; ML engineering focuses on training and serving models at scale.
Q: Why Python?
A: Readable syntax and the PyPI ecosystem; pandas, scikit-learn, and notebooks are industry defaults.

Self-check

What two tracks should you complete first?
What execution profile does this topic use?

Challenge

First Python run in Data Science track

Click Run with the default code.
Confirm output appears in the terminal.
Add a line printing one data science question you care about.

Done when: the terminal shows the default message and your custom question.

Tip: Run the playground challenge before moving on—later lessons assume Python runs.

Interview prep

Prerequisite? Python basics (/python/intro) and SQL literacy (/sql/intro) help.
Playground? server_script runs Python; install Jupyter/pandas locally for full stacks.

Playground

Runs on the configured server runner (dev: npm run runner with LEARNING_RUNNER_ENABLED=true). Output appears below the editor.

