
📅 30-Week Azure Data-Engineering Study Plan

Each week takes roughly 15–20 hours.


Week 1 🚀 Data-Warehouse Overview; Inmon vs Kimball; Fact/Grain Choices (~16 h)

  • 📖 Read IBM’s “What is a Data Warehouse?” (1 h)
  • 📖 Read Inmon’s Building the Data Warehouse, Intro & Chapter 1 (2 h)
  • 📖 Read Kimball Toolkit Chapters 1–2 (star schemas & conformed dimensions) (2 h)
  • 📹 Watch an “Inmon vs Kimball” comparison video on YouTube (45 min)
  • 📝 Sketch 3 fact-table grains for a sample domain (1 h)
  • 💻 Hands-on: design an ERD for transaction and periodic-snapshot fact tables (2 h)
  • 📝 Build a one-page Inmon vs Kimball cheat sheet (2 h)
  • 🎓 Create Anki flashcards for key terms (1 h)
  • 📝 Answer 5 mini “design a DW” prompts (2 h)
  • 📚 Review & record yourself explaining both approaches (2.5 h)
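To make the grain exercise concrete, here is a minimal Python sketch for a toy banking domain (all field names and data are hypothetical). The transaction fact holds one row per account movement; the periodic snapshot is derived from it at a grain of one row per account per day.

```python
from collections import defaultdict
from datetime import date, timedelta

# Transaction-grain fact: one row per account movement (toy data, hypothetical fields)
transactions = [
    {"account": "A", "day": date(2024, 1, 1), "amount": 100},
    {"account": "A", "day": date(2024, 1, 1), "amount": -30},
    {"account": "B", "day": date(2024, 1, 2), "amount": 50},
    {"account": "A", "day": date(2024, 1, 3), "amount": 20},
]

def daily_snapshot(facts, start, end):
    """Periodic-snapshot grain: one row per account per day with the end-of-day balance.

    Days without activity still get a row; that density is what separates a
    periodic snapshot from the sparse transaction fact.
    """
    daily_net = defaultdict(int)
    for f in facts:
        daily_net[(f["account"], f["day"])] += f["amount"]
    accounts = sorted({f["account"] for f in facts})
    balances = dict.fromkeys(accounts, 0)
    rows = []
    day = start
    while day <= end:
        for acct in accounts:
            balances[acct] += daily_net.get((acct, day), 0)
            rows.append({"account": acct, "day": day, "balance": balances[acct]})
        day += timedelta(days=1)
    return rows

snapshot = daily_snapshot(transactions, date(2024, 1, 1), date(2024, 1, 3))
for row in snapshot:
    print(row["day"], row["account"], row["balance"])
```

Note the trade-off: four transaction rows become six snapshot rows, because the snapshot grows with accounts × days regardless of activity.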

Week 2 🔄 Slowly Changing Dimensions (Types 1–6) (18 h)

  • 📖 Read Kimball Toolkit Ch 3 on SCD patterns (2 h)
  • 📖 Read Kimball Group blog posts on Types 4 & 6 (1 h)
  • 💻 Hands-on: implement SCD Types 1, 2 & 3 in SQL on sample data (3 h)
  • 💻 Build a Python/Polars pipeline for SCD Type 2 with a history table (3 h)
  • 📝 Write a SQL stored procedure for Type 2 merge logic (2 h)
  • 🎓 Flashcards: pros & cons of each SCD type (1 h)
  • 📝 Scenario Q&A: 10 design prompts (2 h)
  • 📚 Review & refine your implementations (4 h)
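As a starting point for the Type 2 merge, here is a minimal sketch using Python's built-in sqlite3 (table and column names are hypothetical). SQLite has no MERGE statement, so the merge is written as the common portable two-step: expire changed current rows, then insert new current versions. A SQL Server or Databricks version could express the same logic with MERGE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_sk INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    customer_id TEXT NOT NULL,                      -- natural/business key
    city        TEXT NOT NULL,                      -- tracked (Type 2) attribute
    valid_from  TEXT NOT NULL,
    valid_to    TEXT,                               -- NULL means still current
    is_current  INTEGER NOT NULL
);
CREATE TABLE stg_customer (customer_id TEXT PRIMARY KEY, city TEXT NOT NULL);
""")

def apply_scd2(conn, load_date):
    with conn:  # one transaction: expire and insert succeed or fail together
        # 1. Expire current rows whose tracked attribute changed in staging
        conn.execute("""
            UPDATE dim_customer
               SET valid_to = ?, is_current = 0
             WHERE is_current = 1
               AND EXISTS (SELECT 1 FROM stg_customer s
                            WHERE s.customer_id = dim_customer.customer_id
                              AND s.city <> dim_customer.city)
        """, (load_date,))
        # 2. Insert a current version for brand-new keys and for keys just expired
        conn.execute("""
            INSERT INTO dim_customer (customer_id, city, valid_from, valid_to, is_current)
            SELECT s.customer_id, s.city, ?, NULL, 1
              FROM stg_customer s
             WHERE NOT EXISTS (SELECT 1 FROM dim_customer d
                                WHERE d.customer_id = s.customer_id
                                  AND d.is_current = 1)
        """, (load_date,))

def stage(conn, rows):
    with conn:
        conn.execute("DELETE FROM stg_customer")
        conn.executemany("INSERT INTO stg_customer VALUES (?, ?)", rows)

stage(conn, [("c1", "Oslo"), ("c2", "Lima")])
apply_scd2(conn, "2024-01-01")
stage(conn, [("c1", "Bergen"), ("c2", "Lima")])  # c1 moved, c2 unchanged
apply_scd2(conn, "2024-02-01")

history = conn.execute(
    "SELECT customer_id, city, valid_from, valid_to, is_current "
    "FROM dim_customer ORDER BY customer_id, valid_from"
).fetchall()
for row in history:
    print(row)
```

Re-running `apply_scd2` with unchanged staging data adds no rows, which is a useful property to check in the Week 6 idempotency work.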

Week 3 ⚡ Advanced SQL Performance Tuning (18 h)

  • 📖 Read SQL Performance Explained, Ch 2–4 (3 h)
  • 💻 Hands-on: run EXPLAIN on 10 analytical queries (2 h)
  • 💻 Add & test clustered, non-clustered & columnstore indexes (3 h)
  • 💻 Implement range & hash partitioning; test partition pruning (2 h)
  • 📖 Read a deep-dive blog post on columnstore indexes (1 h)
  • 🔨 Mini-Project: optimize a dashboard query on a 1 GB dataset (4 h)
  • 🎓 Flashcards: index & partitioning concepts (1 h)
  • 📝 Self-quiz on tuning strategies (2 h)
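The habit of reading a plan before and after an index change can be practised without a server. This sketch uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module (table, index, and data are hypothetical); SQLite has no clustered or columnstore indexes, so it only illustrates the scan-versus-seek comparison, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales (customer_id, amount) VALUES (?, ?)",
    [(i % 1000, i * 0.5) for i in range(50_000)],
)

QUERY = "SELECT SUM(amount) FROM sales WHERE customer_id = 42"

def plan(conn, sql):
    # Each plan row has 4 columns; the last one ('detail') is the readable description
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

before = plan(conn, QUERY)
conn.execute("CREATE INDEX idx_sales_customer ON sales (customer_id)")
after = plan(conn, QUERY)

print("before:", before)  # a full table scan (wording such as 'SCAN sales')
print("after: ", after)   # an index search using idx_sales_customer
```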

Week 4 🧩 OLAP vs Relational; Materialized Views & ETL Mapping (16 h)

  • 📖 Read articles on OLAP cube architectures (1 h)
  • 💻 Create & refresh materialized views in Postgres (2 h)
  • 📝 Draft a source-to-target mapping doc for OLTP→DW (2 h)
  • 📖 Read Kimball ETL mapping templates (1 h)
  • 🔨 Mini-Project: build an SSAS cube and a relational report; compare performance (5 h)
  • 🎓 Flashcards: OLAP vs OLTP trade-offs (1 h)
  • 📝 Write a summary of best practices (2 h)
  • 📚 Review & self-test (2 h)
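Before the Postgres lab, the core behavior of a materialized view (a stored query result that goes stale until refreshed) can be seen in a few lines. SQLite has no materialized views, so this sketch emulates one with a summary table and a full-refresh function (all names hypothetical); in Postgres the real equivalents are CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL);
INSERT INTO orders (region, amount) VALUES ('EU', 10), ('EU', 5), ('US', 7);
""")

MV_QUERY = "SELECT region, SUM(amount) AS total FROM orders GROUP BY region"

def create_mv(conn):
    with conn:
        conn.execute(f"CREATE TABLE mv_sales_by_region AS {MV_QUERY}")

def refresh_mv(conn):
    # Full refresh inside one transaction, so readers never see a half-built result
    with conn:
        conn.execute("DELETE FROM mv_sales_by_region")
        conn.execute(f"INSERT INTO mv_sales_by_region {MV_QUERY}")

def read_mv(conn):
    return dict(conn.execute("SELECT region, total FROM mv_sales_by_region"))

create_mv(conn)
conn.execute("INSERT INTO orders (region, amount) VALUES ('US', 3)")
conn.commit()
stale = read_mv(conn)   # the new US order is not reflected yet
refresh_mv(conn)
fresh = read_mv(conn)   # now includes it
print(stale, fresh)
```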

Week 5 🐼 pandas vs Polars – Performance & Memory (16 h)

  • 📖 Read pandas & Polars docs on IO & lazy APIs (2 h)
  • 💻 Benchmark CSV→Parquet conversion with pandas vs Polars (3 h)
  • 📖 Deep dive into Polars lazy mode (1 h)
  • 💻 Hands-on: build a sample ETL in both libraries; measure memory (3 h)
  • 🔨 Mini-Project: Polars pipeline to clean data & write Parquet (4 h)
  • 🎓 Flashcards: key API differences (1 h)
  • 📝 Write a short comparison blog post (2 h)

Week 6 🛠️ ETL Design Patterns & Idempotency (16 h)

  • 📖 Read articles on config-driven pipelines (1 h)
  • 💻 Build a YAML/JSON-driven ETL framework (3 h)
  • 💻 Implement watermarking & safe retry logic (2 h)
  • 🔨 Mini-Project: generic idempotent CSV→DB loader (5 h)
  • 🎓 Flashcards: design-pattern names & use cases (1 h)
  • 📝 Self-review & refine code (4 h)
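One way to combine watermarking with safe retries is sketched below with Python's sqlite3 (table names, columns, and the `updated_at` watermark column are hypothetical). Rows newer than the stored watermark are upserted on their natural key, and the watermark is committed in the same transaction, so re-running a failed or repeated load does not duplicate data. The upsert syntax needs SQLite 3.24 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE target (order_id TEXT PRIMARY KEY, amount REAL, updated_at TEXT);
CREATE TABLE etl_watermark (source TEXT PRIMARY KEY, high_water TEXT);
""")

UPSERT_ROW = """
    INSERT INTO target (order_id, amount, updated_at)
    VALUES (:order_id, :amount, :updated_at)
    ON CONFLICT(order_id) DO UPDATE SET
        amount = excluded.amount, updated_at = excluded.updated_at
"""
UPSERT_WATERMARK = """
    INSERT INTO etl_watermark (source, high_water) VALUES (?, ?)
    ON CONFLICT(source) DO UPDATE SET high_water = excluded.high_water
"""

def load(conn, source, rows):
    """Load rows newer than the stored watermark; safe to re-run after a failure."""
    with conn:  # rows and watermark commit (or roll back) together
        found = conn.execute(
            "SELECT high_water FROM etl_watermark WHERE source = ?", (source,)
        ).fetchone()
        high_water = found[0] if found else ""
        # ISO-8601 timestamps compare correctly as plain strings
        new_rows = [r for r in rows if r["updated_at"] > high_water]
        if new_rows:
            # Upsert on the natural key: applying the same row twice still leaves one row
            conn.executemany(UPSERT_ROW, new_rows)
            conn.execute(UPSERT_WATERMARK,
                         (source, max(r["updated_at"] for r in new_rows)))
    return len(new_rows)

batch = [
    {"order_id": "o1", "amount": 10.0, "updated_at": "2024-01-01T10:00"},
    {"order_id": "o2", "amount": 20.0, "updated_at": "2024-01-01T11:00"},
]
first = load(conn, "orders_csv", batch)
second = load(conn, "orders_csv", batch)  # simulated retry of the same file
total = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
print(first, second, total)
```

The strict `>` comparison skips late rows that share the watermark's exact timestamp; because the load is an upsert, switching to `>=` would also be safe, at the cost of reprocessing a few rows each run.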

Week 7 ⚙️ Testing & Logging in Python (16 h)

  • 📖 Read pytest docs on fixtures & parametrization (1 h)
  • 💻 Write unit tests for ETL transforms (3 h)
  • 📖 Read the Python Logging Cookbook (1 h)
  • 💻 Implement structured JSON logging & retries (2 h)
  • 🔨 Mini-Project: add tests & logging to your Week 6 pipeline (6 h)
  • 🎓 Review test coverage & log output (2 h)
  • 📝 Quiz yourself on pytest & logging concepts (1 h)
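For the structured-logging task, a minimal sketch using only the standard `logging` module: a custom `Formatter` that renders each record as one JSON object per line. Passing extra fields through `extra={"context": {...}}` is a hypothetical convention chosen for this example, not something the logging module prescribes.

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Merge structured fields supplied via extra={"context": {...}}
        payload.update(getattr(record, "context", {}))
        if record.exc_info:
            payload["exc"] = self.formatException(record.exc_info)
        return json.dumps(payload)

logger = logging.getLogger("etl")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("rows loaded", extra={"context": {"table": "target", "rows": 2}})
```

Because every line is valid JSON, the output can be filtered with tools like `jq` or shipped to a log store without extra parsing rules.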

Week 8 🚦 CI Basics with GitHub Actions (14 h)

  • 📖 Read the GitHub Actions guide to building & testing Python (1 h)
  • 💻 Create .github/workflows/ci.yml to run pytest & flake8 (3 h)
  • 📖 Read Poetry packaging docs (1 h)
  • 💻 Configure Poetry & a lock file for your project (2 h)
  • 🔨 Mini-Project: integrate CI into your Week 7 repo (5 h)
  • 🎓 Review CI logs & fix failures (2 h)

Weeks 9–12 🔥 Spark & Delta Performance Module (~18 h/wk)

  • Week 9 – Spark internals (DAG, stages, executors): read Spark: The Definitive Guide Ch 1–2, watch an internals video, inspect DAGs hands-on, mini-project Spark job (18 h)
  • Week 10 – Joins & shuffles: read docs on broadcast vs sort-merge joins, fix skewed joins, project on a sample dataset (18 h)
  • Week 11 – AQE & caching: read the official Adaptive Query Execution blog post, enable AQE, benchmark with & without caching, mini-project (18 h)
  • Week 12 – Delta Lake deep dive: read the Delta Lake guide, implement MERGE & Z-ordering, run time-travel queries, project (18 h)
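For the Week 10 skewed-join exercise, one common fix is key salting. The sketch below shows the mechanics in plain Python rather than PySpark so it runs without a cluster; `partition_of` is a hypothetical stand-in for Spark's hash partitioner, and the data is made up.

```python
import zlib
from collections import Counter

N_PARTITIONS = 8  # plays the role of the shuffle partition count in this toy

def partition_of(key: str, salt: int = 0) -> int:
    # crc32 is deterministic across runs (Python's built-in hash() of str is not).
    # Adding the salt after hashing guarantees the salted copies of one key land on
    # different partitions here; Spark would instead hash a concatenated key+salt column.
    return (zlib.crc32(key.encode()) + salt) % N_PARTITIONS

# Skewed large side: one hot key dominates the join column
large = [("hot", i) for i in range(900)] + [(f"k{i}", i) for i in range(100)]
small = {"hot": "H", **{f"k{i}": f"V{i}" for i in range(100)}}

# Unsalted shuffle: every "hot" row goes to the same partition
unsalted = Counter(partition_of(k) for k, _ in large)

# Salted shuffle: give each large-side row a salt in [0, N_PARTITIONS), round-robin
salted_large = [(k, i % N_PARTITIONS, v) for i, (k, v) in enumerate(large)]
salted = Counter(partition_of(k, s) for k, s, _ in salted_large)

# The small side is replicated once per salt so every salted key still finds its match
salted_small = {(k, s): v for k, v in small.items() for s in range(N_PARTITIONS)}
joined = [(k, v, salted_small[(k, s)]) for k, s, v in salted_large]

print("max partition size unsalted:", max(unsalted.values()))
print("max partition size salted:  ", max(salted.values()))
print("joined rows:", len(joined))
```

The cost of salting is the replicated small side (here 8 copies). In Spark 3.x, AQE can also split skewed partitions automatically when `spark.sql.adaptive.skewJoin.enabled` is on, which makes a useful comparison point in Week 11.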

Weeks 13–16 🌊 Streaming & Data Quality Module (~17 h/wk)

  • Week 13 – Lambda vs Kappa architectures & Event Hubs basics: articles & hands-on ingestion (17 h)
  • Week 14 – Structured Streaming APIs: triggers, output modes, checkpoints; code labs (17 h)
  • Week 15 – Stateful processing: window operations, watermark-based state cleanup, demos (17 h)
  • Week 16 – Data-quality frameworks: Great Expectations suites & dbt tests; build a quality dashboard (17 h)
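For Week 15, a toy model of event-time tumbling windows with a watermark can help before the Spark labs. This is plain Python, not the Structured Streaming API; the window length and delay are assumed values, and Spark advances its watermark per micro-batch rather than per event, so treat this as a mental model rather than Spark's exact semantics.

```python
from collections import defaultdict

WINDOW = 10  # tumbling window length in seconds (assumed)
DELAY = 5    # watermark delay: how late an event may arrive and still count (assumed)

def windowed_sums(events):
    """events: (event_time_seconds, value) pairs in arrival order, not event-time order."""
    state = defaultdict(int)   # window_start -> running sum (the operator's state)
    emitted = {}               # windows finalized once the watermark passed their end
    dropped = []               # events whose window had already been finalized
    watermark = float("-inf")
    for t, v in events:
        start = t - t % WINDOW
        if start + WINDOW <= watermark:
            dropped.append((t, v))
            continue
        state[start] += v
        # Watermark = max event time seen so far minus the allowed delay
        watermark = max(watermark, t - DELAY)
        # Emit and evict closed windows; eviction is what keeps state bounded
        for closed in [s for s in state if s + WINDOW <= watermark]:
            emitted[closed] = state.pop(closed)
    return emitted, dict(state), dropped

events = [(1, 1), (4, 1), (12, 1), (3, 1), (16, 1), (2, 1), (25, 1)]
emitted, still_open, dropped = windowed_sums(events)
print(emitted)     # {0: 3, 10: 2}
print(still_open)  # {20: 1}
print(dropped)     # [(2, 1)]
```

The event at t=3 arrives late but within the delay, so it still counts; the event at t=2 arrives after window [0, 10) was finalized and is dropped.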

Weeks 17–20 🏞️ Azure Lakehouse Module (~16 h/wk)

  • Week 17 – ADLS Gen2 setup & security (RBAC, ACLs, firewall) (16 h)
  • Week 18 – Databricks workspace, clusters & notebooks (16 h)
  • Week 19 – Unity Catalog governance & lineage (16 h)
  • Week 20 – Medallion pattern: implement a Bronze/Silver/Gold pipeline (16 h)
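For Week 20, the Bronze/Silver/Gold split can be rehearsed in plain Python before building it on Databricks. In this sketch, lists stand in for Delta tables and the record layout is hypothetical: Bronze keeps raw input as-is, Silver parses, validates, and de-duplicates, and Gold holds a business-level aggregate.

```python
import json
from collections import defaultdict

# Bronze: raw records kept exactly as ingested (duplicates and bad rows included)
bronze = [
    '{"order_id": "o1", "customer": "c1", "amount": "10.5"}',
    '{"order_id": "o1", "customer": "c1", "amount": "10.5"}',  # duplicate delivery
    '{"order_id": "o2", "customer": "c2", "amount": "oops"}',  # unparseable amount
    '{"order_id": "o3", "customer": "c1", "amount": "4.5"}',
]

def to_silver(raw_rows):
    """Silver: parsed, typed, validated, and de-duplicated on the business key."""
    silver, rejects = {}, []
    for raw in raw_rows:
        rec = json.loads(raw)
        try:
            rec["amount"] = float(rec["amount"])
        except ValueError:
            rejects.append(rec)  # quarantine bad rows instead of silently dropping them
            continue
        silver[rec["order_id"]] = rec  # last write wins per order_id
    return list(silver.values()), rejects

def to_gold(silver_rows):
    """Gold: aggregate ready for reporting (total amount per customer)."""
    totals = defaultdict(float)
    for rec in silver_rows:
        totals[rec["customer"]] += rec["amount"]
    return dict(totals)

silver, rejects = to_silver(bronze)
gold = to_gold(silver)
print(len(silver), len(rejects), gold)
```

Keeping Bronze untouched means Silver and Gold can always be rebuilt from it after a bug fix, which is the main reason the pattern stores raw data at all.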

Weeks 21–24 🔧 ADF & Synapse Module (~16 h/wk)

  • Week 21 – ADF pipelines: linked services, datasets, triggers (16 h)
  • Week 22 – Mapping Data Flows: transformations & expressions (16 h)
  • Week 23 – Synapse SQL pools: serverless vs dedicated, and tuning (16 h)
  • Week 24 – CI/CD: ARM templates & Git integration for ADF/Synapse (16 h)

Weeks 25–28 🌐 Fabric Lakehouse & Real-Time Module (~16 h/wk)

  • Week 25 – Fabric architecture: OneLake & shortcuts (16 h)
  • Week 26 – Fabric Data Factory pipelines & notebooks (16 h)
  • Week 27 – Direct Lake mode in Power BI: query patterns & DirectQuery fallback (16 h)
  • Week 28 – Governance & lifecycle: roles, deployment pipelines (16 h)

Weeks 29–30 🎯 Interview & System Design Module (~15 h/wk)

  • Week 29 – Résumé & LinkedIn optimization; project storytelling (15 h)
  • Week 30 – Mock interviews: STAR behavioral answers, technical Q&A & system-design drills (15 h)