Module-1 – DW & SQL Foundations

Module-1 – DW & SQL Foundations

🎯 Objectives

  • Grasp dimensional-modeling patterns (Kimball vs Inmon).
  • Master advanced Slowly Changing Dimensions (Types 1–6).
  • Optimize SQL queries for analytics.
  • Design a small data-warehouse schema end-to-end.

πŸ—“οΈ Weekly Plan

  • Week 1 – Data-warehouse overview; Inmon vs Kimball; fact/grain choices.
  • Week 2 – Deep dive into SCD Types 1, 2, 3, 4, 6 (hybrids).
  • Week 3 – SQL performance tuning: explain plans, indexing, partitioning.
  • Week 4 – OLAP vs relational, materialized views, ETL mapping.

πŸ”‘Key Concepts

1. Schema Patterns

  • Star schema: single fact table + conformed dimensions
  • Snowflake schema: normalized dimension tables
  • Galaxy schema: multiple fact tables sharing dims

2. Dimension Table Types

  • Conformed dimensions: reused across facts
  • Junk dimensions: grouping low-cardinality flags
  • Degenerate dimensions: dimension stored in fact (no separate table)

3. Slowly Changing Dimensions (SCD)

  • Type 1: overwrite old value
  • Type 2: add new row + versioning columns (effective_date, end_date)
  • Type 3: add new attribute column (e.g. prev_address)
  • Type 4: history table separate from current table
  • Type 6 (Hybrid): combine Types 1/2/3 for specific use cases

4. Fact Table Granularity

  • Transaction facts: one row per event
  • Periodic snapshot facts: one row per period summary
  • Accumulating snapshot facts: track process lifecycle

5. Keys & Indexing

  • Surrogate vs natural keys: hidden identity vs business key
  • Clustered vs non-clustered indexes
  • Columnstore indexes: for high-performance analytics

6. Partitioning Strategies

  • Range partitioning: by date or numeric range
  • List/hHash partitioning: by category or hash function
  • Benefits: improved prune, maintenance, parallelism

7. Materialized Views

  • Pre-computed aggregations
  • Refresh modes: on-demand vs incremental
  • Use-cases: speeding up complex joins/aggregates

πŸ”¨ Mini-Projects

  • Retail DW Prototype: Model & implement a 3-star schema in Postgres (DDL + ERD).
  • SCD Pipeline: Load CSV to Type-2 dimension with history tracking.
  • SQL Tuning Lab: Optimize 10 complex analytical queries on a 1 GB dataset.

πŸ“š Resources

  • Ralph Kimball’s Dimensional Modeling Techniques
  • SQL Performance Explained by Markus Winand
  • PostgreSQL docs: partitioning & indexing