Module-1 β DW & SQL Foundations
Module-1 β DW & SQL Foundations
π― Objectives
- Grasp dimensional-modeling patterns (Kimball vs Inmon).
- Master advanced Slowly Changing Dimensions (Types 1β6).
- Optimize SQL queries for analytics.
- Design a small data-warehouse schema end-to-end.
ποΈ Weekly Plan
- Week 1 β Data-warehouse overview; Inmon vs Kimball; fact/grain choices.
- Week 2 β Deep dive into SCD Types 1, 2, 3, 4, 6 (hybrids).
- Week 3 β SQL performance tuning: explain plans, indexing, partitioning.
- Week 4 β OLAP vs relational, materialized views, ETL mapping.
πKey Concepts
1. Schema Patterns
- Star schema: single fact table + conformed dimensions
- Snowflake schema: normalized dimension tables
- Galaxy schema: multiple fact tables sharing dims
2. Dimension Table Types
- Conformed dimensions: reused across facts
- Junk dimensions: grouping low-cardinality flags
- Degenerate dimensions: dimension stored in fact (no separate table)
3. Slowly Changing Dimensions (SCD)
- Type 1: overwrite old value
- Type 2: add new row + versioning columns (
effective_date
, end_date
)
- Type 3: add new attribute column (e.g.
prev_address
)
- Type 4: history table separate from current table
- Type 6 (Hybrid): combine Types 1/2/3 for specific use cases
4. Fact Table Granularity
- Transaction facts: one row per event
- Periodic snapshot facts: one row per period summary
- Accumulating snapshot facts: track process lifecycle
5. Keys & Indexing
- Surrogate vs natural keys: hidden identity vs business key
- Clustered vs non-clustered indexes
- Columnstore indexes: for high-performance analytics
6. Partitioning Strategies
- Range partitioning: by date or numeric range
- List/hHash partitioning: by category or hash function
- Benefits: improved prune, maintenance, parallelism
7. Materialized Views
- Pre-computed aggregations
- Refresh modes: on-demand vs incremental
- Use-cases: speeding up complex joins/aggregates
π¨ Mini-Projects
- Retail DW Prototype: Model & implement a 3-star schema in Postgres (DDL + ERD).
- SCD Pipeline: Load CSV to Type-2 dimension with history tracking.
- SQL Tuning Lab: Optimize 10 complex analytical queries on a 1 GB dataset.
π Resources
- Ralph Kimballβs Dimensional Modeling Techniques
- SQL Performance Explained by Markus Winand
- PostgreSQL docs: partitioning & indexing