€29
Medallion Architecture Guide
A comprehensive decision framework and implementation guide for building production-grade medallion architectures in Databricks. 10 guide chapters, 6 runnable code examples, reference architectures, anti-pattern encyclopedia, and cheatsheets.
Databricks, PySpark, Delta Lake, Markdown
📁 File Structure 22 files
medallion-architecture-guide/
├── README.md
├── LICENSE
│
├── guide/
│   ├── 01_introduction.md
│   ├── 02_decision_framework.md
│   ├── 03_bronze_layer.md
│   ├── 04_silver_layer.md
│   ├── 05_gold_layer.md
│   ├── 06_naming_conventions.md
│   ├── 07_schema_evolution.md
│   ├── 08_data_quality_gates.md
│   ├── 09_anti_patterns.md
│   └── 10_reference_architectures.md
│
├── code_examples/
│   ├── bronze_ingestion.py
│   ├── silver_transformation.py
│   ├── gold_aggregation.py
│   ├── cross_layer_pipeline.py
│   ├── naming_convention_generator.py
│   └── schema_migration.py
│
├── diagrams/
│   ├── medallion_overview.md
│   ├── data_flow.md
│   └── decision_tree.md
│
└── cheatsheets/
    ├── layer_comparison.md
    ├── naming_conventions_cheatsheet.md
    └── migration_checklist.md
📖 Documentation Preview: README excerpt
Why This Guide Exists
Every team that adopts Databricks eventually faces the same question: "How should we organize our lakehouse?" Many teams implement the medallion pattern poorly. This guide gives you a decision framework for knowing when medallion is the right choice and an implementation playbook for doing it properly.
What You Get
- Decision Framework: a decision tree for choosing between medallion, Data Vault, one-big-table, and streaming-first designs
- Deep-Dive Layer Guides: Bronze (Auto Loader, change data capture), Silver (slowly changing dimensions, deduplication, data quality gates), Gold (aggregates, feature stores)
- Production-Ready Code: complete, runnable PySpark scripts with type hints and error handling
- Anti-Pattern Encyclopedia: 15+ named anti-patterns with before/after code fixes
- Reference Architectures: e-commerce, IoT, financial services, SaaS analytics, healthcare
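The Silver layer guide covers deduplication and data quality gates. As a rough illustration of what those two steps do, here is a plain-Python sketch that uses lists of dicts in place of Spark DataFrames; it is not the guide's PySpark code. The field names (`order_id`, `updated_at`, `amount`), the latest-timestamp-wins rule, and the choice to quarantine failing rows with a `_failed_rules` tag instead of dropping them are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable

Record = dict[str, Any]


def deduplicate_latest(records: list[Record], key: str, order_by: str) -> list[Record]:
    """Keep one record per `key`: the one with the greatest `order_by` value.

    On ties, the record seen first is kept (strict > comparison).
    """
    latest: dict[Any, Record] = {}
    for rec in records:
        k = rec[key]
        if k not in latest or rec[order_by] > latest[k][order_by]:
            latest[k] = rec
    return list(latest.values())


@dataclass
class GateResult:
    passed: list[Record] = field(default_factory=list)
    quarantined: list[Record] = field(default_factory=list)


def apply_quality_gate(
    records: list[Record], rules: dict[str, Callable[[Record], bool]]
) -> GateResult:
    """Send each record to `passed` or `quarantined`; failures carry the rule names."""
    result = GateResult()
    for rec in records:
        failed = [name for name, check in rules.items() if not check(rec)]
        if failed:
            result.quarantined.append({**rec, "_failed_rules": failed})
        else:
            result.passed.append(rec)
    return result


# Hypothetical Bronze rows: order 1 was updated once, order 2 has a bad amount.
# ISO-8601 strings sort chronologically, so plain string comparison works here.
bronze_orders = [
    {"order_id": 1, "updated_at": "2024-01-01T10:00", "amount": 20.0},
    {"order_id": 1, "updated_at": "2024-01-02T09:00", "amount": 25.0},
    {"order_id": 2, "updated_at": "2024-01-01T12:00", "amount": -5.0},
]
rules = {
    "order_id_present": lambda r: r.get("order_id") is not None,
    "amount_non_negative": lambda r: r["amount"] >= 0,
}

silver = apply_quality_gate(
    deduplicate_latest(bronze_orders, "order_id", "updated_at"), rules
)
# [{'order_id': 1, 'updated_at': '2024-01-02T09:00', 'amount': 25.0}]
print(silver.passed)
# [{'order_id': 2, 'updated_at': '2024-01-01T12:00', 'amount': -5.0, '_failed_rules': ['amount_non_negative']}]
print(silver.quarantined)
```

In PySpark, the "latest record per key" step is commonly written as `F.row_number().over(Window.partitionBy(key).orderBy(F.col(ts).desc()))` followed by a filter on row number 1. Quarantining instead of dropping keeps rejected rows available for inspection and replay.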
Content Depth
About 5,000 lines across all files: roughly 3,000 lines of technical writing, 1,500 lines of production PySpark code, and 500 lines of reference material, comparable to a 150-page technical book.
📄 Code Sample: .py preview
code_examples/bronze_ingestion.py
"""Bronze Layer Ingestion Pipeline.

Complete bronze layer ingestion example supporting both
Auto Loader (streaming) and batch JDBC ingestion patterns.
Designed for Databricks Runtime 13.3+.

Usage:
    from bronze_ingestion import BronzeIngestionPipeline

    pipeline = BronzeIngestionPipeline(spark, config)
    pipeline.ingest_files(
        "json",
        "/landing/orders/",
        "bronze.erp.raw_orders",
    )
"""

from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime
from typing import Any

from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F
from pyspark.sql.streaming import StreamingQuery
from pyspark.sql.types import (
    StringType,
    StructField,
    StructType,
    TimestampType,
)


@dataclass
class IngestionConfig:
    """Configuration for a bronze ingestion pipeline."""

    checkpoint_base: str = "/checkpoints"
    source_system: str = "unknown"
... remaining implementation in full product
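The usage example writes to `bronze.erp.raw_orders`, and `IngestionConfig` carries `checkpoint_base` and `source_system`. Under Unity Catalog's three-level `catalog.schema.table` namespace, that name would read as catalog `bronze`, schema `erp`, table `raw_orders`. The sketch below is a hypothetical helper (`BronzeTarget` is not part of the product's code) that derives a table name and a per-table checkpoint path from those settings. The `<layer>.<source_system>.raw_<entity>` pattern is inferred from that single example, and the checkpoint directory layout is an assumption. Giving each target its own checkpoint directory matters because Structured Streaming records each query's progress in its checkpoint location, so two queries should never share one.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Lowercase identifiers only, so no part needs quoting in a three-part name.
_IDENT = re.compile(r"^[a-z][a-z0-9_]*$")


@dataclass(frozen=True)
class BronzeTarget:
    """Hypothetical helper; defaults mirror IngestionConfig's visible fields."""

    source_system: str
    entity: str
    layer: str = "bronze"
    checkpoint_base: str = "/checkpoints"

    def __post_init__(self) -> None:
        # Fail fast on names that would break the naming convention.
        for part in (self.layer, self.source_system, self.entity):
            if not _IDENT.match(part):
                raise ValueError(f"invalid identifier part: {part!r}")

    @property
    def table_name(self) -> str:
        # Assumed pattern: <layer>.<source_system>.raw_<entity>
        return f"{self.layer}.{self.source_system}.raw_{self.entity}"

    @property
    def checkpoint_path(self) -> str:
        # Assumed layout: one checkpoint directory per target table.
        base = self.checkpoint_base.rstrip("/")
        return "/".join([base, self.layer, self.source_system, self.entity])


orders = BronzeTarget(source_system="erp", entity="orders")
print(orders.table_name)       # bronze.erp.raw_orders
print(orders.checkpoint_path)  # /checkpoints/bronze/erp/orders
```

Validating identifiers in `__post_init__` surfaces a bad source or entity name when the config is built, before any stream has written to a misnamed table or checkpoint.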