
Streaming Pipeline Kit

$49

Kafka consumer/producer templates, Spark Structured Streaming jobs, exactly-once processing patterns, and dead letter queues.

📁 15 files · 🏷 v1.0.0
Production-ready
✓ Instant download · ✓ Lifetime updates · ✓ MIT licensed · ✓ Secure checkout (Stripe)

👥 Who This Is For

✓ Best for

Teams on Databricks that need production-ready Spark Structured Streaming templates with Kafka integration, exactly-once processing, Delta Lake sinks, and real-time monitoring.

✗ Not for

Absolute beginners unfamiliar with Kafka or Spark Structured Streaming, or teams looking for a fully hosted SaaS product. This is a template kit and guide, not a managed service.

🎨 Architecture Overview diagram

[Diagram: pipeline data flow. Labels: Notebooks, Kafka Consumer, Streaming Pipeline…, Event…, Delta Lake…]

Representative architecture — see full README for details.

📋 What's Inside 15 files

  • README.md
  • manifest.json
  • LICENSE
  • src/kafka_consumer.py
  • src/event_processor.py
  • src/stream_to_delta.py
  • src/stream_monitor.py
  • src/schema_registry.py
  • configs/streaming_config.yaml
  • configs/schemas/user_events.avsc
  • configs/schemas/order_events.avsc
  • notebooks/start_stream.py
  • notebooks/monitor_streams.py
  • tests/test_event_processor.py
  • guides/streaming-patterns.md

📁 File Structure 15 files

streaming-pipeline-kit/
├── README.md
├── manifest.json
├── LICENSE
├── src/
│   ├── kafka_consumer.py
│   ├── event_processor.py
│   ├── stream_to_delta.py
│   ├── stream_monitor.py
│   └── schema_registry.py
├── configs/
│   ├── streaming_config.yaml
│   └── schemas/
│       ├── user_events.avsc
│       └── order_events.avsc
├── notebooks/
│   ├── start_stream.py
│   └── monitor_streams.py
├── tests/
│   └── test_event_processor.py
└── guides/
    └── streaming-patterns.md

📖 Documentation Preview README excerpt

Streaming Pipeline Kit

Real-time data pipelines that just work. Production-ready Spark Structured Streaming templates with Kafka integration, exactly-once semantics, and Delta Lake sinks.

By [Datanest Digital](https://datanest.dev) | Version 1.0.0 | $49

---

What You Get

  • **Kafka Consumer**: Structured Streaming reader with schema registry integration, watermarking, and configurable checkpointing
  • **Event Processor**: Deduplication by event ID, late-arrival handling, and windowed aggregations with customizable window sizes
  • **Delta Lake Writer**: `foreachBatch` sink with merge/append modes, schema evolution, and inline data quality checks
  • **Stream Monitor**: Query progress listener, consumer lag tracking, dead letter queue routing, and alerting hooks
  • **Schema Registry Client**: Confluent-compatible client for fetching, registering, and validating Avro schemas with compatibility checks
  • **Databricks Notebooks**: Ready-to-run notebooks for starting streams and monitoring active queries in real time
  • **Avro Schemas**: Example schemas for user events and order events
  • **Streaming Patterns Guide**: Best practices for watermarks, triggers, state management, and failure recovery
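The Event Processor item above combines deduplication by event ID with late-arrival handling. The kit's own implementation is not shown in this preview, so the following is only a plain-Python model of the general idea, not Spark code and not the contents of `src/event_processor.py`. The class name `WatermarkDeduplicator`, the `max_lateness` parameter, and the `(event_id, event_time)` shape are assumptions made for illustration. Spark advances its watermark per micro-batch, whereas this toy advances it per event.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WatermarkDeduplicator:
    """Toy model of watermark-bounded deduplication (hypothetical helper)."""

    max_lateness: int  # seconds an event may trail the newest event time seen
    _max_event_time: int = 0
    _seen: dict[str, int] = field(default_factory=dict)  # event_id -> event_time

    @property
    def watermark(self) -> int:
        """Event times below this threshold count as too late."""
        return self._max_event_time - self.max_lateness

    def accept(self, event_id: str, event_time: int) -> bool:
        """Return True if the event should be processed, False if dropped."""
        # Late arrival: older than the current watermark.
        if event_time < self.watermark:
            return False
        # Duplicate: this ID is still held in state.
        if event_id in self._seen:
            return False
        self._seen[event_id] = event_time
        self._max_event_time = max(self._max_event_time, event_time)
        # Bound state size: forget IDs whose event time fell behind the watermark.
        threshold = self.watermark
        self._seen = {k: t for k, t in self._seen.items() if t >= threshold}
        return True


dedup = WatermarkDeduplicator(max_lateness=10)
events = [("a", 100), ("a", 100), ("b", 95), ("c", 80), ("d", 120), ("a", 100)]
results = [dedup.accept(event_id, ts) for event_id, ts in events]
# results == [True, False, True, False, True, False]
```

The last event shows why eviction is safe in this model: once `"a"` has been dropped from state, a replay with the same timestamp is already behind the watermark and is rejected as late rather than reprocessed.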

File Tree

streaming-pipeline-kit/
├── README.md
├── manifest.json
├── LICENSE
├── src/
│   ├── kafka_consumer.py          # Kafka source with schema registry
│   ├── event_processor.py         # Dedup, late arrivals, windowed aggs
│   ├── stream_to_delta.py         # foreachBatch Delta Lake writer
│   ├── stream_monitor.py          # Progress listener & lag monitoring
│   └── schema_registry.py         # Schema Registry client
├── configs/
│   ├── streaming_config.yaml      # Kafka, checkpoint, trigger settings
│   └── schemas/
│       ├── user_events.avsc       # User event Avro schema
│       └── order_events.avsc      # Order event Avro schema
├── notebooks/
│   ├── start_stream.py            # Launch streaming pipeline
│   └── monitor_streams.py         # Real-time monitoring dashboard
├── tests/
│   └── test_event_processor.py    # Unit tests for event processing
└── guides/
    └── streaming-patterns.md      # Patterns & best practices


Getting Started

1. Configure your environment

Edit `configs/streaming_config.yaml` with your Kafka bootstrap servers, schema registry URL, and checkpoint locations:


... preview truncated; see the full README in the product download.
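The step above names three settings: Kafka bootstrap servers, a schema registry URL, and checkpoint locations. Because the config excerpt is truncated here, the sketch below does not reproduce the kit's actual `streaming_config.yaml` layout. It only illustrates a pre-flight check on those three settings once the file has been parsed into a dict. The dotted key names and the functions `lookup` and `missing_settings` are hypothetical.

```python
from __future__ import annotations

from typing import Any

# Hypothetical dotted key paths; the real config file may use different names.
REQUIRED_SETTINGS = (
    "kafka.bootstrap_servers",
    "schema_registry.url",
    "checkpoint.location",
)


def lookup(config: dict[str, Any], dotted_path: str) -> Any:
    """Walk a nested dict by dotted path; return None if any part is missing."""
    node: Any = config
    for part in dotted_path.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def missing_settings(config: dict[str, Any]) -> list[str]:
    """List required settings that are absent or empty."""
    return [path for path in REQUIRED_SETTINGS if not lookup(config, path)]


# Placeholder values only, standing in for a parsed YAML file.
example = {
    "kafka": {"bootstrap_servers": "broker-1:9092"},
    "schema_registry": {"url": ""},
}
problems = missing_settings(example)
# problems == ["schema_registry.url", "checkpoint.location"]
```

In practice the dict would typically come from `yaml.safe_load` on the config file, and a non-empty result could stop the job before any stream starts.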

📄 Code Sample .py preview

src/kafka_consumer.py

"""
Kafka Consumer: Spark Structured Streaming reader with Schema Registry integration.

Provides a configurable Kafka source that handles:
- JSON and Avro deserialization with schema registry lookup
- Watermarking for late-arriving events
- Configurable starting offsets (latest, earliest, specific)
- SSL/SASL authentication for secured clusters
- Automatic checkpoint management

Designed for Databricks Runtime; uses global `spark` session.

By Datanest Digital | https://datanest.dev
"""

from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Optional

import yaml
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import (
    StringType,
    StructField,
    StructType,
    TimestampType,
)

from src.schema_registry import SchemaRegistryClient


class DeserializationFormat(Enum):
    """Supported message deserialization formats."""

    JSON = "json"
    AVRO = "avro"
    STRING = "string"
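The preview stops at the `DeserializationFormat` enum. As a small illustration of how such an enum is commonly used, the sketch below maps a user-supplied string to a member and fails with a readable message on an unknown value. The enum is re-declared so the snippet runs on its own; `parse_format` is a hypothetical helper and is not taken from the kit.

```python
from enum import Enum


class DeserializationFormat(Enum):
    """Re-declared from the preview so this snippet is self-contained."""

    JSON = "json"
    AVRO = "avro"
    STRING = "string"


def parse_format(raw: str) -> DeserializationFormat:
    """Hypothetical helper: case-insensitive lookup by enum value."""
    try:
        return DeserializationFormat(raw.strip().lower())
    except ValueError:
        allowed = ", ".join(member.value for member in DeserializationFormat)
        raise ValueError(
            f"Unknown format {raw!r}; expected one of: {allowed}"
        ) from None


chosen = parse_format(" Avro ")  # resolves to DeserializationFormat.AVRO
```

Normalizing the input before the lookup keeps config values such as `Avro` or `JSON` from failing on case alone, while still rejecting values the enum does not define.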

📅 Changelog

v1.0.0: Initial release

Purchases include lifetime updates. Check the product page for the latest version.

📄 Product Preview

Full file listing available after purchase. Check the store page for details.

❓ Frequently Asked Questions

What license is this under?

How do I download after purchase?

Do I get updates?

What if it doesn't work for me?

Can I get a refund?

Is there support?
