
ETL Pipeline

$29

Configurable Extract-Transform-Load pipeline framework for JSON, CSV, SQLite, and HTTP API data sources.

📁 10 files
JSON · Markdown · Python


📁 File Structure 10 files

etl-pipeline/
├── LICENSE
├── README.md
├── examples/
│   └── pipeline_config.json
├── free-sample.zip
├── guide/
│   ├── 01_features.md
│   ├── 02_quick-start.md
│   ├── 03_output.md
│   └── 04_license.md
├── index.html
└── src/
    └── etl_pipeline.py

📖 Documentation Preview README excerpt

ETL Pipeline

A configurable Extract-Transform-Load pipeline framework built on Python stdlib. Pull data from JSON files, CSV files, SQLite databases, or HTTP APIs. Transform with a composable plugin architecture. Load to JSON, CSV, SQLite, or stdout.

Features

  • Multi-source extraction — JSON files, CSV files, SQLite databases, HTTP APIs
  • Plugin-based transforms — Chain filters, mappers, renamers, and custom transforms
  • Multi-target loading — Write to JSON, CSV, SQLite, or stdout
  • Config file support — Define complete pipelines in JSON for repeatable ETL jobs
  • Batch processing — Configurable batch sizes for memory-efficient processing of large datasets
  • Pipeline stats — Track records extracted, transformed, loaded, and errors
  • Nested JSON support — Navigate into nested structures with dot-path notation
  • Error resilience — Continue processing on individual record errors
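The batch processing, pipeline stats, and error resilience features fit together in a fairly standard way. The sketch below is not taken from `etl_pipeline.py`; the `Stats` class and the `batched` and `run` helpers are hypothetical names, and the behavior (skip a failing record, count it, keep going) is an assumption about what "error resilience" means here.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Any, Callable, Iterable, Iterator

Record = dict[str, Any]


@dataclass
class Stats:
    """Hypothetical counters mirroring the stats the feature list names."""
    extracted: int = 0
    transformed: int = 0
    loaded: int = 0
    errors: int = 0


def batched(records: Iterable[Record], size: int) -> Iterator[list[Record]]:
    """Yield lists of at most `size` records without materializing the input."""
    iterator = iter(records)
    while batch := list(islice(iterator, size)):
        yield batch


def run(
    records: Iterable[Record],
    transform: Callable[[Record], Record],
    load: Callable[[list[Record]], None],
    batch_size: int = 1000,
) -> Stats:
    """Process records batch by batch, counting failures instead of aborting."""
    stats = Stats()
    for batch in batched(records, batch_size):
        stats.extracted += len(batch)
        transformed = []
        for record in batch:
            try:
                transformed.append(transform(record))
            except Exception:
                stats.errors += 1  # assumed policy: skip the bad record
        stats.transformed += len(transformed)
        load(transformed)
        stats.loaded += len(transformed)
    return stats


rows = [{"price": "1.5"}, {"price": "oops"}, {"price": "2"}]
sink: list[Record] = []
stats = run(rows, lambda r: {**r, "price": float(r["price"])}, sink.extend, batch_size=2)
print(stats)  # Stats(extracted=3, transformed=2, loaded=2, errors=1)
```

Only one batch is held in memory at a time, which is the usual reason a batch size is made configurable.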

Requirements

  • Python 3.10+
  • No external dependencies (stdlib only)

Quick Start


# CSV to JSON conversion
python src/etl_pipeline.py --source data.csv --dest output.json

# JSON API to SQLite
python src/etl_pipeline.py --source https://api.example.com/v1/users --dest users.db --table users

# Full pipeline from config
python src/etl_pipeline.py --config examples/pipeline_config.json

# Nested JSON extraction
python src/etl_pipeline.py --source response.json --dest items.csv --records-path "data.items"
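To illustrate what a dot-path such as `data.items` selects, here is a minimal sketch of dot-path resolution over nested dictionaries. The `resolve_path` function is a hypothetical helper written for this example, not the product's implementation, and it assumes every path segment is a dictionary key.

```python
from typing import Any


def resolve_path(document: Any, path: str) -> Any:
    """Follow a dot-separated path such as "data.items" into nested dicts."""
    current = document
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            raise KeyError(f"path segment {key!r} not found in {path!r}")
        current = current[key]
    return current


response = {"data": {"items": [{"id": 1}, {"id": 2}]}, "meta": {"page": 1}}
records = resolve_path(response, "data.items")
print(records)  # [{'id': 1}, {'id': 2}]
```

With a response shaped like the one above, the list under `data.items` would become the record set handed to the rest of the pipeline.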

Configuration Reference

Create a JSON config for complex ETL jobs:


{
    "source": "https://api.example.com/v1/products",
    "destination": "products.db",
    "table": "products",
    "records_path": "data.results",
    "transforms": [
        {"type": "filter", "field": "status", "value": "active"},
        {"type": "rename", "mapping": {"product_name": "name", "product_id": "id"}},
        {"type": "map", "field": "price", "operation": "float"},
        {"type": "add_field", "field": "imported_at", "value": "$NOW"}
    ],
    "stats_file": "pipeline_stats.json"
}
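One plausible reading of the `transforms` list is a chain of small functions applied to each record in order. The sketch below rests on several assumptions that the excerpt does not confirm: `filter` keeps records whose field equals `value`, `rename` maps old names to new names, `map` casts a field with the named operation, and `$NOW` is replaced by the current UTC timestamp. `build_transform` and `apply_transforms` are hypothetical names.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Iterable, Iterator, Optional

Record = dict[str, Any]
Step = Callable[[Record], Optional[Record]]


def build_transform(spec: dict[str, Any]) -> Step:
    """Turn one config entry into a function; returning None drops the record."""
    kind = spec["type"]
    if kind == "filter":
        return lambda rec: rec if rec.get(spec["field"]) == spec["value"] else None
    if kind == "rename":
        mapping = spec["mapping"]  # assumed direction: old name -> new name
        return lambda rec: {mapping.get(k, k): v for k, v in rec.items()}
    if kind == "map":
        cast = {"float": float, "int": int, "str": str}[spec["operation"]]
        return lambda rec: {**rec, spec["field"]: cast(rec[spec["field"]])}
    if kind == "add_field":
        def add(rec: Record) -> Record:
            value = spec["value"]
            if value == "$NOW":  # assumed placeholder for the current time
                value = datetime.now(timezone.utc).isoformat()
            return {**rec, spec["field"]: value}
        return add
    raise ValueError(f"unknown transform type: {kind!r}")


def apply_transforms(records: Iterable[Record], specs: list[dict]) -> Iterator[Record]:
    """Run every record through the chain, skipping records a step drops."""
    steps = [build_transform(spec) for spec in specs]
    for rec in records:
        for step in steps:
            rec = step(rec)
            if rec is None:
                break
        else:
            yield rec


specs = [
    {"type": "filter", "field": "status", "value": "active"},
    {"type": "rename", "mapping": {"product_name": "name", "product_id": "id"}},
    {"type": "map", "field": "price", "operation": "float"},
    {"type": "add_field", "field": "imported_at", "value": "$NOW"},
]
rows = [
    {"product_id": 7, "product_name": "Lamp", "price": "19.99", "status": "active"},
    {"product_id": 8, "product_name": "Desk", "price": "120", "status": "retired"},
]
result = list(apply_transforms(rows, specs))
print(result)
```

Order matters in a chain like this: the filter runs before the rename, so it still sees the original field names.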

CLI Options

Flag | Default | Description

... continues with setup instructions, usage examples, and more.

📄 Code Sample .py preview

src/etl_pipeline.py

#!/usr/bin/env python3
"""
ETL Pipeline — Automation Hub (DataNest)

A configurable Extract-Transform-Load pipeline framework built on Python stdlib.
Supports extraction from JSON files, CSV files, SQLite databases, and HTTP APIs.
Transform stage uses a composable plugin architecture.
Load stage writes to JSON, CSV, SQLite, or stdout.

Usage:
    python etl_pipeline.py --source data.csv --dest output.json
    python etl_pipeline.py --config pipeline_config.json
    python etl_pipeline.py --source https://api.example.com/v1/data --dest results.db

Dependencies: Python 3.10+ stdlib only (no pip packages)
License: MIT
"""

from __future__ import annotations

import argparse
import csv
import io
import json
import logging
import os
import re
import sqlite3
import time
import urllib.error
import urllib.request
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Iterator

# ---------------------------------------------------------------------------
# Constants
# ---------------------------------------------------------------------------

LOG_FORMAT = "%(asctime)s [%(levelname)s] %(name)s: %(message)s"
HTTP_TIMEOUT = 30
MAX_HTTP_RETRIES = 3
BATCH_SIZE = 1000

logger = logging.getLogger("etl_pipeline")

# ---------------------------------------------------------------------------
# ... 829 more lines ...
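The preview stops before showing how `HTTP_TIMEOUT` and `MAX_HTTP_RETRIES` are used. As a rough sketch only, constants like these typically drive a retry loop around `urllib.request.urlopen`. The `fetch_bytes` function, its `opener` and `backoff` parameters, and the exponential delay are all assumptions made for illustration; the fake opener lets the example run without network access.

```python
import io
import time
import urllib.error
import urllib.request

HTTP_TIMEOUT = 30
MAX_HTTP_RETRIES = 3


def fetch_bytes(
    url: str,
    opener=urllib.request.urlopen,  # injectable so the sketch can run offline
    retries: int = MAX_HTTP_RETRIES,
    timeout: float = HTTP_TIMEOUT,
    backoff: float = 1.0,
) -> bytes:
    """Fetch a URL, retrying on URLError with an assumed exponential backoff."""
    last_error = None
    for attempt in range(retries):
        try:
            with opener(url, timeout=timeout) as response:
                return response.read()
        except urllib.error.URLError as exc:
            last_error = exc
            if attempt < retries - 1:
                time.sleep(backoff * 2 ** attempt)
    raise RuntimeError(f"giving up on {url} after {retries} attempts") from last_error


calls: list[str] = []


def flaky_opener(url: str, timeout: float):
    """Stand-in for urlopen: fails twice, then returns a readable body."""
    calls.append(url)
    if len(calls) < 3:
        raise urllib.error.URLError("temporarily unreachable")
    return io.BytesIO(b'{"ok": true}')


body = fetch_bytes("https://api.example.com/v1/data", opener=flaky_opener, backoff=0)
print(body, len(calls))  # b'{"ok": true}' 3
```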