$29
ETL Pipeline
Configurable Extract-Transform-Load pipeline framework for JSON, CSV, and SQLite data sources.
JSON · Markdown · Python
📁 File Structure (10 files)
etl-pipeline/
├── LICENSE
├── README.md
├── examples/
│   └── pipeline_config.json
├── free-sample.zip
├── guide/
│   ├── 01_features.md
│   ├── 02_quick-start.md
│   ├── 03_output.md
│   └── 04_license.md
├── index.html
└── src/
    └── etl_pipeline.py
📖 Documentation Preview (README excerpt)
ETL Pipeline
A configurable Extract-Transform-Load pipeline framework built on Python stdlib. Pull data from JSON files, CSV files, SQLite databases, or HTTP APIs. Transform with a composable plugin architecture. Load to JSON, CSV, SQLite, or stdout.
Features
- Multi-source extraction — JSON files, CSV files, SQLite databases, HTTP APIs
- Plugin-based transforms — Chain filters, mappers, renamers, and custom transforms
- Multi-target loading — Write to JSON, CSV, SQLite, or stdout
- Config file support — Define complete pipelines in JSON for repeatable ETL jobs
- Batch processing — Configurable batch sizes for memory-efficient large datasets
- Pipeline stats — Track records extracted, transformed, loaded, and errors
- Nested JSON support — Navigate into nested structures with dot-path notation
- Error resilience — Continue processing on individual record errors
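The batch processing, error resilience, and pipeline stats features listed above could fit together roughly as follows. This is a minimal sketch under stated assumptions, not the product's actual implementation: `Stats`, `batched`, and `run_batches` are hypothetical names, and the default batch size of 1000 simply mirrors the `BATCH_SIZE` constant shown in the code preview.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Any, Callable, Iterable, Iterator

Record = dict[str, Any]


@dataclass
class Stats:
    """Hypothetical counters matching the four numbers the README mentions."""
    extracted: int = 0
    transformed: int = 0
    loaded: int = 0
    errors: int = 0


def batched(records: Iterable[Record], size: int) -> Iterator[list[Record]]:
    """Yield lists of at most `size` records so only one batch is in memory."""
    iterator = iter(records)
    while batch := list(islice(iterator, size)):
        yield batch


def run_batches(
    records: Iterable[Record],
    transform: Callable[[Record], Record],
    load: Callable[[list[Record]], None],
    batch_size: int = 1000,
) -> Stats:
    """Transform and load batch by batch, counting failures instead of aborting."""
    stats = Stats()
    for batch in batched(records, batch_size):
        stats.extracted += len(batch)
        transformed: list[Record] = []
        for record in batch:
            try:
                transformed.append(transform(record))
            except (KeyError, TypeError, ValueError):
                # A bad record is counted and skipped; the run continues.
                stats.errors += 1
        stats.transformed += len(transformed)
        load(transformed)
        stats.loaded += len(transformed)
    return stats


# Usage: the second row cannot be cast to float, so it is counted as an error.
rows = [{"price": "1.5"}, {"price": "oops"}, {"price": "3"}]
sink: list[Record] = []
stats = run_batches(
    rows,
    transform=lambda r: {**r, "price": float(r["price"])},
    load=sink.extend,
    batch_size=2,
)
print(stats)
```

Catching only `KeyError`, `TypeError`, and `ValueError` is a deliberate choice in this sketch: it keeps per-record data problems from stopping the run while still letting unexpected failures surface.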
Requirements
- Python 3.10+
- No external dependencies (stdlib only)
Quick Start
# CSV to JSON conversion
python src/etl_pipeline.py --source data.csv --dest output.json
# JSON API to SQLite
python src/etl_pipeline.py --source https://api.example.com/v1/users --dest users.db --table users
# Full pipeline from config
python src/etl_pipeline.py --config examples/pipeline_config.json
# Nested JSON extraction
python src/etl_pipeline.py --source response.json --dest items.csv --records-path "data.items"
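To illustrate what a dot-path such as `data.items` might do, here is a minimal sketch of walking a nested JSON document one key at a time. The function name `extract_records` and its error handling are assumptions made for illustration; the shipped script may resolve paths differently (for example, it may also accept list indices).

```python
from typing import Any


def extract_records(document: Any, records_path: str = "") -> list[Any]:
    """Follow a dot-separated path of dict keys and return the list found there."""
    current = document
    if records_path:
        for key in records_path.split("."):
            if not isinstance(current, dict) or key not in current:
                raise KeyError(f"segment {key!r} of path {records_path!r} not found")
            current = current[key]
    if not isinstance(current, list):
        raise TypeError(f"path {records_path!r} does not point at a list of records")
    return current


# Usage: a response shaped like the one the --records-path example implies.
response = {"data": {"items": [{"id": 1}, {"id": 2}]}, "meta": {"count": 2}}
records = extract_records(response, "data.items")
print(records)
```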
Configuration Reference
Create a JSON config for complex ETL jobs:
{
  "source": "https://api.example.com/v1/products",
  "destination": "products.db",
  "table": "products",
  "records_path": "data.results",
  "transforms": [
    {"type": "filter", "field": "status", "value": "active"},
    {"type": "rename", "mapping": {"product_name": "name", "product_id": "id"}},
    {"type": "map", "field": "price", "operation": "float"},
    {"type": "add_field", "field": "imported_at", "value": "$NOW"}
  ],
  "stats_file": "pipeline_stats.json"
}
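The `transforms` array in the config above reads as an ordered chain applied to each record. The sketch below shows one plausible interpretation, and every detail beyond the config keys is an assumption: `apply_transforms` and `CASTS` are hypothetical names, `filter` is taken to mean "keep records whose field equals the value", `rename` is taken to map old names to new ones, and `$NOW` is taken to expand to a UTC ISO 8601 timestamp.

```python
from datetime import datetime, timezone
from typing import Any, Iterable, Iterator

Record = dict[str, Any]

# Assumed set of cast operations; the config above only shows "float".
CASTS = {"float": float, "int": int, "str": str}


def apply_transforms(
    records: Iterable[Record], transforms: list[dict[str, Any]]
) -> Iterator[Record]:
    """Run each record through the configured steps in order."""
    for record in records:
        row = dict(record)  # copy so the caller's data is not mutated
        for step in transforms:
            kind = step["type"]
            if kind == "filter":
                if row.get(step["field"]) != step["value"]:
                    break  # record rejected; skip the remaining steps
            elif kind == "rename":
                mapping = step["mapping"]
                row = {mapping.get(key, key): value for key, value in row.items()}
            elif kind == "map":
                row[step["field"]] = CASTS[step["operation"]](row[step["field"]])
            elif kind == "add_field":
                value = step["value"]
                if value == "$NOW":
                    value = datetime.now(timezone.utc).isoformat()
                row[step["field"]] = value
            else:
                raise ValueError(f"unknown transform type: {kind!r}")
        else:
            # Reached only when no filter broke out of the loop.
            yield row


# Usage with the transform list from the config above.
transforms = [
    {"type": "filter", "field": "status", "value": "active"},
    {"type": "rename", "mapping": {"product_name": "name", "product_id": "id"}},
    {"type": "map", "field": "price", "operation": "float"},
    {"type": "add_field", "field": "imported_at", "value": "$NOW"},
]
products = [
    {"product_id": 7, "product_name": "Lamp", "price": "19.90", "status": "active"},
    {"product_id": 8, "product_name": "Desk", "price": "120", "status": "retired"},
]
result = list(apply_transforms(products, transforms))
print(result)
```

Because the steps run in order, placing the `filter` first means rejected records never pay for the later rename and cast steps.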
CLI Options
| Flag | Default | Description |
|---|---|---|
... continues with setup instructions, usage examples, and more.
📄 Code Sample (.py preview)
src/etl_pipeline.py
#!/usr/bin/env python3
"""
ETL Pipeline — Automation Hub (DataNest)

A configurable Extract-Transform-Load pipeline framework built on Python stdlib.
Supports extraction from JSON files, CSV files, SQLite databases, and HTTP APIs.
Transform stage uses a composable plugin architecture. Load stage writes to
JSON, CSV, SQLite, or stdout.

Usage:
    python etl_pipeline.py --source data.csv --dest output.json
    python etl_pipeline.py --config pipeline_config.json
    python etl_pipeline.py --source https://api.example.com/v1/data --dest results.db

Dependencies: Python 3.10+ stdlib only (no pip packages)
License: MIT
"""
from __future__ import annotations
import argparse
import csv
import io
import json
import logging
import os
import re
import sqlite3
import time
import urllib.error
import urllib.request
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Iterator
# ---------------------------------------------------------------------------
# Constants
# ---------------------------------------------------------------------------
LOG_FORMAT = "%(asctime)s [%(levelname)s] %(name)s: %(message)s"
HTTP_TIMEOUT = 30
MAX_HTTP_RETRIES = 3
BATCH_SIZE = 1000
logger = logging.getLogger("etl_pipeline")
# ---------------------------------------------------------------------------
# ... 829 more lines ...
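The preview cuts off before showing how `HTTP_TIMEOUT` and `MAX_HTTP_RETRIES` are used. As a rough illustration only, constants like these are commonly paired with a retry loop around `urllib.request.urlopen`. The function `fetch_json`, its injectable `opener` and `sleep` parameters, and the exponential backoff are all assumptions for this sketch, not code taken from the product.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Any, Callable

HTTP_TIMEOUT = 30      # seconds, same value as in the preview
MAX_HTTP_RETRIES = 3   # total attempts, same value as in the preview


def fetch_json(
    url: str,
    opener: Callable[..., Any] = urllib.request.urlopen,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Fetch and decode a JSON document, retrying on network errors."""
    last_error: Exception | None = None
    for attempt in range(1, MAX_HTTP_RETRIES + 1):
        try:
            with opener(url, timeout=HTTP_TIMEOUT) as response:
                return json.loads(response.read().decode("utf-8"))
        except (urllib.error.URLError, TimeoutError) as exc:
            last_error = exc
            if attempt < MAX_HTTP_RETRIES:
                # Assumed backoff schedule: 1s, then 2s, doubling each time.
                sleep(2 ** (attempt - 1))
    raise RuntimeError(
        f"giving up on {url} after {MAX_HTTP_RETRIES} attempts"
    ) from last_error
```

Passing `opener` and `sleep` as parameters keeps the sketch testable without network access or real delays. Note that `urllib.error.HTTPError` is a subclass of `URLError`, so this version also retries on HTTP error statuses; a production version might retry only on 5xx responses.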