Chapter 1

Features

This chapter covers the core features and capabilities of ETL Pipeline.

  • Multi-source extraction — JSON files, CSV files, SQLite databases, HTTP APIs
  • Plugin-based transforms — Chain filters, mappers, renamers, and custom transforms
  • Multi-target loading — Write to JSON, CSV, SQLite, or stdout
  • Config file support — Define complete pipelines in JSON for repeatable ETL jobs
  • Batch processing — Configurable batch sizes for memory-efficient large datasets
  • Pipeline stats — Track records extracted, transformed, loaded, and errors
  • Nested JSON support — Navigate into nested structures with dot-path notation
  • Error resilience — Continue processing on individual record errors
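
The batch processing, pipeline stats, and error resilience features can be pictured together with a small sketch. This is not the tool's actual implementation: the helper names `batched` and `run`, the stats keys, and the choice of which exceptions count as record errors are all assumptions made for illustration.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator


def batched(records: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield lists of at most `size` records without loading everything at once."""
    it = iter(records)
    while batch := list(islice(it, size)):
        yield batch


def run(records: Iterable[dict], transform: Callable[[dict], dict],
        batch_size: int = 2) -> tuple[list[dict], dict]:
    """Transform records batch by batch, counting failures instead of aborting."""
    # Hypothetical stats layout; the real tool's keys may differ.
    stats = {"extracted": 0, "transformed": 0, "loaded": 0, "errors": 0}
    loaded: list[dict] = []
    for batch in batched(records, batch_size):
        stats["extracted"] += len(batch)
        out = []
        for record in batch:
            try:
                out.append(transform(record))
                stats["transformed"] += 1
            except (KeyError, TypeError, ValueError):
                stats["errors"] += 1  # skip the bad record and keep going
        loaded.extend(out)  # stand-in for a real load step
        stats["loaded"] += len(out)
    return loaded, stats


rows = [{"price": "1.5"}, {"price": "oops"}, {"price": "3"}]
result, stats = run(rows, lambda r: {**r, "price": float(r["price"])})
print(stats)
```

Only one batch is held in memory at a time, and a record that fails to convert is counted under `errors` rather than stopping the run.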

Requirements

  • Python 3.10+
  • No external dependencies (stdlib only)

Chapter 2

Quick Start

Follow this guide to get ETL Pipeline up and running in your environment.

bash
# CSV to JSON conversion
python src/etl_pipeline.py --source data.csv --dest output.json

# JSON API to SQLite
python src/etl_pipeline.py --source https://api.example.com/v1/users --dest users.db --table users

# Full pipeline from config
python src/etl_pipeline.py --config examples/pipeline_config.json

# Nested JSON extraction
python src/etl_pipeline.py --source response.json --dest items.csv --records-path "data.items"
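
To make the `--records-path "data.items"` option concrete, here is a minimal sketch of how a dot-path might be resolved against a parsed JSON document. The function name `resolve_path` and the sample `response` payload are hypothetical, and the tool's own lookup may behave differently (for example with list indices or missing keys).

```python
from typing import Any


def resolve_path(document: Any, path: str) -> Any:
    """Follow a dot-separated path such as "data.items" through nested dicts."""
    current = document
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            raise KeyError(f"path segment {key!r} not found in {path!r}")
        current = current[key]
    return current


# Hypothetical API response with the records nested two levels deep.
response = {"data": {"items": [{"id": 1}, {"id": 2}]}, "meta": {"count": 2}}
records = resolve_path(response, "data.items")
print(records)  # [{'id': 1}, {'id': 2}]
```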

Configuration Reference

Create a JSON config for complex ETL jobs:

json
{
    "source": "https://api.example.com/v1/products",
    "destination": "products.db",
    "table": "products",
    "records_path": "data.results",
    "transforms": [
        {"type": "filter", "field": "status", "value": "active"},
        {"type": "rename", "mapping": {"product_name": "name", "product_id": "id"}},
        {"type": "map", "field": "price", "operation": "float"},
        {"type": "add_field", "field": "imported_at", "value": "$NOW"}
    ],
    "stats_file": "pipeline_stats.json"
}
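
Before running a config like this, it can help to check it for obvious mistakes. The sketch below assumes that `source` and `destination` are required keys and that the valid transform types are exactly those listed under Transform Types; the function `load_config` is hypothetical and not part of the tool's documented interface.

```python
import json

# Transform type names taken from the Transform Types table.
KNOWN_TRANSFORMS = {"filter", "rename", "map", "add_field", "remove_fields"}


def load_config(text: str) -> dict:
    """Parse a pipeline config and reject obviously malformed input early."""
    config = json.loads(text)
    for key in ("source", "destination"):  # assumed to be required
        if key not in config:
            raise ValueError(f"config is missing required key {key!r}")
    for step in config.get("transforms", []):
        if step.get("type") not in KNOWN_TRANSFORMS:
            raise ValueError(f"unknown transform type: {step.get('type')!r}")
    return config


config = load_config("""
{
    "source": "data.csv",
    "destination": "output.json",
    "transforms": [{"type": "filter", "field": "status", "value": "active"}]
}
""")
print([step["type"] for step in config["transforms"]])  # ['filter']
```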

CLI Options

Flag            Default  Description
--source, -s    (none)   Source file path or URL
--dest, -d      (none)   Destination file path
--config, -c    (none)   Pipeline config file (JSON)
--table, -t     (none)   SQLite table name
--records-path  (none)   Dot-path to records array in JSON (e.g. data.items)
--stats-file    (none)   Write pipeline stats to this JSON file
--log-level     INFO     Logging level (DEBUG, INFO, WARNING, ERROR)
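
For readers who want to see how these flags fit together, the following sketch declares the same options with the standard library's `argparse`. It mirrors the table only; whether the tool itself uses `argparse`, and how it validates combinations of flags, is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented flags; help strings are copied from the table."""
    parser = argparse.ArgumentParser(prog="etl_pipeline.py")
    parser.add_argument("--source", "-s", help="Source file path or URL")
    parser.add_argument("--dest", "-d", help="Destination file path")
    parser.add_argument("--config", "-c", help="Pipeline config file (JSON)")
    parser.add_argument("--table", "-t", help="SQLite table name")
    parser.add_argument("--records-path", help="Dot-path to records array in JSON")
    parser.add_argument("--stats-file", help="Write pipeline stats to this JSON file")
    parser.add_argument("--log-level", default="INFO",
                        choices=["DEBUG", "INFO", "WARNING", "ERROR"],
                        help="Logging level")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(
    ["-s", "response.json", "-d", "items.csv", "--records-path", "data.items"]
)
print(args.source, args.dest, args.records_path, args.log_level)
```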

Transform Types

Type           Description                        Config Fields
filter         Keep records matching a condition  field, value, operator
rename         Rename fields                      mapping (dict)
map            Transform a field value            field, operation
add_field      Add a new field                    field, value ($NOW for timestamp)
remove_fields  Drop fields                        fields (list)
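
The transform types above can be read as a chain applied to each record in order. The sketch below is one plausible interpretation, not the tool's actual code: it assumes `filter` tests equality (the `operator` field is not modelled), that `rename` maps old names to new names, that `map` operations are simple type casts, and that `$NOW` expands to an ISO 8601 UTC timestamp. The names `apply_transform` and `run_chain` are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def apply_transform(record: dict, step: dict) -> Optional[dict]:
    """Apply one transform step; return None when a filter drops the record."""
    kind = step["type"]
    if kind == "filter":
        # Only equality is sketched here; `operator` is not modelled.
        return record if record.get(step["field"]) == step["value"] else None
    if kind == "rename":
        # Assumed direction: keys are old names, values are new names.
        return {step["mapping"].get(k, k): v for k, v in record.items()}
    if kind == "map":
        casts = {"float": float, "int": int, "str": str}  # assumed operation names
        field = step["field"]
        return {**record, field: casts[step["operation"]](record[field])}
    if kind == "add_field":
        value = step["value"]
        if value == "$NOW":
            value = datetime.now(timezone.utc).isoformat()  # assumed format
        return {**record, step["field"]: value}
    if kind == "remove_fields":
        return {k: v for k, v in record.items() if k not in step["fields"]}
    raise ValueError(f"unknown transform type: {kind!r}")


def run_chain(record: dict, steps: list[dict]) -> Optional[dict]:
    """Run every step in order, stopping as soon as a filter rejects the record."""
    for step in steps:
        record = apply_transform(record, step)
        if record is None:
            return None
    return record


steps = [
    {"type": "filter", "field": "status", "value": "active"},
    {"type": "rename", "mapping": {"product_name": "name", "product_id": "id"}},
    {"type": "map", "field": "price", "operation": "float"},
    {"type": "remove_fields", "fields": ["status"]},
]
row = {"product_id": 7, "product_name": "Lamp", "price": "19.90", "status": "active"}
print(run_chain(row, steps))  # {'id': 7, 'name': 'Lamp', 'price': 19.9}
```

Each step returns a new dict rather than mutating its input, which keeps the original record available for error reporting if a later step fails.
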
ETL Pipeline v1.0.0 — Free Preview