Data Sync

$19

Record synchronization between JSON, CSV, or SQLite endpoints with content-hash diffing, incremental sync, conflict resolution, dry-run, and bidirectional mode.

📁 10 files
JSON, Markdown, Python

📁 File Structure (10 files)

data-sync/
├── LICENSE
├── README.md
├── examples/
│   └── data_sync_config.json
├── free-sample.zip
├── guide/
│   ├── 01_features.md
│   ├── 02_quick-start.md
│   ├── 03_output.md
│   └── 04_license.md
├── index.html
└── src/
    └── data_sync.py

📖 Documentation Preview (README excerpt)

Data Sync

A record-synchronization tool that syncs datasets between two endpoints (JSON file, CSV file, or SQLite table). Keys records by a configurable field, computes a content hash per record, and derives a diff (added, changed, deleted, unchanged). Supports incremental sync, conflict resolution, dry-run mode, and bidirectional sync.
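The key-then-hash-then-diff flow described above can be sketched in a few lines. This is a minimal illustration, not the code shipped in `src/data_sync.py`: the function names `record_hash`, `index_by_key`, and `diff_records` are hypothetical, and serializing each record as sorted-key JSON before hashing is an assumption about how a stable content hash might be produced.

```python
import hashlib
import json


def record_hash(record: dict) -> str:
    """Return a SHA-256 hex digest of a record, independent of key order."""
    # Assumption: canonical form is compact JSON with sorted keys.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def index_by_key(records: list[dict], key: str) -> dict:
    """Map each record's key-field value to the record itself."""
    return {rec[key]: rec for rec in records}


def diff_records(source: list[dict], dest: list[dict], key: str) -> dict[str, list]:
    """Classify keys as added, changed, deleted, or unchanged (source vs. dest)."""
    src, dst = index_by_key(source, key), index_by_key(dest, key)
    diff: dict[str, list] = {"added": [], "changed": [], "deleted": [], "unchanged": []}
    for k, rec in src.items():
        if k not in dst:
            diff["added"].append(k)        # present only in source
        elif record_hash(rec) != record_hash(dst[k]):
            diff["changed"].append(k)      # same key, different content
        else:
            diff["unchanged"].append(k)
    # Keys present only in dest count as deleted from the source's point of view.
    diff["deleted"] = [k for k in dst if k not in src]
    return diff


source = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}, {"id": 3, "name": "Linus"}]
dest = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace H."}, {"id": 4, "name": "Ken"}]
result = diff_records(source, dest, key="id")
print(result)
# {'added': [3], 'changed': [2], 'deleted': [4], 'unchanged': [1]}
```

Hashing a canonical serialization rather than comparing dictionaries directly means only the digest needs to be stored between runs, which is what makes a persisted state file practical.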

Features

  • Hash-based diff engine — SHA-256 content hashing detects added, changed, deleted, and unchanged records
  • Multi-format endpoints — Read and write JSON files, CSV files, and SQLite tables
  • Conflict resolution — Three strategies: source-wins, dest-wins, newest-wins (by timestamp field)
  • Incremental sync — Persist state between runs to detect true conflicts (both sides changed)
  • Dry-run mode — Preview the sync plan without writing any changes
  • Bidirectional sync — Propagate dest-only records back to source instead of deleting them
  • Detailed reporting — Diff report with per-record change breakdown and summary statistics
  • Config file support — Define sync jobs in JSON for repeatable, scriptable execution
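The three conflict strategies listed above could be dispatched roughly as follows. This is a sketch under stated assumptions, not the tool's actual implementation: `resolve_conflict` is a hypothetical name, timestamps are assumed to be ISO 8601 strings in one consistent format (all naive or all timezone-aware), and letting the source win a tie is an arbitrary choice made for the example.

```python
from datetime import datetime


def resolve_conflict(
    source_rec: dict,
    dest_rec: dict,
    strategy: str = "source-wins",
    timestamp_field: str = "updated_at",
) -> dict:
    """Pick the winning record for a key whose content differs on both sides."""
    if strategy == "source-wins":
        return source_rec
    if strategy == "dest-wins":
        return dest_rec
    if strategy == "newest-wins":
        # Assumption: both timestamps parse with datetime.fromisoformat.
        src_ts = datetime.fromisoformat(source_rec[timestamp_field])
        dst_ts = datetime.fromisoformat(dest_rec[timestamp_field])
        # Assumption: a tie goes to the source.
        return source_rec if src_ts >= dst_ts else dest_rec
    raise ValueError(f"unknown strategy: {strategy!r}")


src = {"id": 1, "name": "Ada L.", "updated_at": "2024-05-01T09:00:00"}
dst = {"id": 1, "name": "Ada Lovelace", "updated_at": "2024-05-02T09:00:00"}

print(resolve_conflict(src, dst, "source-wins")["name"])  # Ada L.
print(resolve_conflict(src, dst, "dest-wins")["name"])    # Ada Lovelace
print(resolve_conflict(src, dst, "newest-wins")["name"])  # Ada Lovelace
```

Raising on an unknown strategy, rather than falling back silently, keeps a typo in a config file from quietly overwriting data.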

Requirements

  • Python 3.10+
  • No external dependencies (stdlib only)

Quick Start


# Sync two JSON files by a key field
python src/data_sync.py --source customers_a.json --dest customers_b.json --key customer_id

# Sync CSV to SQLite with conflict strategy
python src/data_sync.py --source orders.csv --dest warehouse.db --key order_id --strategy source-wins

# Preview changes without writing (dry-run)
python src/data_sync.py --source new_data.json --dest master.json --key id --dry-run

# Incremental sync with state tracking
python src/data_sync.py --source export.json --dest mirror.json --key id --state sync_state.json

# Full pipeline from a config file
python src/data_sync.py --config examples/data_sync_config.json
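To illustrate why the `--state` file matters, the sketch below shows one way a persisted baseline can separate a one-sided change from a true conflict. The state layout (a JSON object mapping each key to the hash recorded at the last sync) and the names `load_state`, `save_state`, and `classify_change` are assumptions made for this example; the real format of `sync_state.json` is not shown in this excerpt.

```python
import hashlib
import json
from pathlib import Path


def record_hash(record: dict) -> str:
    """SHA-256 over a sorted-key JSON serialization of the record."""
    canonical = json.dumps(record, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def load_state(path: Path) -> dict[str, str]:
    """Read the key -> hash baseline, or return an empty one on the first run."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {}


def save_state(path: Path, state: dict[str, str]) -> None:
    """Persist the baseline so the next run can compare against it."""
    path.write_text(json.dumps(state, indent=2, sort_keys=True), encoding="utf-8")


def classify_change(key, source_rec: dict, dest_rec: dict, state: dict[str, str]) -> str:
    """Compare both sides against the last-synced hash for this key."""
    src_h, dst_h = record_hash(source_rec), record_hash(dest_rec)
    if src_h == dst_h:
        return "in-sync"
    last = state.get(str(key))
    if last is None:
        return "conflict"  # no baseline, so we cannot tell which side changed
    src_changed, dst_changed = src_h != last, dst_h != last
    if src_changed and dst_changed:
        return "conflict"  # both sides moved away from the baseline
    return "source-changed" if src_changed else "dest-changed"


baseline = {"id": 7, "email": "old@example.com"}
state = {"7": record_hash(baseline)}  # hypothetical contents of sync_state.json

edited_src = {"id": 7, "email": "src@example.com"}
edited_dst = {"id": 7, "email": "dst@example.com"}

print(classify_change(7, edited_src, baseline, state))    # source-changed
print(classify_change(7, baseline, edited_dst, state))    # dest-changed
print(classify_change(7, edited_src, edited_dst, state))  # conflict
```

Without the baseline, every difference between the two sides looks the same; with it, only the last case needs a conflict strategy.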

Configuration Reference

Define a sync job in JSON:


{
    "source": "customers_export.json",
    "dest": "customers_master.json",
    "key": "customer_id",
    "strategy": "newest-wins",
    "timestamp_field": "updated_at",
    "state": "sync_state.json",
    "dry_run": false,
    "bidirectional": false,
    "source_table": "customers",
    "dest_table": "customers"
}

CLI Flags

... continues with setup instructions, usage examples, and more.

📄 Code Sample .py preview

src/data_sync.py

#!/usr/bin/env python3
"""
Data Sync — Automation Hub (DataNest)

A record-synchronization tool that syncs datasets between two endpoints
(JSON file, CSV file, or SQLite table). Keys records by a configurable field,
computes a content hash per record, and derives a diff (added, changed,
deleted, unchanged). Supports incremental sync via a persisted state file,
conflict-resolution strategies, dry-run mode, and optional bidirectional sync.

Usage:
    python data_sync.py --source customers_a.json --dest customers_b.json --key id
    python data_sync.py --source orders.csv --dest orders.db --key order_id --strategy source-wins
    python data_sync.py --config examples/data_sync_config.json
    python data_sync.py --source a.json --dest b.json --key id --dry-run

Dependencies: Python 3.10+ stdlib only (no pip packages)
License: MIT
"""

from __future__ import annotations

import argparse
import csv
import hashlib
import json
import logging
import sqlite3
import time
from dataclasses import dataclass, field
from datetime import datetime, timezone
from pathlib import Path
from typing import Any

# ---------------------------------------------------------------------------
# Constants
# ---------------------------------------------------------------------------

LOG_FORMAT = "%(asctime)s [%(levelname)s] %(name)s: %(message)s"
HASH_ALGORITHM = "sha256"

logger = logging.getLogger("data_sync")

# ---------------------------------------------------------------------------
# Data models
# ---------------------------------------------------------------------------

@dataclass
# ... 789 more lines ...