$19
Data Sync
Record synchronization between JSON, CSV, or SQLite endpoints with content-hash diffing, incremental sync, conflict resolution, dry-run, and bidirectional mode.
JSON, Markdown, Python
📁 File Structure (10 files)
data-sync/
├── LICENSE
├── README.md
├── examples/
│   └── data_sync_config.json
├── free-sample.zip
├── guide/
│   ├── 01_features.md
│   ├── 02_quick-start.md
│   ├── 03_output.md
│   └── 04_license.md
├── index.html
└── src/
    └── data_sync.py
📖 Documentation Preview README excerpt
Data Sync
A record-synchronization tool that syncs datasets between two endpoints (JSON file, CSV file, or SQLite table). Keys records by a configurable field, computes a content hash per record, and derives a diff (added, changed, deleted, unchanged). Supports incremental sync, conflict resolution, dry-run mode, and bidirectional sync.
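The keying, hashing, and diff steps described above can be sketched roughly as follows. This is a minimal illustration, not the code shipped in `src/data_sync.py`: the function names (`record_hash`, `index_by_key`, `diff_records`) and the choice of canonical JSON as the hashing input are assumptions made for the example.

```python
import hashlib
import json
from typing import Any

Record = dict[str, Any]


def record_hash(record: Record) -> str:
    """SHA-256 over a canonical JSON encoding, so key order does not matter."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def index_by_key(records: list[Record], key: str) -> dict[str, Record]:
    """Map each record's key field (as a string) to the record itself."""
    return {str(rec[key]): rec for rec in records}


def diff_records(source: list[Record], dest: list[Record], key: str) -> dict[str, list[str]]:
    """Classify keys as added, changed, deleted, or unchanged (source vs. dest)."""
    src = index_by_key(source, key)
    dst = index_by_key(dest, key)
    result: dict[str, list[str]] = {"added": [], "changed": [], "deleted": [], "unchanged": []}
    for k, rec in src.items():
        if k not in dst:
            result["added"].append(k)        # present only in source
        elif record_hash(rec) != record_hash(dst[k]):
            result["changed"].append(k)      # same key, different content
        else:
            result["unchanged"].append(k)
    # Keys that exist only in dest would be removed by a one-way sync.
    result["deleted"] = [k for k in dst if k not in src]
    return result
```

Calling `diff_records(source_rows, dest_rows, "id")` on two lists of dicts returns the four key lists; comparing hashes rather than whole records also makes it possible to persist the comparison state cheaply between runs.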
Features
- Hash-based diff engine — SHA-256 content hashing detects added, changed, deleted, and unchanged records
- Multi-format endpoints — Read and write JSON files, CSV files, and SQLite tables
- Conflict resolution — Three strategies: source-wins, dest-wins, newest-wins (by timestamp field)
- Incremental sync — Persist state between runs to detect true conflicts (both sides changed)
- Dry-run mode — Preview the sync plan without writing any changes
- Bidirectional sync — Propagate dest-only records back to source instead of deleting them
- Detailed reporting — Diff report with per-record change breakdown and summary statistics
- Config file support — Define sync jobs in JSON for repeatable, scriptable execution
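To make the conflict-related features above concrete, here is a hedged sketch of how a true conflict might be detected and resolved. The helper names (`is_true_conflict`, `resolve`), the ISO 8601 timestamp format, and the rule that source wins a timestamp tie are all assumptions for illustration; the actual tool may behave differently.

```python
from datetime import datetime
from typing import Any, Optional

Record = dict[str, Any]


def is_true_conflict(last_hash: Optional[str], source_hash: str, dest_hash: str) -> bool:
    """True when both sides changed since the last recorded sync and now disagree."""
    if last_hash is None:
        # No history for this key: any difference is treated as a conflict (assumption).
        return source_hash != dest_hash
    return (
        source_hash != last_hash
        and dest_hash != last_hash
        and source_hash != dest_hash
    )


def resolve(source: Record, dest: Record, strategy: str,
            timestamp_field: str = "updated_at") -> Record:
    """Pick the winning record according to one of the three strategy names."""
    if strategy == "source-wins":
        return source
    if strategy == "dest-wins":
        return dest
    if strategy == "newest-wins":
        # Assumes ISO 8601 timestamps that datetime.fromisoformat can parse.
        src_ts = datetime.fromisoformat(source[timestamp_field])
        dst_ts = datetime.fromisoformat(dest[timestamp_field])
        return source if src_ts >= dst_ts else dest  # tie goes to source (assumption)
    raise ValueError(f"unknown strategy: {strategy!r}")
```

Without a stored hash from the previous run, a changed record cannot be distinguished from a conflicting one, which is why conflict detection is tied to incremental sync.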
Requirements
- Python 3.10+
- No external dependencies (stdlib only)
Quick Start
# Sync two JSON files by a key field
python src/data_sync.py --source customers_a.json --dest customers_b.json --key customer_id
# Sync CSV to SQLite with conflict strategy
python src/data_sync.py --source orders.csv --dest warehouse.db --key order_id --strategy source-wins
# Preview changes without writing (dry-run)
python src/data_sync.py --source new_data.json --dest master.json --key id --dry-run
# Incremental sync with state tracking
python src/data_sync.py --source export.json --dest mirror.json --key id --state sync_state.json
# Full pipeline from a config file
python src/data_sync.py --config examples/data_sync_config.json
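The `--state` flag above implies that something is persisted between runs. One plausible shape for that state, sketched here under assumptions, is a JSON object mapping each record key to its last-synced content hash. The file layout and the helper names (`load_state`, `save_state`, `changed_since`) are hypothetical and not taken from the product.

```python
import json
from pathlib import Path


def load_state(path: Path) -> dict[str, str]:
    """Return the key -> hash map from the last run, or {} on the first run."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_state(path: Path, hashes: dict[str, str]) -> None:
    """Persist the key -> hash map so the next run can compare against it."""
    path.write_text(json.dumps(hashes, indent=2, sort_keys=True), encoding="utf-8")


def changed_since(state: dict[str, str], current: dict[str, str]) -> list[str]:
    """Keys whose current hash differs from (or is missing in) the saved state."""
    return sorted(k for k, h in current.items() if state.get(k) != h)
```

Running `changed_since` once against the source hashes and once against the dest hashes shows which side moved since the last sync; a key that appears in both results is a candidate conflict.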
Configuration Reference
Define a sync job in JSON:
{
  "source": "customers_export.json",
  "dest": "customers_master.json",
  "key": "customer_id",
  "strategy": "newest-wins",
  "timestamp_field": "updated_at",
  "state": "sync_state.json",
  "dry_run": false,
  "bidirectional": false,
  "source_table": "customers",
  "dest_table": "customers"
}
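A job definition like the one above could be parsed and validated along these lines. This is a sketch only: the `SyncJob` dataclass, the `parse_job` function, the default values, and the validation rules are assumptions, while the field names and the three strategy names come from the configuration shown.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional

VALID_STRATEGIES = {"source-wins", "dest-wins", "newest-wins"}


@dataclass
class SyncJob:
    source: str
    dest: str
    key: str
    strategy: str = "source-wins"           # assumed default
    timestamp_field: Optional[str] = None
    state: Optional[str] = None
    dry_run: bool = False
    bidirectional: bool = False
    source_table: Optional[str] = None       # only meaningful for SQLite endpoints
    dest_table: Optional[str] = None


def parse_job(text: str) -> SyncJob:
    """Parse a JSON job definition and reject obviously invalid settings."""
    raw = json.loads(text)
    known = {f.name for f in fields(SyncJob)}
    unknown = set(raw) - known
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    missing = {"source", "dest", "key"} - set(raw)
    if missing:
        raise ValueError(f"missing required keys: {sorted(missing)}")
    job = SyncJob(**raw)
    if job.strategy not in VALID_STRATEGIES:
        raise ValueError(f"unknown strategy: {job.strategy!r}")
    if job.strategy == "newest-wins" and not job.timestamp_field:
        raise ValueError("newest-wins requires timestamp_field")
    return job
```

Failing fast on unknown keys catches typos such as `dryrun` before any endpoint is read or written.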
CLI Flags
... continues with setup instructions, usage examples, and more.
📄 Code Sample .py preview
src/data_sync.py
#!/usr/bin/env python3
"""
Data Sync — Automation Hub (DataNest)
A record-synchronization tool that syncs datasets between two endpoints
(JSON file, CSV file, or SQLite table). Keys records by a configurable
field, computes a content hash per record, and derives a diff (added,
changed, deleted, unchanged). Supports incremental sync via a persisted
state file, conflict-resolution strategies, dry-run mode, and optional
bidirectional sync.
Usage:
    python data_sync.py --source customers_a.json --dest customers_b.json --key id
    python data_sync.py --source orders.csv --dest orders.db --key order_id --strategy source-wins
    python data_sync.py --config examples/data_sync_config.json
    python data_sync.py --source a.json --dest b.json --key id --dry-run
Dependencies: Python 3.10+ stdlib only (no pip packages)
License: MIT
"""
from __future__ import annotations
import argparse
import csv
import hashlib
import json
import logging
import sqlite3
import time
from dataclasses import dataclass, field
from datetime import datetime, timezone
from pathlib import Path
from typing import Any
# ---------------------------------------------------------------------------
# Constants
# ---------------------------------------------------------------------------
LOG_FORMAT = "%(asctime)s [%(levelname)s] %(name)s: %(message)s"
HASH_ALGORITHM = "sha256"
logger = logging.getLogger("data_sync")
# ---------------------------------------------------------------------------
# Data models
# ---------------------------------------------------------------------------
@dataclass
# ... 789 more lines ...