← Back to all products
$19
Canonical URL Planner
Canonical URL planning and audit tool for preventing duplicate content issues.
JSONMarkdownPythonCI/CD
📄 Product Preview
Try the interactive reader and demo tools below, or get the full product with all content unlocked.
📖 Interactive Reader (Free Preview) ⚙ Try Demo Tools 📦 Download Free Sample📁 File Structure 10 files
canonical-url-planner/
├── LICENSE
├── README.md
├── examples/
│ ├── pages.csv
│ └── pages.json
├── free-sample.zip
├── guide/
│ ├── 01_features.md
│ ├── 02_input-formats.md
│ └── 03_license.md
├── index.html
└── src/
└── canonical_url_planner.py
📖 Documentation Preview README excerpt
Canonical URL Planner
Part of the SEO Toolkit by CodeVault
Plan and validate canonical URLs across your entire site. Detect chains, loops, duplicate content clusters, missing canonicals, protocol mismatches, and cross-domain references — all from a single CSV or JSON audit.
Features
- Load page/canonical pairs from CSV or JSON
- Detect canonical chains (A → B → C) and recommend direct canonicals
- Detect canonical loops (A → B → A)
- Find missing canonical tags
- Flag protocol mismatches (HTTP page with HTTPS canonical)
- Identify cross-domain canonicals
- Warn about query parameters in canonical URLs
- Group pages by their canonical target to visualize duplicate content clusters
- Track self-referencing canonicals (usually correct)
- Multiple output formats: text, CSV, or JSON
- Strict mode for CI/CD pipelines
- Python stdlib only — zero dependencies
Quick Start
# Validate canonical URLs from CSV
python src/canonical_url_planner.py --input examples/pages.csv
# Validate from JSON
python src/canonical_url_planner.py --input examples/pages.json
# JSON output for CI/CD
python src/canonical_url_planner.py --input examples/pages.csv --format json
# Strict mode
python src/canonical_url_planner.py --input examples/pages.csv --strict
# Write report to file
python src/canonical_url_planner.py --input examples/pages.csv --output report.txt
Input Formats
CSV Format
page_url,canonical_url
https://www.example.com/,https://www.example.com/
https://www.example.com/old-page,https://www.example.com/new-page
https://www.example.com/no-canonical,
JSON Format
[
{"page_url": "https://www.example.com/", "canonical_url": "https://www.example.com/"},
{"page_url": "https://www.example.com/old-page", "canonical_url": "https://www.example.com/new-page"}
]
... continues with setup instructions, usage examples, and more.
📄 Code Sample .py preview
src/canonical_url_planner.py
#!/usr/bin/env python3
"""
Canonical URL Planner — SEO Toolkit by DataNest
Plan and validate canonical URLs across your site. Detect duplicates,
conflicting canonicals, canonical chains, self-referencing issues,
HTTP/HTTPS mismatches, and pages missing canonical tags entirely.
Why this exists:
Canonical tags are deceptively simple — one <link rel="canonical"> per
page. But at scale, they become a nightmare. Pages point to the wrong
canonical, chains form (A→B→C), duplicate content clusters emerge, and
protocol mismatches silently leak crawl budget. This tool audits your
entire canonical map in one pass and tells you exactly what to fix.
Usage:
python canonical_url_planner.py --input pages.csv
python canonical_url_planner.py --input pages.json --format json
python canonical_url_planner.py --input pages.csv --strict
License: MIT
"""
from __future__ import annotations
import argparse
import csv
import json
import logging
import sys
from collections import defaultdict
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any
from urllib.parse import urlparse
# ---------------------------------------------------------------------------
# Constants
# ---------------------------------------------------------------------------
LOG = logging.getLogger("canonical-url-planner")
# ---------------------------------------------------------------------------
# Data Models
# ---------------------------------------------------------------------------
@dataclass
class PageCanonical:
"""A page and its declared canonical URL."""
# ... 478 more lines ...