$19
Sitemap Builder
XML sitemap generator with automatic discovery, priority assignment, and changefreq configuration.
JSON, Markdown, Python
📁 File Structure (9 files)
sitemap-builder/
├── LICENSE
├── README.md
├── examples/
│   └── sitemap_config.json
├── free-sample.zip
├── guide/
│   ├── 01_features.md
│   ├── 02_configuration-reference.md
│   └── 03_license.md
├── index.html
└── src/
    └── sitemap_builder.py
📖 Documentation Preview README excerpt
Sitemap Builder
Part of the SEO Toolkit by CodeVault
Generate standards-compliant sitemap.xml files from URL lists or JSON configs. Supports priority rules, change frequencies, exclusion patterns, and sitemap indexes.
Features
- Generates valid sitemap.xml per the sitemaps.org protocol
- Configurable priority rules (e.g., `/blog/` = 0.8, `/legal/` = 0.1)
- Configurable changefreq rules (e.g., blog = daily, legal = yearly)
- URL exclusion patterns (skip admin pages, staging URLs, etc.)
- Accepts URLs from a text file, JSON config, or stdin
- Pretty-printed or compact XML output
- Statistics mode: shows URL counts and output size
- Warns when exceeding the 50,000 URL sitemap limit
- Python stdlib only — zero dependencies
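
To make the first feature concrete, here is a minimal sketch of how a sitemap in the sitemaps.org format could be produced with the standard library alone. It is not the product's implementation: the function name `build_sitemap` and the tuple layout of `entries` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML for an iterable of (loc, priority, changefreq) tuples.

    Hypothetical helper: the real script's function names are not shown.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, priority, changefreq in entries:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in text automatically.
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "changefreq").text = changefreq
        ET.SubElement(url, "priority").text = f"{priority:.1f}"
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap([
        ("https://example.com/blog/hello", 0.8, "daily"),
        ("https://example.com/legal/terms", 0.1, "yearly"),
    ]))
```

This sketch emits compact XML; pretty-printing would be a separate step.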
Quick Start
# From a URL list file
python src/sitemap_builder.py --urls urls.txt --output sitemap.xml
# From a JSON config (with rules and priorities)
python src/sitemap_builder.py --config examples/sitemap_config.json --output sitemap.xml
# Pipe URLs from another command
curl -s https://api.example.com/urls | python src/sitemap_builder.py --output sitemap.xml
# With stats
python src/sitemap_builder.py --config examples/sitemap_config.json --output sitemap.xml --stats
Configuration Reference
See examples/sitemap_config.json for a full example.
| Field | Type | Description |
|---|---|---|
| `base_url` | string | Base URL to prepend to relative paths |
| `default_changefreq` | string | Default change frequency |
| `default_priority` | float | Default priority (0.0-1.0) |
| `priority_rules` | object | Path pattern -> priority overrides |
| `changefreq_rules` | object | Path pattern -> changefreq overrides |
| `exclude_patterns` | array | Substring patterns to exclude |
| `urls` | array | List of URL paths or full URLs |
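
The fields above could combine roughly as in the following sketch. The field names come from the table and the sample rule values from the feature list. Everything else is an assumption: the helper names (`absolute_url`, `first_match`, `resolve`), the other example values, and the matching semantics (substring match, first matching rule wins) are illustrative and may differ from the actual script.

```python
# Hypothetical config using the documented field names; values are examples.
config = {
    "base_url": "https://example.com",
    "default_changefreq": "monthly",
    "default_priority": 0.5,
    "priority_rules": {"/blog/": 0.8, "/legal/": 0.1},
    "changefreq_rules": {"/blog/": "daily", "/legal/": "yearly"},
    "exclude_patterns": ["/admin/", "staging."],
    "urls": ["/", "/blog/hello", "/legal/terms", "/admin/login"],
}


def absolute_url(base_url, raw):
    """Prepend base_url to relative paths; leave full URLs untouched."""
    if raw.startswith(("http://", "https://")):
        return raw
    return base_url.rstrip("/") + "/" + raw.lstrip("/")


def first_match(rules, loc, default):
    """Assumed semantics: the first rule whose pattern occurs in loc wins."""
    for pattern, value in rules.items():
        if pattern in loc:
            return value
    return default


def resolve(cfg):
    """Return (loc, priority, changefreq) tuples after exclusions and rules."""
    entries = []
    for raw in cfg["urls"]:
        loc = absolute_url(cfg["base_url"], raw)
        # exclude_patterns are documented as substring patterns.
        if any(p in loc for p in cfg.get("exclude_patterns", [])):
            continue
        priority = first_match(cfg.get("priority_rules", {}), loc,
                               cfg["default_priority"])
        changefreq = first_match(cfg.get("changefreq_rules", {}), loc,
                                 cfg["default_changefreq"])
        entries.append((loc, priority, changefreq))
    return entries


if __name__ == "__main__":
    for entry in resolve(config):
        print(entry)
```

With this config, `/admin/login` is dropped by the exclusion pattern, and `/` falls back to the defaults because no rule matches it.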
CLI Flags
| Flag | Description |
|---|---|
| `--urls`, `-u` | Text file with one URL per line |
| `--config`, `-c` | JSON config file |
| `--output`, `-o` | Output file path (default: stdout) |
| `--base-url`, `-b` | Base URL for relative paths |
| `--default-priority` | Default priority value |
| `--default-changefreq` | Default change frequency |
| `--no-pretty` | Compact XML output |
| `--stats` | Print statistics to stderr |
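
For readers wiring a similar interface into their own tooling, the flags in the table map naturally onto `argparse`. The sketch below is a reconstruction from the table only, not the script's actual parser: the function name `build_parser` is hypothetical, and no default values are set because the table lists only the stdout default for `--output`.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented flags."""
    parser = argparse.ArgumentParser(prog="sitemap_builder.py")
    parser.add_argument("--urls", "-u", help="Text file with one URL per line")
    parser.add_argument("--config", "-c", help="JSON config file")
    # None is treated here as "write to stdout" (assumed convention).
    parser.add_argument("--output", "-o", help="Output file path")
    parser.add_argument("--base-url", "-b", help="Base URL for relative paths")
    parser.add_argument("--default-priority", type=float,
                        help="Default priority value")
    parser.add_argument("--default-changefreq",
                        help="Default change frequency")
    parser.add_argument("--no-pretty", action="store_true",
                        help="Compact XML output")
    parser.add_argument("--stats", action="store_true",
                        help="Print statistics to stderr")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```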
... continues with setup instructions, usage examples, and more.
📄 Code Sample .py preview
src/sitemap_builder.py
#!/usr/bin/env python3
"""
Sitemap Builder — SEO Toolkit by DataNest
Reads a list of URLs (from a file, JSON config, or stdin) and generates a
standards-compliant sitemap.xml with configurable priorities, change
frequencies, and last-modified dates.
Why this exists:
Most sitemap generators require installing a full CMS plugin or a heavy
npm package. This is a single Python script — feed it your URLs and get a
valid sitemap.xml in seconds. Perfect for static sites, JAMStack builds,
or any CI/CD pipeline.
Usage:
    python sitemap_builder.py --urls urls.txt --output sitemap.xml
    python sitemap_builder.py --config sitemap_config.json --output sitemap.xml
    cat urls.txt | python sitemap_builder.py --output sitemap.xml
License: MIT
"""
from __future__ import annotations
import argparse
import json
import logging
import sys
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from datetime import datetime, timezone
from pathlib import Path
from typing import TextIO
from xml.dom.minidom import parseString
# ---------------------------------------------------------------------------
# Constants
# ---------------------------------------------------------------------------
# Maximum URLs per sitemap file (limit set by the sitemaps.org protocol)
MAX_URLS_PER_SITEMAP = 50_000
# Maximum sitemap file size (50 MB uncompressed)
MAX_SITEMAP_SIZE_BYTES = 50 * 1024 * 1024
# Valid changefreq values per the sitemaps.org protocol
VALID_CHANGEFREQS = {
    "always", "hourly", "daily", "weekly", "monthly", "yearly", "never",
}
# ... 359 more lines ...
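
The remainder of the script is not shown in this preview, so the following is only a sketch of how constants like these might be used. The helpers `validate_changefreq` and `chunk_urls` are hypothetical names, and splitting an oversized URL list into chunks is one assumed way to stay under the per-file limit, not necessarily what the script does.

```python
# Constants repeated from the preview so the sketch is self-contained.
MAX_URLS_PER_SITEMAP = 50_000
VALID_CHANGEFREQS = {
    "always", "hourly", "daily", "weekly", "monthly", "yearly", "never",
}


def validate_changefreq(value):
    """Normalize a changefreq value and reject anything outside the protocol."""
    normalized = value.strip().lower()
    if normalized not in VALID_CHANGEFREQS:
        raise ValueError(f"invalid changefreq: {value!r}")
    return normalized


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a URL list into consecutive chunks of at most `size` entries."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


if __name__ == "__main__":
    urls = [f"https://example.com/page/{i}" for i in range(120_001)]
    chunks = chunk_urls(urls)
    print(len(chunks), [len(c) for c in chunks])
```

Each chunk could then be written to its own sitemap file and listed in a sitemap index.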