The sitemap builder accepts URLs through three input modes: a raw URL list,
a JSON config with priority and change frequency rules, or input piped
through stdin. A JSON config looks like this:

```json
{
  "base_url": "https://example.com",
  "rules": [
    {"pattern": "/blog/**", "priority": 0.8, "changefreq": "daily"},
    {"pattern": "/docs/**", "priority": 0.7, "changefreq": "weekly"},
    {"pattern": "/products/**", "priority": 0.6, "changefreq": "weekly"},
    {"pattern": "/about", "priority": 0.3, "changefreq": "monthly"},
    {"pattern": "/legal/**", "priority": 0.1, "changefreq": "yearly"}
  ],
  "exclude": ["/admin/**", "/staging/**", "*/draft", "*/temp-*"],
  "default_priority": 0.5,
  "default_changefreq": "monthly"
}
```

The `pattern` field supports glob-style matching: `**` matches any depth,
`*` matches within a single path segment. Exclusion patterns take precedence:
a URL matching both an include rule and an exclude pattern is omitted.
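
The matching and precedence rules described above can be sketched in a few lines of Python. This is an illustration only, not the builder's actual code: the helper names `glob_to_regex` and `resolve` are hypothetical, the sketch assumes patterns are anchored to the whole path, and it assumes the first matching rule wins. The config in the usage check is also made up to show an overlap between a rule and an exclusion.

```python
import re


def glob_to_regex(pattern):
    """Translate a glob into a regex: ** spans segments, * stays in one."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")        # any depth, slashes included
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")     # stops at the next slash
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def resolve(path, config):
    """Return (priority, changefreq) for a path, or None if it is excluded."""
    # Exclusions are checked first, so they override any include rule.
    for pattern in config.get("exclude", []):
        if glob_to_regex(pattern).fullmatch(path):
            return None
    # Assumption: rules are tried in order and the first match wins.
    for rule in config.get("rules", []):
        if glob_to_regex(rule["pattern"]).fullmatch(path):
            return rule["priority"], rule["changefreq"]
    return config["default_priority"], config["default_changefreq"]


# Hypothetical config where an exclusion overlaps an include rule.
config = {
    "rules": [{"pattern": "/blog/**", "priority": 0.8, "changefreq": "daily"}],
    "exclude": ["/blog/drafts/**"],
    "default_priority": 0.5,
    "default_changefreq": "monthly",
}
print(resolve("/blog/2025/post", config))
print(resolve("/blog/drafts/wip", config))
```

Checking exclusions before rules keeps the precedence explicit: an excluded path never reaches the rule loop.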
The generated sitemap.xml validates against the sitemaps.org schema:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/blog/deploying-nginx</loc>
    <lastmod>2025-06-15</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```

When the URL count exceeds 50,000, the builder automatically splits into
a sitemap index with child sitemaps:

```shell
python src/sitemap_builder.py --urls urls.txt --output sitemap.xml
# Warning: 62,340 URLs exceeds 50,000 limit
# Generated sitemap-index.xml with 2 child sitemaps
```

The optional `--gzip` flag compresses each output file, and `--stats`
prints a summary table with per-priority bucket counts, MIME type
distribution, and estimated index size in kilobytes.
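
To make the splitting behavior concrete, here is a minimal sketch of how a URL list could be chunked into child sitemaps plus an index, using only the Python standard library. It is not the builder's implementation: the function names `build_urlset` and `build_sitemaps` and the child file naming (`sitemap-1.xml`, `sitemap-2.xml`, ...) are assumptions made for illustration. It needs Python 3.8 or later for the `xml_declaration` argument.

```python
import gzip
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file limit mentioned in the text


def build_urlset(entries):
    """Serialize a list of entry dicts into one <urlset> document (bytes)."""
    urlset = ET.Element("urlset", xmlns=NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


def build_sitemaps(entries, base_url, stem="sitemap"):
    """Return {filename: xml_bytes}, adding an index when over the limit."""
    if len(entries) <= MAX_URLS:
        return {f"{stem}.xml": build_urlset(entries)}
    files = {}
    index = ET.Element("sitemapindex", xmlns=NS)
    for n, start in enumerate(range(0, len(entries), MAX_URLS), start=1):
        name = f"{stem}-{n}.xml"  # hypothetical child naming scheme
        files[name] = build_urlset(entries[start:start + MAX_URLS])
        child = ET.SubElement(index, "sitemap")
        ET.SubElement(child, "loc").text = f"{base_url}/{name}"
    files[f"{stem}-index.xml"] = ET.tostring(
        index, encoding="utf-8", xml_declaration=True
    )
    return files


# 62,340 URLs, as in the example run above: two children plus an index.
entries = [{"loc": f"https://example.com/p/{i}"} for i in range(62_340)]
files = build_sitemaps(entries, "https://example.com")
print(sorted(files))

# Roughly what a --gzip option might do to each output file.
compressed = {name + ".gz": gzip.compress(data) for name, data in files.items()}
```

Returning a mapping of file names to bytes keeps the chunking logic separate from disk I/O, so compression can be applied to each file afterwards.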
For the full CLI reference and integration examples, see
02_configuration-reference.md.
See `examples/sitemap_config.json` for a full example.

| Field | Type | Description |
|---|---|---|
| `base_url` | string | Base URL to prepend to relative paths |
| `default_changefreq` | string | Default change frequency |
| `default_priority` | float | Default priority (0.0-1.0) |
| `priority_rules` | object | Path pattern -> priority overrides |
| `changefreq_rules` | object | Path pattern -> changefreq overrides |
| `exclude_patterns` | array | Substring patterns to exclude |
| `urls` | array | List of URL paths or full URLs |
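
As a rough illustration of how the `base_url`, `urls`, and `exclude_patterns` fields might interact, the sketch below prepends the base URL to relative paths and then drops any URL containing an excluded substring. The function name `prepare_urls` is hypothetical, and applying the substring check to the full URL (rather than only the path) is an assumption.

```python
def prepare_urls(config):
    """Expand relative paths with base_url, then apply substring exclusions."""
    base = config.get("base_url", "").rstrip("/")
    excludes = config.get("exclude_patterns", [])
    result = []
    for url in config.get("urls", []):
        # Full URLs pass through; relative paths get the base URL prepended.
        if not url.startswith(("http://", "https://")):
            url = base + "/" + url.lstrip("/")
        # Assumption: a pattern excludes a URL if it occurs anywhere in it.
        if any(pattern in url for pattern in excludes):
            continue
        result.append(url)
    return result


example = {
    "base_url": "https://example.com/",
    "urls": ["/about", "blog/post", "https://other.example/x", "/admin/login"],
    "exclude_patterns": ["/admin/"],
}
print(prepare_urls(example))
```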

| Flag | Description |
|---|---|
| `--urls`, `-u` | Text file with one URL per line |
| `--config`, `-c` | JSON config file |
| `--output`, `-o` | Output file path (default: stdout) |
| `--base-url`, `-b` | Base URL for relative paths |
| `--default-priority` | Default priority value |
| `--default-changefreq` | Default change frequency |
| `--no-pretty` | Compact XML output |
| `--stats` | Print statistics to stderr |
| `--verbose`, `-v` | Debug logging |
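
For readers wiring the tool into their own scripts, the flag table maps naturally onto an `argparse` parser. The sketch below mirrors the flags listed above; it is a guess at a plausible layout, not the tool's real parser, and the function name `build_parser`, the help strings, and the choice of `None` to mean stdout are assumptions.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the flags in the table above."""
    parser = argparse.ArgumentParser(prog="sitemap_builder.py")
    parser.add_argument("--urls", "-u", help="text file with one URL per line")
    parser.add_argument("--config", "-c", help="JSON config file")
    # None stands in for "write to stdout" in this sketch.
    parser.add_argument("--output", "-o", default=None, help="output file path")
    parser.add_argument("--base-url", "-b", help="base URL for relative paths")
    parser.add_argument("--default-priority", type=float,
                        help="default priority value")
    parser.add_argument("--default-changefreq",
                        help="default change frequency")
    parser.add_argument("--no-pretty", action="store_true",
                        help="compact XML output")
    parser.add_argument("--stats", action="store_true",
                        help="print statistics to stderr")
    parser.add_argument("--verbose", "-v", action="store_true",
                        help="debug logging")
    return parser


args = build_parser().parse_args(
    ["-u", "urls.txt", "-o", "sitemap.xml", "--stats"]
)
print(args.urls, args.output, args.stats)
```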