$19
Robots.txt Generator
Robots.txt generator with rule validation, sitemap links, and crawler directive management.
JSON, Markdown, Python
📁 File Structure (9 files)
robots-txt-generator/
├── LICENSE
├── README.md
├── examples/
│   └── robots_config.json
├── free-sample.zip
├── guide/
│   ├── 01_features.md
│   ├── 02_presets.md
│   └── 03_cli-flags.md
├── index.html
└── src/
    └── robots_txt_generator.py
📖 Documentation Preview README excerpt
Robots.txt Generator
Part of the SEO Toolkit by CodeVault
Generate production-ready robots.txt files from JSON configs or built-in presets. Includes rules for AI crawlers, search engines, and SEO tools.
Features
- 4 built-in presets: standard, strict, open, lockdown
- Block AI crawlers (GPTBot, CCBot, anthropic-ai, Google-Extended)
- Multiple user-agent blocks with Allow/Disallow rules
- Crawl-delay support
- Sitemap references (from config or CLI flags)
- Generation timestamp
- Database of 20+ known crawler user-agents
- Config validation with warnings
- Python stdlib only — zero dependencies
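To make the feature list concrete, here is a minimal sketch of how user-agent blocks, Allow/Disallow rules, Crawl-delay, and Sitemap lines might be assembled into a robots.txt file. The `RuleGroup` class and `render_robots_txt` function are hypothetical names chosen for illustration and are not taken from the tool's source. The `/admin/` path is likewise an assumed example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RuleGroup:
    """One user-agent block (hypothetical structure, not the tool's own)."""
    user_agents: list
    allow: list = field(default_factory=list)
    disallow: list = field(default_factory=list)
    crawl_delay: Optional[int] = None


def render_robots_txt(groups, sitemaps):
    """Render rule groups and sitemap URLs as robots.txt text."""
    lines = []
    for group in groups:
        for agent in group.user_agents:
            lines.append(f"User-agent: {agent}")
        for path in group.allow:
            lines.append(f"Allow: {path}")
        for path in group.disallow:
            lines.append(f"Disallow: {path}")
        if group.crawl_delay is not None:
            lines.append(f"Crawl-delay: {group.crawl_delay}")
        lines.append("")  # a blank line separates groups
    for url in sitemaps:
        lines.append(f"Sitemap: {url}")
    return "\n".join(lines) + "\n"


# Assumed example input: block an admin path for everyone, block two AI crawlers.
groups = [
    RuleGroup(user_agents=["*"], disallow=["/admin/"]),
    RuleGroup(user_agents=["GPTBot", "CCBot"], disallow=["/"]),
]
print(render_robots_txt(groups, ["https://example.com/sitemap.xml"]))
```

Listing several `User-agent` lines at the top of one group applies the same rules to each of those crawlers, which keeps the file short when many AI crawlers share one policy.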
Quick Start
# Use a preset
python src/robots_txt_generator.py --preset standard --sitemap https://example.com/sitemap.xml
# From a config file
python src/robots_txt_generator.py --config examples/robots_config.json --output robots.txt
# Block AI crawlers + allow search engines
python src/robots_txt_generator.py --preset strict --sitemap https://example.com/sitemap.xml
# Staging/dev lockdown
python src/robots_txt_generator.py --preset lockdown --output robots.txt
# List available presets
python src/robots_txt_generator.py --list-presets
# List known crawlers
python src/robots_txt_generator.py --list-crawlers
Presets
| Preset | Description |
|---|---|
| standard | Allow everything, block admin paths |
| strict | Block AI crawlers and SEO tools |
| open | Allow all crawlers everywhere |
| lockdown | Block everything (staging/dev) |
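One way to sanity-check a generated file against the preset descriptions above is the standard library's `urllib.robotparser`. The sketch below assumes two hand-written robots.txt strings that only approximate what the `lockdown` and `strict` presets might produce. The actual preset output is not shown in this excerpt, so treat `LOCKDOWN_LIKE`, `STRICT_LIKE`, and `is_allowed` as illustrative names.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Assumed approximations of preset output, written by hand for this example.
LOCKDOWN_LIKE = "User-agent: *\nDisallow: /\n"
STRICT_LIKE = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"

print(is_allowed(LOCKDOWN_LIKE, "Googlebot", "https://example.com/"))
print(is_allowed(STRICT_LIKE, "GPTBot", "https://example.com/page"))
print(is_allowed(STRICT_LIKE, "Googlebot", "https://example.com/page"))
```

A check like this can run in CI after generation, so that a staging lockdown file never ships to production unnoticed.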
Configuration Reference
See examples/robots_config.json for a full example.
CLI Flags
| Flag | Description |
|---|---|
| --config, -c | JSON config file |
| --preset, -p | Use a preset (standard/strict/open/lockdown) |
| --sitemap, -s | Sitemap URL (repeatable) |
... continues with setup instructions, usage examples, and more.
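The three flags listed in the CLI Flags table can be modeled with `argparse` roughly as follows. This is a sketch under assumptions, not the tool's actual parser: `build_parser` is a hypothetical helper, restricting `--preset` with `choices` is a guess, and `action="append"` is simply one common way to make a flag repeatable.

```python
import argparse


def build_parser():
    """Build a parser covering only the three flags shown in the table."""
    parser = argparse.ArgumentParser(prog="robots_txt_generator.py")
    parser.add_argument("--config", "-c", help="JSON config file")
    parser.add_argument(
        "--preset", "-p",
        choices=["standard", "strict", "open", "lockdown"],  # assumed validation
        help="Use a preset",
    )
    parser.add_argument(
        "--sitemap", "-s",
        action="append",  # each occurrence adds one URL to the list
        default=[],
        help="Sitemap URL (repeatable)",
    )
    return parser


args = build_parser().parse_args([
    "-p", "strict",
    "-s", "https://example.com/sitemap.xml",
    "-s", "https://example.com/news-sitemap.xml",
])
print(args.preset, args.sitemap)
```

With `action="append"`, passing `--sitemap` twice yields a two-item list, which maps naturally onto multiple `Sitemap:` lines in the output file.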
📄 Code Sample .py preview
src/robots_txt_generator.py
#!/usr/bin/env python3
"""
Robots.txt Generator — SEO Toolkit by DataNest
Generate production-ready robots.txt files from a JSON config. Supports
multiple user-agent blocks, sitemap references, crawl-delay directives,
and common presets for major crawlers.
Why this exists:
robots.txt seems simple until you need to block 15 different crawlers,
handle staging vs production, and remember the exact syntax. This tool
generates correct robots.txt from a config — with presets for the most
common setups.
Usage:
python robots_txt_generator.py --config robots_config.json
python robots_txt_generator.py --preset standard --sitemap https://example.com/sitemap.xml
python robots_txt_generator.py --preset strict --output robots.txt
License: MIT
"""
from __future__ import annotations
import argparse
import json
import logging
import sys
from dataclasses import dataclass, field
from datetime import datetime, timezone
from pathlib import Path
# ---------------------------------------------------------------------------
# Constants
# ---------------------------------------------------------------------------
LOG = logging.getLogger("robots-txt-generator")
# Well-known crawler user agents and what they do
KNOWN_CRAWLERS: dict[str, str] = {
    "Googlebot": "Google's main web crawler",
    "Googlebot-Image": "Google Images crawler",
    "Googlebot-Video": "Google Video crawler",
    "Bingbot": "Microsoft Bing's web crawler",
    "Slurp": "Yahoo's web crawler",
    "DuckDuckBot": "DuckDuckGo's web crawler",
    "Baiduspider": "Baidu's web crawler",
    "YandexBot": "Yandex's web crawler",
    "facebot": "Facebook's crawler for link previews",
    "Twitterbot": "Twitter/X's crawler for card previews",
# ... 353 more lines ...