$19
List Hygiene Tool
Clean email lists by deduplicating, validating, removing role addresses, and detecting disposable domains.
Markdown, Python
📁 File Structure (10 files)
list-hygiene-tool/
├── LICENSE
├── README.md
├── examples/
│ └── sample_emails.txt
├── free-sample.zip
├── guide/
│ ├── 01_features.md
│ ├── 02_quick-start.md
│ ├── 03_report-output.md
│ └── 04_license.md
├── index.html
└── src/
  └── list_hygiene.py
📖 Documentation Preview (README excerpt)
List Hygiene Tool
Clean email lists: deduplicate, validate format, remove role addresses, detect disposable domains, fix common typos, and export clean results.
Features
- Email normalization: Lowercase, strip dots from Gmail addresses, remove +alias tags
- Syntax validation: Simplified RFC 5322 regex check
- Disposable domain detection: Flags 32+ known throwaway email services
- Role address detection: Identifies 30+ non-personal prefixes (admin@, support@, info@, etc.)
- Domain typo correction: Fixes 14 common misspellings (gmial.com → gmail.com, etc.)
- Deduplication: Removes exact duplicates after normalization
- Multiple output formats: CSV, TXT, or JSON
- Cleaning report: Summary of what was removed and why
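The normalization and deduplication features can be pictured with a short sketch. This is an illustration, not the tool's own code: `normalize_email`, `dedupe`, and `GMAIL_DOMAINS` are hypothetical names, and the sketch assumes dot-stripping applies only to gmail.com while +tag removal applies to every domain, consistent with the "Gmail dots" and "Plus aliases" rows of the What Gets Cleaned table.

```python
from __future__ import annotations

# Hypothetical: the set of domains whose local-part dots are ignored (assumption).
GMAIL_DOMAINS = frozenset({"gmail.com"})


def normalize_email(address: str) -> str:
    """Lowercase, drop +alias tags, and strip dots from Gmail local parts."""
    address = address.strip().lower()
    local, sep, domain = address.rpartition("@")
    if not sep:
        # No "@": return unchanged and let syntax validation reject it later.
        return address
    local = local.split("+", 1)[0]  # user+tag -> user
    if domain in GMAIL_DOMAINS:
        local = local.replace(".", "")  # u.s.e.r -> user
    return f"{local}@{domain}"


def dedupe(addresses: list[str]) -> list[str]:
    """Normalize each address and keep only its first occurrence, in order."""
    seen: set[str] = set()
    result: list[str] = []
    for raw in addresses:
        email = normalize_email(raw)
        if email not in seen:
            seen.add(email)
            result.append(email)
    return result


if __name__ == "__main__":
    raw = ["User@Example.com", "user+news@example.com", "u.s.e.r@gmail.com", "user@gmail.com"]
    print(dedupe(raw))  # -> ['user@example.com', 'user@gmail.com']
```

Dot-stripping is limited to gmail.com here because other providers may treat dots in the local part as significant, so collapsing them everywhere could merge distinct mailboxes.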
Requirements
- Python 3.10+
- No external dependencies (stdlib only)
Quick Start
# Clean a text file (one email per line)
python src/list_hygiene.py examples/sample_emails.txt
# Output as CSV
python src/list_hygiene.py emails.txt --output clean.csv --format csv
# Generate a cleaning report
python src/list_hygiene.py emails.txt --report
# Keep role addresses (don't filter them out)
python src/list_hygiene.py emails.txt --keep-roles
# Keep disposable domains
python src/list_hygiene.py emails.txt --keep-disposable
# Disable typo correction
python src/list_hygiene.py emails.txt --no-typo-fix
# Disable deduplication
python src/list_hygiene.py emails.txt --no-dedup
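The flags in these examples map naturally onto the standard library's argparse. The sketch below is hypothetical: the real parser in src/list_hygiene.py is not part of this excerpt, so `build_parser`, the help strings, and the `txt` default for `--format` are assumptions; only the flag names and the csv/txt/json choices come from the README.

```python
from __future__ import annotations

import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented command-line flags."""
    parser = argparse.ArgumentParser(description="Clean an email list.")
    parser.add_argument("input", help="Input file, one email per line")
    parser.add_argument("--output", help="Where to write the cleaned list")
    # Assumed default; the README lists csv, txt, and json as output formats.
    parser.add_argument("--format", choices=["csv", "txt", "json"], default="txt")
    parser.add_argument("--report", action="store_true", help="Print a cleaning report")
    # Opt-out switches: each cleaning step stays on unless a flag disables it.
    parser.add_argument("--keep-roles", action="store_true", help="Keep role addresses")
    parser.add_argument("--keep-disposable", action="store_true", help="Keep disposable domains")
    parser.add_argument("--no-typo-fix", action="store_true", help="Skip domain typo correction")
    parser.add_argument("--no-dedup", action="store_true", help="Skip deduplication")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["emails.txt", "--format", "csv", "--keep-roles"])
    # argparse turns --keep-roles into args.keep_roles, --no-dedup into args.no_dedup.
    print(args.format, args.keep_roles, args.no_dedup)  # -> csv True False
```

Making every opt-out a `store_true` switch keeps all cleaning steps enabled by default, which matches how the README presents `--keep-roles`, `--keep-disposable`, `--no-typo-fix`, and `--no-dedup` as ways to turn behavior off.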
What Gets Cleaned
| Check | Example | Action |
|---|---|---|
| Invalid syntax | not-an-email | Removed |
| Disposable domain | user@mailinator.com | Removed (unless --keep-disposable) |
| Role address | admin@example.com | Removed (unless --keep-roles) |
| Domain typo | user@gmial.com | Corrected to user@gmail.com |
| Duplicate | Multiple user@example.com | Deduplicated |
| Gmail dots | u.s.e.r@gmail.com | Normalized to user@gmail.com |
| Plus aliases | user+tag@example.com | Normalized to user@example.com |
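Taken together, the checks in this table amount to a small per-address pipeline. The sketch below is an illustration under stated assumptions, not the tool's implementation: `clean_list`, `TYPO_FIXES`, `DISPOSABLE`, and `ROLE_PREFIXES` are hypothetical, the lists are tiny stand-ins for the tool's larger ones, and the step order (validate, fix typo, drop disposable, drop role, dedupe) is assumed. The regex is copied from src/list_hygiene.py. Gmail-dot and plus-alias normalization are left out for brevity. It also returns a tally of reasons, in the spirit of the cleaning report.

```python
from __future__ import annotations

import re
from collections import Counter

# Copied from src/list_hygiene.py (simplified RFC 5322).
EMAIL_REGEX = re.compile(
    r"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
    r"@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$"
)
# Hypothetical, tiny stand-ins for the tool's full lists.
TYPO_FIXES = {"gmial.com": "gmail.com"}
DISPOSABLE = frozenset({"mailinator.com", "yopmail.com"})
ROLE_PREFIXES = frozenset({"admin", "support", "info"})


def clean_list(
    emails: list[str], keep_roles: bool = False, keep_disposable: bool = False
) -> tuple[list[str], Counter[str]]:
    """Return kept addresses plus a tally of why others were dropped or changed."""
    kept: list[str] = []
    seen: set[str] = set()
    reasons: Counter[str] = Counter()
    for raw in emails:
        email = raw.strip().lower()
        if not EMAIL_REGEX.match(email):
            reasons["invalid_syntax"] += 1
            continue
        local, _, domain = email.rpartition("@")
        if domain in TYPO_FIXES:
            domain = TYPO_FIXES[domain]
            email = f"{local}@{domain}"
            reasons["typo_fixed"] += 1  # corrected and kept, not removed
        if domain in DISPOSABLE and not keep_disposable:
            reasons["disposable"] += 1
            continue
        if local in ROLE_PREFIXES and not keep_roles:
            reasons["role_address"] += 1
            continue
        if email in seen:
            reasons["duplicate"] += 1
            continue
        seen.add(email)
        kept.append(email)
    return kept, reasons


if __name__ == "__main__":
    sample = ["not-an-email", "user@mailinator.com", "admin@example.com",
              "user@gmial.com", "user@example.com", "USER@example.com"]
    kept, reasons = clean_list(sample)
    print(kept)  # -> ['user@gmail.com', 'user@example.com']
    print(dict(reasons))
```

Counting typo fixes alongside removals lets a report distinguish addresses that were repaired from those that were discarded.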
Report Output
When using --report, you get a summary like:
... continues with setup instructions, usage examples, and more.
📄 Code Sample (.py preview)
src/list_hygiene.py
#!/usr/bin/env python3
"""
List Hygiene Tool — Email Arsenal (DataNest)
Cleans email lists by: deduplicating, validating format, removing role
addresses, checking for disposable domains, and stripping invalid entries.
Usage:
python list_hygiene.py emails.txt
python list_hygiene.py emails.csv --output clean.csv --format csv
python list_hygiene.py emails.txt --keep-roles --keep-disposable
Dependencies: Python 3.10+ stdlib only
License: MIT
"""
from __future__ import annotations
import argparse
import csv
import io
import json
import logging
import re
import sys
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from pathlib import Path
# ---------------------------------------------------------------------------
# Constants
# ---------------------------------------------------------------------------
logger = logging.getLogger("list_hygiene")
# RFC 5322 simplified email regex
EMAIL_REGEX = re.compile(
r"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
r"@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$"
)
# Known disposable email domains
DISPOSABLE_DOMAINS: frozenset[str] = frozenset({
"mailinator.com", "guerrillamail.com", "tempmail.com", "throwaway.email",
"yopmail.com", "sharklasers.com", "guerrillamailblock.com", "grr.la",
"dispostable.com", "trashmail.com", "fakeinbox.com", "mailnesia.com",
"maildrop.cc", "discard.email", "tempr.email", "10minutemail.com",
"guerrillamail.info", "temp-mail.org", "getnada.com",
"mailsac.com", "tmail.com", "tempail.com", "mohmal.com",
# ... 379 more lines ...