List Hygiene Tool

$19

Clean email lists by deduplicating, validating, removing role addresses, and detecting disposable domains.

📁 10 files
Markdown, Python

📁 File Structure (10 files)

list-hygiene-tool/
├── LICENSE
├── README.md
├── examples/
│   └── sample_emails.txt
├── free-sample.zip
├── guide/
│   ├── 01_features.md
│   ├── 02_quick-start.md
│   ├── 03_report-output.md
│   └── 04_license.md
├── index.html
└── src/
    └── list_hygiene.py

📖 Documentation Preview README excerpt

List Hygiene Tool

Clean email lists: deduplicate, validate format, remove role addresses, detect disposable domains, fix common typos, and export clean results.

Features

  • Email normalization — Lowercase, strip dots from Gmail, remove +alias tags
  • Syntax validation — Checks each address against a simplified RFC 5322 regex
  • Disposable domain detection — Flags 32+ known throwaway email services
  • Role address detection — Identifies 30+ non-personal prefixes (admin@, support@, info@, etc.)
  • Domain typo correction — Fixes 14 common misspellings (gmial.com → gmail.com, etc.)
  • Deduplication — Removes exact duplicates after normalization
  • Multiple output formats — CSV, TXT, or JSON
  • Cleaning report — Summary of what was removed and why
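
The normalization and deduplication features can be pictured with a minimal sketch like the one below. The function names `normalize_email` and `dedupe` are hypothetical and not taken from `src/list_hygiene.py`; the rules follow the feature list (lowercase, drop +alias tags, strip dots for Gmail only), and keeping the first occurrence of each address is an assumption.

```python
def normalize_email(raw: str) -> str:
    """Lowercase, drop +alias tags, and strip dots from Gmail local parts."""
    email = raw.strip().lower()
    local, sep, domain = email.rpartition("@")
    if not sep or not local or not domain:
        return email  # not address-shaped; leave it for syntax validation
    local = local.split("+", 1)[0]  # user+tag -> user
    if domain == "gmail.com":
        local = local.replace(".", "")  # u.s.e.r -> user
    return f"{local}@{domain}"


def dedupe(emails: list[str]) -> list[str]:
    """Keep the first occurrence of each normalized address, in input order."""
    seen: set[str] = set()
    unique: list[str] = []
    for raw in emails:
        key = normalize_email(raw)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique
```

Because deduplication runs on the normalized form, `user+news@example.com` and `User@Example.com` collapse into a single entry.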

Requirements

  • Python 3.10+
  • No external dependencies (stdlib only)

Quick Start


# Clean a text file (one email per line)
python src/list_hygiene.py examples/sample_emails.txt

# Output as CSV
python src/list_hygiene.py emails.txt --output clean.csv --format csv

# Generate a cleaning report
python src/list_hygiene.py emails.txt --report

# Keep role addresses (don't filter them out)
python src/list_hygiene.py emails.txt --keep-roles

# Keep disposable domains
python src/list_hygiene.py emails.txt --keep-disposable

# Disable typo correction
python src/list_hygiene.py emails.txt --no-typo-fix

# Disable deduplication
python src/list_hygiene.py emails.txt --no-dedup
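
For readers who want to see how these flags might map onto Python's `argparse`, here is a hedged sketch. The flag names come from the commands above; the help strings, the `txt` default for `--format`, and the `build_parser` helper are assumptions, not the tool's actual parser.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in Quick Start."""
    parser = argparse.ArgumentParser(
        prog="list_hygiene.py", description="Clean an email list."
    )
    parser.add_argument("input", help="input file, one email per line")
    parser.add_argument("--output", help="path for the cleaned list")
    # Default format is an assumption; the README lists CSV, TXT, and JSON.
    parser.add_argument("--format", choices=["csv", "txt", "json"], default="txt")
    parser.add_argument("--report", action="store_true", help="print a cleaning report")
    parser.add_argument("--keep-roles", action="store_true")
    parser.add_argument("--keep-disposable", action="store_true")
    parser.add_argument("--no-typo-fix", action="store_true")
    parser.add_argument("--no-dedup", action="store_true")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["emails.txt", "--output", "clean.csv", "--format", "csv", "--keep-roles"]
)
print(args.format, args.keep_roles, args.no_dedup)
```

Each cleaning step is on by default and the flags switch one off, which is why they are modelled as `store_true` options.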

What Gets Cleaned

| Check | Example | Action |
| --- | --- | --- |
| Invalid syntax | not-an-email | Removed |
| Disposable domain | user@mailinator.com | Removed (unless --keep-disposable) |
| Role address | admin@example.com | Removed (unless --keep-roles) |
| Domain typo | user@gmial.com | Corrected to user@gmail.com |
| Duplicate | Multiple user@example.com | Deduplicated |
| Gmail dots | u.s.e.r@gmail.com | Normalized to user@gmail.com |
| Plus aliases | user+tag@example.com | Normalized to user@example.com |
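
The checks in this table could be combined per address roughly as follows. This is a sketch under stated assumptions: `classify`, the reason strings, the order of the checks, and the tiny lookup sets are all illustrative, and the syntax test is a crude stand-in for the tool's regex.

```python
from __future__ import annotations

# Illustrative subsets only; the real tool ships longer lists.
ROLE_PREFIXES = frozenset({"admin", "support", "info"})
DISPOSABLE = frozenset({"mailinator.com", "yopmail.com"})
DOMAIN_TYPOS = {"gmial.com": "gmail.com"}  # the one typo the README names


def classify(
    email: str,
    keep_roles: bool = False,
    keep_disposable: bool = False,
    fix_typos: bool = True,
) -> tuple[str | None, str]:
    """Return (cleaned address or None if removed, reason)."""
    local, sep, domain = email.strip().lower().rpartition("@")
    # Crude shape check standing in for real syntax validation.
    if not sep or not local or "." not in domain:
        return None, "invalid_syntax"
    if fix_typos:
        domain = DOMAIN_TYPOS.get(domain, domain)
    if domain in DISPOSABLE and not keep_disposable:
        return None, "disposable_domain"
    if local in ROLE_PREFIXES and not keep_roles:
        return None, "role_address"
    return f"{local}@{domain}", "kept"


for address in ["not-an-email", "user@mailinator.com", "admin@example.com", "user@gmial.com"]:
    print(address, "->", classify(address))
```

Returning a reason alongside each result makes it straightforward to tally removals by category for a cleaning report.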

Report Output

When using --report, you get a summary like:

... continues with setup instructions, usage examples, and more.

📄 Code Sample .py preview

src/list_hygiene.py

#!/usr/bin/env python3
"""
List Hygiene Tool — Email Arsenal (DataNest)

Cleans email lists by: deduplicating, validating format, removing role
addresses, checking for disposable domains, and stripping invalid entries.

Usage:
    python list_hygiene.py emails.txt
    python list_hygiene.py emails.csv --output clean.csv --format csv
    python list_hygiene.py emails.txt --keep-roles --keep-disposable

Dependencies: Python 3.10+ stdlib only
License: MIT
"""

from __future__ import annotations

import argparse
import csv
import io
import json
import logging
import re
import sys
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from pathlib import Path

# ---------------------------------------------------------------------------
# Constants
# ---------------------------------------------------------------------------

logger = logging.getLogger("list_hygiene")

# RFC 5322 simplified email regex
EMAIL_REGEX = re.compile(
    r"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
    r"@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$"
)

# Known disposable email domains
DISPOSABLE_DOMAINS: frozenset[str] = frozenset({
    "mailinator.com", "guerrillamail.com", "tempmail.com", "throwaway.email",
    "yopmail.com", "sharklasers.com", "guerrillamailblock.com", "grr.la",
    "dispostable.com", "trashmail.com", "fakeinbox.com", "mailnesia.com",
    "maildrop.cc", "discard.email", "tempr.email", "10minutemail.com",
    "guerrillamail.info", "throwaway.email", "temp-mail.org", "getnada.com",
    "mailsac.com", "tmail.com", "tempail.com", "mohmal.com",
    # ... 379 more lines ...
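
The regex in the preview can be tried on its own. The sketch below copies the pattern as shown and wraps it in a hypothetical helper, `is_valid_syntax`, which is not necessarily how the full source uses it.

```python
import re

# Pattern copied from the code preview above.
EMAIL_REGEX = re.compile(
    r"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
    r"@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$"
)


def is_valid_syntax(email: str) -> bool:
    """Return True when the address matches the simplified pattern."""
    return EMAIL_REGEX.match(email) is not None


for candidate in ["user@example.com", "not-an-email", "user@-bad.com", "user@localhost"]:
    print(candidate, is_valid_syntax(candidate))
```

Note that the pattern accepts single-label domains such as `user@localhost`, because the dotted part of the domain is optional. It is a format check only and says nothing about whether a mailbox exists.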