$19
AI Content Detector
Python AI content detector using perplexity analysis, burstiness scoring, and statistical text analysis.
Markdown · Python
📁 File Structure (11 files)
ai-content-detector/
├── LICENSE
├── README.md
├── examples/
│   ├── basic_usage.py
│   └── sample_texts/
│       ├── ai_generated.txt
│       └── human_written.txt
├── free-sample.zip
├── guide/
│   ├── 01_features.md
│   ├── 02_cli-reference.md
│   └── 03_important-disclaimer.md
├── index.html
└── src/
    └── ai_content_detector.py
📖 Documentation Preview (README excerpt)
AI Content Detector
Python AI content detector: perplexity analysis, burstiness scoring, vocabulary richness, statistical text analysis, and confidence-scored reports. All math from scratch. Zero dependencies.
Part of the AI Toolkit collection by [CodeVault](https://ai-toolkit.codevault.dev).
Features
- Perplexity estimation: a bigram language model measures how predictable the text is
- Burstiness scoring: measures variation in sentence length (AI text tends to be unusually uniform)
- Vocabulary richness: type-token ratio (unique words divided by total words) measures word diversity, which tends to differ between AI and human text
- Transition word density: AI text tends to overuse words like "Furthermore", "Additionally", "Moreover"
- Repetition detection: flags repeated n-gram patterns that are common in AI output
- Readability scoring: Flesch Reading Ease and syllable analysis
- Confidence reports: weighted ensemble verdict with a per-signal breakdown
- Batch analysis: analyze entire directories of text files
- JSON export: machine-readable reports for integration into workflows
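Three of the signals listed above are simple enough to sketch in a few lines of standard-library Python. The block below is an illustration under stated assumptions, not the product's implementation: burstiness is taken to be the coefficient of variation of sentence lengths (one common definition), the sentence splitter and tokenizer are deliberately naive, and the `TRANSITIONS` set is a hypothetical list seeded with the three words named above.

```python
import re
import statistics

# Hypothetical transition-word list; the product's actual list is not shown.
TRANSITIONS = {"furthermore", "additionally", "moreover", "however", "consequently"}


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens made of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def burstiness(text: str) -> float:
    """Sentence-length variation as stdev / mean (assumed definition)."""
    lengths = [len(tokenize(s)) for s in split_sentences(text)]
    lengths = [n for n in lengths if n > 0]
    if len(lengths) < 2:
        return 0.0  # not enough sentences to measure variation
    return statistics.stdev(lengths) / statistics.mean(lengths)


def type_token_ratio(text: str) -> float:
    """Unique words divided by total words."""
    words = tokenize(text)
    return len(set(words)) / len(words) if words else 0.0


def transition_density(text: str) -> float:
    """Fraction of tokens that appear in the transition-word list."""
    words = tokenize(text)
    if not words:
        return 0.0
    return sum(w in TRANSITIONS for w in words) / len(words)
```

Under this definition, three sentences of identical length score a burstiness of 0.0, while a mix of very short and very long sentences scores well above the 0.6 mark. Note that type-token ratio falls as texts get longer, so comparisons are only meaningful between samples of similar length.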
Quick Start
# Run demo with AI and human text samples
python src/ai_content_detector.py --demo
# Analyze inline text
python src/ai_content_detector.py --text "Your text to analyze goes here..."
# Analyze a file
python src/ai_content_detector.py --file document.txt
# Analyze with JSON export
python src/ai_content_detector.py --file document.txt --export report.json
# Batch analyze a folder
python src/ai_content_detector.py --batch essays/ --export results.json
# Quick verdict only
python src/ai_content_detector.py --file document.txt --quiet
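For readers who want to see what a batch run with JSON export involves, here is a minimal sketch of that workflow in plain Python. Everything in it is an assumption for illustration: `batch_analyze`, `export_json`, and the stand-in `word_count` analyzer are hypothetical names, and the record layout is not the tool's actual report schema, which this preview does not show.

```python
import json
from pathlib import Path
from typing import Callable


def batch_analyze(
    folder: Path,
    analyze: Callable[[str], dict],
    pattern: str = "*.txt",
) -> list[dict]:
    """Run `analyze` on every matching file and collect one record per file."""
    results = []
    for path in sorted(folder.glob(pattern)):
        # Replace undecodable bytes rather than aborting the whole batch.
        text = path.read_text(encoding="utf-8", errors="replace")
        results.append({"file": path.name, **analyze(text)})
    return results


def export_json(results: list[dict], out_path: Path) -> None:
    """Write the collected records as pretty-printed JSON."""
    out_path.write_text(json.dumps(results, indent=2), encoding="utf-8")


def word_count(text: str) -> dict:
    """Stand-in analyzer used only for illustration."""
    return {"words": len(text.split())}
```

A call such as `export_json(batch_analyze(Path("essays"), word_count), Path("results.json"))` would mirror the `--batch essays/ --export results.json` command, with a real scoring function in place of `word_count`.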
Project Structure
ai-content-detector/
├── README.md
├── LICENSE
├── src/
│   └── ai_content_detector.py    # Core engine (~470 lines)
└── examples/
    ├── basic_usage.py            # Programmatic usage example
    └── sample_texts/             # AI and human text samples
        ├── ai_generated.txt
        └── human_written.txt
CLI Reference
| Flag | Description |
|---|---|
| --demo | Run demo with AI and human samples |
... continues with setup instructions, usage examples, and more.
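Since the reference table is cut off here, the sketch below shows how the six flags that appear in Quick Start (`--demo`, `--text`, `--file`, `--batch`, `--export`, `--quiet`) could be declared with `argparse`. It is a guess at the shape, not the shipped parser: the help strings, the `build_parser` name, and the choice to make the four input modes mutually exclusive are all assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the flags shown in Quick Start (help text is illustrative)."""
    parser = argparse.ArgumentParser(
        prog="ai_content_detector.py",
        description="Statistical AI-text detection (sketch).",
    )
    # Assumption: exactly one input mode per invocation.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--text", help="inline text to analyze")
    source.add_argument("--file", help="path to a text file to analyze")
    source.add_argument("--batch", help="folder of text files to analyze")
    source.add_argument("--demo", action="store_true",
                        help="run demo with AI and human samples")
    parser.add_argument("--export", metavar="PATH",
                        help="write a JSON report to PATH")
    parser.add_argument("--quiet", action="store_true",
                        help="print the verdict only")
    return parser
```

For example, `build_parser().parse_args(["--file", "document.txt", "--quiet"])` yields a namespace with `file="document.txt"` and `quiet=True`.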
📄 Code Sample (.py preview)
src/ai_content_detector.py
#!/usr/bin/env python3
"""
AI Content Detector — AI Toolkit (DataNest)
Detect AI-generated text using statistical analysis: perplexity estimation,
burstiness scoring, vocabulary richness metrics, sentence pattern analysis,
and confidence-scored reports. All math from scratch — zero external
dependencies. Python 3.10+ stdlib only.
Usage:
    python ai_content_detector.py --text "The text to analyze..."
    python ai_content_detector.py --file document.txt
    python ai_content_detector.py --file document.txt --export report.json
    python ai_content_detector.py --demo
    python ai_content_detector.py --batch folder/ --export results.json
"""
from __future__ import annotations
import argparse
import json
import logging
import math
import re
import statistics
import string
import sys
from collections import Counter
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any
# ---------------------------------------------------------------------------
# Logging
# ---------------------------------------------------------------------------
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
)
logger = logging.getLogger("ai_content_detector")
# ---------------------------------------------------------------------------
# Constants — tuned from empirical observation of AI vs. human text
# ---------------------------------------------------------------------------
# Perplexity thresholds (lower = more predictable = more likely AI)
PERPLEXITY_AI_THRESHOLD: float = 35.0 # below this → likely AI
PERPLEXITY_HUMAN_THRESHOLD: float = 70.0 # above this → likely human
# Burstiness thresholds (AI text has LOW burstiness — very uniform sentence lengths)
BURSTINESS_AI_THRESHOLD: float = 0.3 # below this → likely AI
BURSTINESS_HUMAN_THRESHOLD: float = 0.6 # above this → likely human
# ... 614 more lines ...
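The preview stops before showing how these thresholds are used, so the following is a hedged sketch of one plausible approach rather than the file's actual logic. It assumes the bigram model is fitted on the input text itself with add-one smoothing, and that each raw metric is mapped linearly onto a 0 to 1 "AI likelihood" between its two thresholds. The function names `bigram_perplexity` and `ai_likelihood` are hypothetical; only the two constant names and values come from the source.

```python
import math
import re
from collections import Counter

# Values copied from the constants shown above.
PERPLEXITY_AI_THRESHOLD = 35.0
PERPLEXITY_HUMAN_THRESHOLD = 70.0


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens made of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def bigram_perplexity(text: str) -> float:
    """Perplexity under an add-one-smoothed bigram model fitted on the text itself."""
    tokens = tokenize(text)
    if len(tokens) < 2:
        return 0.0  # no bigrams to score
    context_counts = Counter(tokens[:-1])        # how often each word starts a bigram
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    vocab_size = len(set(tokens))
    log_prob = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        # Laplace smoothing keeps every probability strictly positive.
        p = (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / (len(tokens) - 1))


def ai_likelihood(value: float, ai_threshold: float, human_threshold: float) -> float:
    """1.0 at or below ai_threshold, 0.0 at or above human_threshold, linear between."""
    if value <= ai_threshold:
        return 1.0
    if value >= human_threshold:
        return 0.0
    return (human_threshold - value) / (human_threshold - ai_threshold)
```

With the constants above, a perplexity of 52.5 sits halfway between the thresholds and maps to 0.5. The same `ai_likelihood` helper would work for the burstiness pair (0.3 and 0.6). Perplexities from a self-fitted model on short texts are small, so the 35 and 70 cutoffs presumably apply to a differently scaled or trained estimate.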