€39
Databricks Audit Toolkit
A comprehensive audit and inventory toolkit for Databricks workspaces. It runs 12 independent audit modules covering security, governance, compute costs, access control, and data lineage, then generates HTML and JSON reports with risk scores.
Python · Databricks · REST API · Jinja2 · Security
📁 File Structure 15 files
databricks-audit-toolkit/
├── config.py
├── utils.py
├── run_audit.sh
├── requirements.txt
├── LICENSE
├── README.md
│
├── audits/
│ ├── __init__.py
│ ├── a01_catalog_inventory.py
│ ├── a02_job_inventory.py
│ ├── a03_access_audit.py
│ ├── a04_secret_inventory.py
│ ├── a05_compute_analysis.py
│ ├── a06_security_findings.py
│ ├── a07_audit_logs.py
│ ├── a08_external_storage.py
│ ├── a09_token_inventory.py
│ ├── a10_query_history.py
│ ├── a11_data_lineage.py
│ └── a12_table_permissions.py
│
├── reports/
│ ├── __init__.py
│ ├── html_report.py
│ ├── json_report.py
│ └── templates/
│     └── audit_report.html
│
├── examples/
│ ├── sample_output.json
│ └── scheduling_guide.md
│
└── output/
    └── .gitkeep
📖 Documentation Preview README excerpt
Features
- 12 audit modules covering Unity Catalog, access control, secrets, compute, tokens, query history, data lineage, and external storage
- Professional HTML reports with risk scores, severity distribution, and actionable recommendations
- Machine-readable JSON output for SIEM ingestion, dashboards, and trend analysis
- Risk scoring engine that aggregates findings into a 0-100 risk score with CRITICAL/HIGH/MEDIUM/LOW ratings
- Rate limiting and retry logic: respects Databricks API limits with exponential backoff
- Extensible architecture: add your own audit modules by inheriting from BaseAudit
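The retry behaviour named in the feature list can be illustrated with a minimal sketch of exponential backoff. This is not the toolkit's own code: `RetryableError`, `backoff_delays`, and `call_with_retry` are hypothetical names, and the base delay, cap, and jitter are assumed values. Only the default of 3 retries comes from the documented `AUDIT_MAX_RETRIES` setting.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. HTTP 429 or 5xx)."""


def backoff_delays(max_retries: int, base: float = 1.0, cap: float = 30.0) -> List[float]:
    """Return the un-jittered delay before each retry: base * 2**attempt, capped."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(max_retries)]


def call_with_retry(
    func: Callable[[], T],
    max_retries: int = 3,          # mirrors the documented AUDIT_MAX_RETRIES default
    base: float = 1.0,             # assumed base delay in seconds
    cap: float = 30.0,             # assumed upper bound on a single delay
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on RetryableError with exponential backoff plus jitter."""
    delays = backoff_delays(max_retries, base, cap)
    for attempt in range(max_retries + 1):
        try:
            return func()
        except RetryableError:
            if attempt == max_retries:
                raise  # retries exhausted; surface the last error
            # Up to 10% random jitter so parallel clients do not retry in lockstep.
            sleep(delays[attempt] * (1 + random.uniform(0, 0.1)))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the function testable without real waiting. Request-rate limiting (the documented 10 requests per second) would be a separate concern and is not covered by this sketch.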
Quick Start
# 1. Install dependencies
pip install -r requirements.txt
# 2. Set credentials
export DATABRICKS_HOST='https://your-workspace.cloud.databricks.com'
export DATABRICKS_TOKEN='dapi...'
# 3. Run all audits
./run_audit.sh --all
# 4. Open the report
open output/audit_report.html
Risk Scoring
Each finding is weighted by severity: CRITICAL 25, HIGH 10, MEDIUM 4, LOW 1, INFO 0. The weighted sum is capped at 100 and mapped to a rating: 75-100 CRITICAL, 50-74 HIGH, 25-49 MEDIUM, 0-24 LOW.
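The scoring rule can be worked through in a few lines of Python. The weights and rating bands below are taken from the description above; the function names `risk_score` and `risk_rating` are illustrative and may not match the toolkit's own API.

```python
from typing import Iterable

# Severity weights as stated in the Risk Scoring section.
SEVERITY_WEIGHTS = {"CRITICAL": 25, "HIGH": 10, "MEDIUM": 4, "LOW": 1, "INFO": 0}


def risk_score(severities: Iterable[str]) -> int:
    """Sum the weight of every finding and cap the result at 100."""
    raw = sum(SEVERITY_WEIGHTS[severity] for severity in severities)
    return min(raw, 100)


def risk_rating(score: int) -> str:
    """Map a 0-100 score to its rating band."""
    if score >= 75:
        return "CRITICAL"
    if score >= 50:
        return "HIGH"
    if score >= 25:
        return "MEDIUM"
    return "LOW"


# One finding of each severity: 25 + 10 + 4 + 1 + 0 = 40, which rates MEDIUM.
findings = ["CRITICAL", "HIGH", "MEDIUM", "LOW", "INFO"]
print(risk_score(findings), risk_rating(risk_score(findings)))
```

Because of the cap, four CRITICAL findings already reach the maximum score of 100, so the score does not distinguish between workspaces beyond that point.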
📄 Code Sample .py preview
config.py
"""
Databricks Audit Toolkit - Central Configuration
All credentials are loaded from environment variables.
Never hardcode tokens, URLs, or other secrets in source code.
Required environment variables:
DATABRICKS_HOST - Your workspace URL
DATABRICKS_TOKEN - Personal access token or SP token
Optional environment variables:
DATABRICKS_WAREHOUSE_ID - SQL warehouse for query audits
AUDIT_OUTPUT_DIR - Output directory (default: ./output)
AUDIT_LOG_LEVEL - DEBUG, INFO, WARNING, ERROR
AUDIT_MAX_RETRIES - Max API retry attempts (default: 3)
AUDIT_REQUEST_TIMEOUT - Timeout in seconds (default: 60)
AUDIT_RATE_LIMIT_RPS - Max requests per second (default: 10)
"""
import os
import sys
# ---------------------------------------------------------------
# Credential configuration (always from environment)
# ---------------------------------------------------------------
DATABRICKS_HOST: str = os.environ.get("DATABRICKS_HOST", "").rstrip("/")
DATABRICKS_TOKEN: str = os.environ.get("DATABRICKS_TOKEN", "")
... remaining implementation in full product
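For readers who want a feel for how the optional variables in the docstring might be handled, here is a standalone sketch. It is an assumption, not the elided product code: `AuditSettings`, `load_settings`, and `missing_credentials` are hypothetical names, and only the variable names and defaults come from the docstring above. `AUDIT_LOG_LEVEL` is left out because its default is not stated.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class AuditSettings:
    """Hypothetical container for the optional settings listed in the docstring."""
    output_dir: str
    max_retries: int
    request_timeout: int
    rate_limit_rps: int


def _int_setting(env: Mapping[str, str], name: str, default: int) -> int:
    """Read an integer variable, falling back to the documented default."""
    raw = env.get(name, "").strip()
    if not raw:
        return default
    try:
        return int(raw)
    except ValueError:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from None


def load_settings(env: Mapping[str, str] = os.environ) -> AuditSettings:
    """Build settings from the environment using the documented defaults."""
    return AuditSettings(
        output_dir=env.get("AUDIT_OUTPUT_DIR", "./output"),
        max_retries=_int_setting(env, "AUDIT_MAX_RETRIES", 3),
        request_timeout=_int_setting(env, "AUDIT_REQUEST_TIMEOUT", 60),
        rate_limit_rps=_int_setting(env, "AUDIT_RATE_LIMIT_RPS", 10),
    )


def missing_credentials(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required credential variables that are unset or empty."""
    required = ("DATABRICKS_HOST", "DATABRICKS_TOKEN")
    return [name for name in required if not env.get(name, "").strip()]
```

Taking the environment as a mapping argument lets the functions be exercised with a plain dictionary instead of mutating `os.environ`.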