
Databricks Audit Toolkit

€39

A comprehensive audit and inventory toolkit for Databricks workspaces. Runs 12 independent audit modules covering security, governance, compute costs, access control, and data lineage — then generates professional HTML and JSON reports with risk scores.

📁 15 files · 🏷 v1.0.0
Python · Databricks · REST API · Jinja2 · Security

📁 File Structure 15 files

databricks-audit-toolkit/
├── config.py
├── utils.py
├── run_audit.sh
├── requirements.txt
├── LICENSE
├── README.md
│
├── audits/
│   ├── __init__.py
│   ├── a01_catalog_inventory.py
│   ├── a02_job_inventory.py
│   ├── a03_access_audit.py
│   ├── a04_secret_inventory.py
│   ├── a05_compute_analysis.py
│   ├── a06_security_findings.py
│   ├── a07_audit_logs.py
│   ├── a08_external_storage.py
│   ├── a09_token_inventory.py
│   ├── a10_query_history.py
│   ├── a11_data_lineage.py
│   └── a12_table_permissions.py
│
├── reports/
│   ├── __init__.py
│   ├── html_report.py
│   ├── json_report.py
│   └── templates/
│       └── audit_report.html
│
├── examples/
│   ├── sample_output.json
│   └── scheduling_guide.md
│
└── output/
    └── .gitkeep

📖 Documentation Preview README excerpt

Features

  • 12 audit modules covering Unity Catalog, access control, secrets, compute, tokens, query history, data lineage, and external storage
  • Professional HTML reports with risk scores, severity distribution, and actionable recommendations
  • Machine-readable JSON output for SIEM ingestion, dashboards, and trend analysis
  • Risk scoring engine that aggregates findings into a 0-100 risk score with CRITICAL/HIGH/MEDIUM/LOW ratings
  • Rate limiting and retry logic — respects Databricks API limits with exponential backoff
  • Extensible architecture — add your own audit modules by inheriting from BaseAudit
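
The retry behaviour mentioned in the feature list can be pictured with a small sketch. This is not the toolkit's own code: `call_with_retry` and its parameters are hypothetical names chosen for illustration, and the doubling delay with a cap is one common way to do exponential backoff. The default of 3 retries mirrors the documented default for AUDIT_MAX_RETRIES.

```python
import time


def call_with_retry(func, max_retries=3, base_delay=1.0, max_delay=30.0,
                    sleep=time.sleep, retry_on=(Exception,)):
    """Call func(); on failure wait base_delay * 2**attempt seconds and retry.

    Hypothetical helper for illustration only. `sleep` is injectable so the
    function can be exercised without actually waiting.
    """
    for attempt in range(max_retries + 1):
        try:
            return func()
        except retry_on:
            # Out of retries: let the last error propagate to the caller.
            if attempt == max_retries:
                raise
            # Delay doubles on every attempt, but never exceeds max_delay.
            sleep(min(max_delay, base_delay * 2 ** attempt))


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_request():
        # Fails twice, then succeeds, like a briefly throttled API.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("transient failure")
        return "ok"

    waited = []
    print(call_with_retry(flaky_request, sleep=waited.append), waited)
```

In a real client the `retry_on` tuple would likely be narrowed to throttling and transient network errors so that genuine failures such as bad credentials surface immediately.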

Quick Start

# 1. Install dependencies
pip install -r requirements.txt

# 2. Set credentials
export DATABRICKS_HOST='https://your-workspace.cloud.databricks.com'
export DATABRICKS_TOKEN='dapi...'

# 3. Run all audits
./run_audit.sh --all

# 4. Open the report
open output/audit_report.html

Risk Scoring

Each finding is weighted by severity: CRITICAL 25, HIGH 10, MEDIUM 4, LOW 1, INFO 0. The weights of all findings are summed and the total is capped at 100. The resulting score maps to a rating: 75-100 CRITICAL, 50-74 HIGH, 25-49 MEDIUM, 0-24 LOW.
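
The scoring rule can be checked by hand with a few lines of Python. The weights and rating bands below come from the description above; the function names and the representation of findings as a list of severity strings are assumptions for illustration, not the toolkit's actual API.

```python
# Severity weights as stated in the README excerpt.
SEVERITY_WEIGHTS = {"CRITICAL": 25, "HIGH": 10, "MEDIUM": 4, "LOW": 1, "INFO": 0}


def risk_score(severities):
    """Sum the weight of every finding and cap the total at 100."""
    raw = sum(SEVERITY_WEIGHTS[s] for s in severities)
    return min(raw, 100)


def risk_rating(score):
    """Map a 0-100 score to its rating band."""
    if score >= 75:
        return "CRITICAL"
    if score >= 50:
        return "HIGH"
    if score >= 25:
        return "MEDIUM"
    return "LOW"


if __name__ == "__main__":
    findings = ["CRITICAL", "HIGH", "HIGH", "MEDIUM", "LOW", "INFO"]
    score = risk_score(findings)  # 25 + 10 + 10 + 4 + 1 + 0 = 50
    print(score, risk_rating(score))  # 50 HIGH
```

Because the sum is capped, four CRITICAL findings already reach the maximum of 100, so the score stops distinguishing between workspaces beyond that point.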

📄 Code Sample .py preview

config.py

"""
Databricks Audit Toolkit - Central Configuration

All credentials are loaded from environment variables.
Never hardcode tokens, URLs, or other secrets in source code.

Required environment variables:
    DATABRICKS_HOST          - Your workspace URL
    DATABRICKS_TOKEN         - Personal access token or SP token

Optional environment variables:
    DATABRICKS_WAREHOUSE_ID  - SQL warehouse for query audits
    AUDIT_OUTPUT_DIR         - Output directory (default: ./output)
    AUDIT_LOG_LEVEL          - DEBUG, INFO, WARNING, ERROR
    AUDIT_MAX_RETRIES        - Max API retry attempts (default: 3)
    AUDIT_REQUEST_TIMEOUT    - Timeout in seconds (default: 60)
    AUDIT_RATE_LIMIT_RPS     - Max requests per second (default: 10)
"""

import os
import sys

# ---------------------------------------------------------------
# Credential configuration (always from environment)
# ---------------------------------------------------------------

DATABRICKS_HOST: str = os.environ.get("DATABRICKS_HOST", "").rstrip("/")
DATABRICKS_TOKEN: str = os.environ.get("DATABRICKS_TOKEN", "")

... remaining implementation in full product