
Vector Search Setup

$29

Python vector search with index building, cosine similarity, and approximate nearest neighbor search.

📁 11 files
Markdown, Python


📁 File Structure 11 files

vector-search-setup/
├── LICENSE
├── README.md
├── examples/
│   ├── basic_usage.py
│   └── sample_documents.jsonl
├── free-sample.zip
├── guide/
│   ├── 01_features.md
│   ├── 02_project-structure.md
│   ├── 03_data-format.md
│   └── 04_faq.md
├── index.html
└── src/
    └── vector_search_setup.py

📖 Documentation Preview README excerpt

Vector Search Setup

Python vector search engine with index building, cosine similarity, LSH-powered approximate nearest neighbors, and a query API. All math from scratch. Zero dependencies.

Part of the AI Toolkit collection by [CodeVault](https://ai-toolkit.codevault.dev).

Features

  • Cosine similarity — Manual implementation of dot product, magnitude, and cosine similarity
  • LSH indexing — Locality-sensitive hashing for sub-linear approximate nearest neighbor search
  • Exact search — Brute-force search for small indexes where perfect recall matters
  • Text-to-vector — Hashed bag-of-words encoder converts text to fixed-dimension vectors
  • Vocabulary builder — Automatic vocabulary extraction from your document corpus
  • Index persistence — Save and load indexes to/from JSON files
  • Performance benchmark — Compare exact vs. approximate search speed and recall
  • CLI interface — Build indexes, query them, and run benchmarks from the terminal

Quick Start


# Run the interactive demo with built-in sample data
python src/vector_search_setup.py --demo

# Build an index from a JSONL file
python src/vector_search_setup.py --build-index data.jsonl --output my_index.json

# Query an existing index
python src/vector_search_setup.py --query "machine learning algorithms" --index my_index.json --top-k 5

# Run performance benchmark
python src/vector_search_setup.py --benchmark --dim 128 --num-vectors 5000
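The benchmark compares exact and approximate search on speed and recall. As a rough illustration of what "recall" means here, the sketch below ranks vectors by brute force and scores a candidate result list against that ground truth. It uses one common definition of recall@k; the helper names (`cosine`, `exact_top_k`, `recall_at_k`) are hypothetical, and the product's own benchmark code may measure this differently.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exact_top_k(query, vectors, k):
    """Brute force: score every vector and return the indices of the k best."""
    ranked = sorted(
        range(len(vectors)),
        key=lambda i: cosine(query, vectors[i]),
        reverse=True,
    )
    return ranked[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k results that the approximate search found."""
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


if __name__ == "__main__":
    vectors = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
    truth = exact_top_k([1.0, 0.0], vectors, k=2)
    # Pretend an approximate index returned ids 1 and 0 for the same query.
    print(truth, recall_at_k([1, 0], truth))
```

Exact search touches every vector, so its cost grows linearly with index size; an approximate index trades some recall for fewer comparisons.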

Project Structure


vector-search-setup/
├── README.md
├── LICENSE
├── src/
│   └── vector_search_setup.py    # Core engine (~400 lines)
└── examples/
    ├── basic_usage.py             # Programmatic usage example
    └── sample_documents.jsonl     # Sample data for index building

CLI Reference

Flag                Description
--demo              Run demo with built-in sample data
--build-index FILE  Build index from JSONL file
--output FILE       Output path for built index (default: index.json)
--query TEXT        Search query text
--index FILE        Path to a saved index file
--top-k N           Number of results (default: 5)
--exact             Use exact (brute force) search
--benchmark         Run performance benchmark
--dim N             Vector dimension for benchmark (default: 64)
--num-vectors N     Number of vectors for benchmark (default: 1000)
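For readers who want to see how such a flag set maps onto the standard library, here is a minimal `argparse` sketch that declares the flags and defaults listed above. The script's real parser is not included in this preview, so the function name `build_parser` and the help strings' placement are assumptions; only the flag names and defaults come from the table.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(prog="vector_search_setup.py")
    parser.add_argument("--demo", action="store_true",
                        help="Run demo with built-in sample data")
    parser.add_argument("--build-index", metavar="FILE",
                        help="Build index from JSONL file")
    parser.add_argument("--output", metavar="FILE", default="index.json",
                        help="Output path for built index")
    parser.add_argument("--query", metavar="TEXT", help="Search query text")
    parser.add_argument("--index", metavar="FILE",
                        help="Path to a saved index file")
    parser.add_argument("--top-k", metavar="N", type=int, default=5,
                        help="Number of results")
    parser.add_argument("--exact", action="store_true",
                        help="Use exact (brute force) search")
    parser.add_argument("--benchmark", action="store_true",
                        help="Run performance benchmark")
    parser.add_argument("--dim", metavar="N", type=int, default=64,
                        help="Vector dimension for benchmark")
    parser.add_argument("--num-vectors", metavar="N", type=int, default=1000,
                        help="Number of vectors for benchmark")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Note that `argparse` converts dashes to underscores, so `--top-k` and `--num-vectors` are read back as `args.top_k` and `args.num_vectors`.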

... continues with setup instructions, usage examples, and more.

📄 Code Sample .py preview

src/vector_search_setup.py

#!/usr/bin/env python3
"""
Vector Search Setup — AI Toolkit (DataNest)

A self-contained vector search engine with index building, cosine similarity,
approximate nearest neighbors (locality-sensitive hashing), and a query API.
Implements all math from scratch — zero external dependencies.

Python 3.10+ stdlib only.

Usage:
    python vector_search_setup.py --build-index data.jsonl --output index.json
    python vector_search_setup.py --query "search text" --index index.json --top-k 5
    python vector_search_setup.py --demo
    python vector_search_setup.py --benchmark --dim 128 --num-vectors 1000
"""

from __future__ import annotations

import argparse
import hashlib
import json
import logging
import math
import random
import sys
import time
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any

# ---------------------------------------------------------------------------
# Logging
# ---------------------------------------------------------------------------
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
)
logger = logging.getLogger("vector_search_setup")

# ---------------------------------------------------------------------------
# Constants
# ---------------------------------------------------------------------------
DEFAULT_DIM: int = 64        # default embedding dimension
DEFAULT_TOP_K: int = 5       # default number of results to return
LSH_NUM_TABLES: int = 8      # number of LSH hash tables for ANN
LSH_NUM_BITS: int = 16       # bits per hash (trade-off: precision vs speed)
EPSILON: float = 1e-10       # avoid division by zero in similarity calcs

# ---------------------------------------------------------------------------
# ... 563 more lines ...
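The constants `LSH_NUM_TABLES` and `LSH_NUM_BITS` suggest a multi-table hashing scheme, but the preview cuts off before the indexing code. The sketch below shows one well-known way such constants are used: random-hyperplane (sign) hashing, where each table hashes a vector to a 16-bit signature and vectors sharing a signature in any table become candidates. This is an assumed technique, not the product's confirmed method, and every function name here (`make_hyperplanes`, `signature`, `build_tables`, `candidates`) is hypothetical.

```python
import random

LSH_NUM_TABLES = 8   # values taken from the preview above
LSH_NUM_BITS = 16


def make_hyperplanes(dim, num_tables=LSH_NUM_TABLES, num_bits=LSH_NUM_BITS, seed=0):
    """One set of `num_bits` random Gaussian hyperplanes per hash table."""
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]
        for _ in range(num_tables)
    ]


def signature(vec, planes):
    """Pack one bit per hyperplane: 1 if vec lies on its non-negative side."""
    bits = 0
    for plane in planes:
        side = sum(p * v for p, v in zip(plane, vec)) >= 0.0
        bits = (bits << 1) | int(side)
    return bits


def build_tables(vectors, hyperplanes):
    """Map each table's signatures to the list of vector indices in that bucket."""
    tables = [{} for _ in hyperplanes]
    for idx, vec in enumerate(vectors):
        for table, planes in zip(tables, hyperplanes):
            table.setdefault(signature(vec, planes), []).append(idx)
    return tables


def candidates(query, tables, hyperplanes):
    """Union of the query's buckets across all tables."""
    found = set()
    for table, planes in zip(tables, hyperplanes):
        found.update(table.get(signature(query, planes), []))
    return found


if __name__ == "__main__":
    rng = random.Random(1)
    vectors = [[rng.gauss(0.0, 1.0) for _ in range(64)] for _ in range(1000)]
    planes = make_hyperplanes(dim=64)
    tables = build_tables(vectors, planes)
    print(len(candidates(vectors[0], tables, planes)), "candidates out of", len(vectors))
```

In a scheme like this, more bits per signature make buckets smaller (fewer candidates to rank, lower recall), while more tables give a near neighbor more chances to share a bucket with the query. Candidates would then be re-ranked by cosine similarity to produce the final top-k.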