
RAG Pipeline Starter

$29

Python RAG pipeline with document ingestion, text chunking, vector store, retrieval engine, and prompt assembly.

📁 9 files

Python, Markdown


📁 File Structure (9 files)

rag-pipeline-starter/
├── LICENSE
├── README.md
├── examples/
│   └── basic_usage.py
├── free-sample.zip
├── guide/
│   ├── 01_features.md
│   ├── 02_quick-start.md
│   └── 03_license.md
├── index.html
└── src/
    └── rag_pipeline.py

📖 Documentation Preview (README excerpt)

RAG Pipeline Starter

Python RAG pipeline with document ingestion, text chunking, vector store, retrieval engine, and prompt assembly. Zero dependencies.

Part of the AI Toolkit collection by [CodeVault](https://ai-toolkit.codevault.dev).

Features

Document loader — Ingest .txt, .md, .py, .json, .csv files
Text chunker — Configurable chunk size and overlap
Vector store — In-memory store with cosine similarity search
Retrieval engine — Top-K retrieval with relevance scoring
Prompt assembler — Template-based prompt construction with context injection
Pipeline orchestrator — Single RAGPipeline class ties everything together
CLI + API — Use from terminal or import as a library
Demo mode — Built-in sample docs to see it working instantly
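As a rough illustration of the chunker, vector store, and retrieval features listed above, here is a minimal sketch of overlapping character chunking and top-K cosine retrieval. The function names (`chunk_text`, `cosine_similarity`, `top_k`) and the list-of-tuples store layout are assumptions made for this example; the product's actual classes and signatures are not shown in the preview and may differ.

```python
import math


def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into character windows where neighbours share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # distance each window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float],
          store: list[tuple[str, list[float]]],
          k: int = 3) -> list[tuple[float, str]]:
    """Return the k highest-scoring (score, chunk) pairs from an in-memory store."""
    scored = [(cosine_similarity(query_vec, vec), chunk) for chunk, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

The defaults of 512, 64, and 3 mirror the constants visible in the code preview. A larger overlap lowers the chance that a sentence is split across a chunk boundary, at the cost of storing more duplicated text.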

Quick Start


# Run the built-in demo
python src/rag_pipeline.py --demo

# Ingest a directory and query
python src/rag_pipeline.py --ingest ./my-docs/ --query "How do I deploy?"

# Interactive mode
python src/rag_pipeline.py
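For readers wondering how a command line with these three entry points could be wired up, the following is a hypothetical `argparse` sketch. Only the flag names `--demo`, `--ingest`, and `--query` come from the commands above; `build_parser`, `choose_mode`, and the mode labels are invented for illustration and are not taken from the product's source.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts the flags shown in the Quick Start."""
    parser = argparse.ArgumentParser(
        prog="rag_pipeline.py",
        description="Sketch of a RAG pipeline command line (illustrative only).",
    )
    parser.add_argument("--ingest", metavar="PATH",
                        help="file or directory to ingest")
    parser.add_argument("--query", metavar="TEXT",
                        help="question to ask against the ingested documents")
    parser.add_argument("--demo", action="store_true",
                        help="run with built-in sample documents")
    return parser


def choose_mode(args: argparse.Namespace) -> str:
    """Pick a run mode; the mode names here are hypothetical labels."""
    if args.demo:
        return "demo"
    if args.ingest or args.query:
        return "batch"
    return "interactive"  # no flags given, as in the last Quick Start command


if __name__ == "__main__":
    print(choose_mode(build_parser().parse_args()))
```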

License

MIT — use in personal, commercial, or client projects. No attribution required.

📄 Code Sample (.py preview)

src/rag_pipeline.py

#!/usr/bin/env python3
"""
RAG Pipeline Starter — AI Toolkit (DataNest)

A complete Retrieval-Augmented Generation pipeline with document ingestion,
text chunking, in-memory vector store, retrieval engine, and prompt assembly.
Zero external dependencies — Python 3.10+ stdlib only.

Usage:
    python rag_pipeline.py --ingest docs/              # ingest a directory
    python rag_pipeline.py --query "How do I deploy?"  # query the pipeline
    python rag_pipeline.py --demo                      # run built-in demo
"""

from __future__ import annotations

import argparse
import hashlib
import json
import logging
import math
import os
import re
import textwrap
import time
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Sequence

logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(name)s: %(message)s")
logger = logging.getLogger("rag_pipeline")

# ---------------------------------------------------------------------------
# Constants
# ---------------------------------------------------------------------------

DEFAULT_CHUNK_SIZE: int = 512      # characters per chunk
DEFAULT_CHUNK_OVERLAP: int = 64    # overlap between chunks
DEFAULT_TOP_K: int = 3             # number of chunks to retrieve
EMBEDDING_DIM: int = 128           # hash-embedding dimension

PROMPT_TEMPLATE: str = textwrap.dedent("""\
    Answer the question based on the context below. If the context does not
    contain enough information, say "I don't have enough information."

    Context:
    {context}

    Question: {question}

    Answer:""")

# ... 302 more lines ...
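The preview cuts off before showing how `EMBEDDING_DIM` and `PROMPT_TEMPLATE` are used. The sketch below shows one plausible way a stdlib-only "hash embedding" and prompt assembly could work: each token is hashed into one of 128 buckets (the hashing trick) and the resulting count vector is L2-normalized. This is an assumption about the approach, not the elided implementation; `hash_embed` and `assemble_prompt` are hypothetical names, the choice of MD5 is arbitrary, and the line breaks inside the template are a guess based on the flattened preview.

```python
import hashlib
import math
import re
import textwrap

EMBEDDING_DIM = 128  # same value as the constant in the preview

# Wording taken from the preview; the line layout is assumed.
PROMPT_TEMPLATE = textwrap.dedent("""\
    Answer the question based on the context below. If the context does not
    contain enough information, say "I don't have enough information."

    Context:
    {context}

    Question: {question}

    Answer:""")


def hash_embed(text: str, dim: int = EMBEDDING_DIM) -> list[float]:
    """Map text to a fixed-size, unit-length bag-of-words vector via token hashing."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim  # stable bucket per token
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def assemble_prompt(question: str, chunks: list[str]) -> str:
    """Join retrieved chunks into the context slot of the template."""
    context = "\n\n".join(chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)
```

A hash embedding of this kind captures only token overlap, not meaning, so two passages that phrase the same idea with different words will score as dissimilar. That trade-off is what makes a zero-dependency pipeline possible; swapping in a learned embedding model would require an external library or API.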
