$29
RAG Pipeline Starter
Python RAG pipeline with document ingestion, text chunking, vector store, retrieval engine, and prompt assembly.
Python, Markdown
📁 File Structure (9 files)
rag-pipeline-starter/
├── LICENSE
├── README.md
├── examples/
│ └── basic_usage.py
├── free-sample.zip
├── guide/
│ ├── 01_features.md
│ ├── 02_quick-start.md
│ └── 03_license.md
├── index.html
└── src/
└── rag_pipeline.py
📖 Documentation Preview (README excerpt)
RAG Pipeline Starter
Python RAG pipeline with document ingestion, text chunking, vector store, retrieval engine, and prompt assembly. Zero external dependencies (Python 3.10+ standard library only).
Part of the AI Toolkit collection by [CodeVault](https://ai-toolkit.codevault.dev).
Features
- Document loader: ingest `.txt`, `.md`, `.py`, `.json`, and `.csv` files
- Text chunker: configurable chunk size and overlap
- Vector store: in-memory store with cosine similarity search
- Retrieval engine: top-K retrieval with relevance scoring
- Prompt assembler: template-based prompt construction with context injection
- Pipeline orchestrator: a single `RAGPipeline` class ties everything together
- CLI + API: use from the terminal or import as a library
- Demo mode: built-in sample docs to see it working instantly
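The ingestion half of the feature list (document loader plus text chunker) can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the product's code: the names `load_documents`, `chunk_text`, and `SUPPORTED_EXTENSIONS` are hypothetical, and it assumes fixed-size character windows where each chunk starts `chunk_size - overlap` characters after the previous one. The defaults mirror `DEFAULT_CHUNK_SIZE` and `DEFAULT_CHUNK_OVERLAP` from the source preview.

```python
from pathlib import Path

# Extensions the README lists for the document loader (set name is hypothetical).
SUPPORTED_EXTENSIONS = {".txt", ".md", ".py", ".json", ".csv"}


def load_documents(root: str) -> dict[str, str]:
    """Hypothetical loader: map each supported file under `root` to its text."""
    docs: dict[str, str] = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS:
            # errors="replace" keeps one badly encoded file from aborting ingestion
            docs[str(path)] = path.read_text(encoding="utf-8", errors="replace")
    return docs


def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Hypothetical chunker: fixed-size character windows sharing `overlap` chars."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    print(chunk_text("abcdefghij", chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

With the preview's defaults (512 and 64), consecutive chunks share 64 characters of context, so a sentence cut at a boundary still appears whole in one of the two neighbours more often. Pure character splitting can cut words in half; snapping window edges to whitespace is a common refinement.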
Quick Start
# Run the built-in demo
python src/rag_pipeline.py --demo
# Ingest a directory and query
python src/rag_pipeline.py --ingest ./my-docs/ --query "How do I deploy?"
# Interactive mode
python src/rag_pipeline.py
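The three invocation styles above (demo, ingest plus query, and interactive) map naturally onto `argparse`. The flag names below come from the README; the helpers `build_parser` and `select_mode`, and the rule that no flags means interactive mode, are assumptions for illustration rather than the script's actual wiring.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing the flags documented in the Quick Start."""
    parser = argparse.ArgumentParser(description="RAG Pipeline Starter (sketch)")
    parser.add_argument("--demo", action="store_true", help="run the built-in demo")
    parser.add_argument("--ingest", metavar="DIR", help="directory of documents to ingest")
    parser.add_argument("--query", metavar="TEXT", help="question to ask the pipeline")
    return parser


def select_mode(args: argparse.Namespace) -> str:
    """Assumed precedence: --demo first, then ingest/query, otherwise interactive."""
    if args.demo:
        return "demo"
    if args.ingest or args.query:
        return "batch"
    return "interactive"


if __name__ == "__main__":
    argv = ["--ingest", "./my-docs/", "--query", "How do I deploy?"]
    args = build_parser().parse_args(argv)
    print(select_mode(args), args.ingest, args.query)
```

In this sketch, running the script with no flags falls through to interactive mode, matching the README's third example.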
License
MIT — use in personal, commercial, or client projects. No attribution required.
📄 Code Sample (.py preview)
src/rag_pipeline.py
#!/usr/bin/env python3
"""
RAG Pipeline Starter — AI Toolkit (DataNest)
A complete Retrieval-Augmented Generation pipeline with document ingestion,
text chunking, in-memory vector store, retrieval engine, and prompt assembly.
Zero external dependencies — Python 3.10+ stdlib only.
Usage:
python rag_pipeline.py --ingest docs/ # ingest a directory
python rag_pipeline.py --query "How do I deploy?" # query the pipeline
python rag_pipeline.py --demo # run built-in demo
"""
from __future__ import annotations
import argparse
import hashlib
import json
import logging
import math
import os
import re
import textwrap
import time
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Sequence
logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(name)s: %(message)s")
logger = logging.getLogger("rag_pipeline")
# ---------------------------------------------------------------------------
# Constants
# ---------------------------------------------------------------------------
DEFAULT_CHUNK_SIZE: int = 512 # characters per chunk
DEFAULT_CHUNK_OVERLAP: int = 64 # overlap between chunks
DEFAULT_TOP_K: int = 3 # number of chunks to retrieve
EMBEDDING_DIM: int = 128 # hash-embedding dimension
PROMPT_TEMPLATE: str = textwrap.dedent("""\
Answer the question based on the context below. If the context does not
contain enough information, say "I don't have enough information."
Context:
{context}
Question: {question}
Answer:""")
# ... 302 more lines ...
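The constants in this preview suggest how retrieval can work with no external dependencies: text becomes a fixed-length vector of `EMBEDDING_DIM` floats by hashing, chunks are ranked by cosine similarity, and the best `DEFAULT_TOP_K` are pasted into `PROMPT_TEMPLATE`. The sketch below is a hedged reconstruction of that idea, not the remaining 302 lines. It assumes token-level feature hashing with SHA-256 (the file imports `hashlib`, but which hash function and tokenizer it actually uses is not shown), and the names `hash_embed`, `cosine`, `top_k`, and `assemble_prompt` are hypothetical.

```python
import hashlib
import math
import re
import textwrap

EMBEDDING_DIM = 128
DEFAULT_TOP_K = 3
# Template text copied from the preview above.
PROMPT_TEMPLATE = textwrap.dedent("""\
    Answer the question based on the context below. If the context does not
    contain enough information, say "I don't have enough information."
    Context:
    {context}
    Question: {question}
    Answer:""")


def hash_embed(text: str, dim: int = EMBEDDING_DIM) -> list[float]:
    """Feature hashing (assumed scheme): each lowercase token adds 1.0 to one bucket."""
    vec = [0.0] * dim
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "little") % dim] += 1.0
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: list[str], k: int = DEFAULT_TOP_K) -> list[tuple[float, str]]:
    """Brute-force search: score every chunk against the query, keep the k best."""
    q = hash_embed(query)
    scored = [(cosine(q, hash_embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    return scored[:k]


def assemble_prompt(question: str, hits: list[tuple[float, str]]) -> str:
    """Inject the retrieved chunks into the template's {context} slot."""
    context = "\n---\n".join(chunk for _, chunk in hits)
    return PROMPT_TEMPLATE.format(context=context, question=question)


if __name__ == "__main__":
    docs = [
        "Deploy with docker compose up on the production host.",
        "The chunker splits text into overlapping windows.",
        "Cosine similarity ranks chunks against the query.",
    ]
    hits = top_k("How do I deploy?", docs, k=1)
    print(assemble_prompt("How do I deploy?", hits))
```

Feature hashing keeps the store dependency-free at the cost of semantic quality: only shared tokens (or hash collisions) raise a score, so synonyms are not matched. Scoring every chunk is linear in corpus size, which is usually acceptable for an in-memory starter.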