
Building a RAG pipeline from scratch

What Pinecone doesn't tell you about chunk overlap.

Overview

Retrieval-Augmented Generation (RAG) sounds simple: embed your documents, store the vectors, and retrieve the closest matches at inference time. The devil is in the chunking strategy, and most tutorials skip right over it.
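For reference, the embed/store/query loop itself fits in a few lines. This is a minimal sketch, not a production setup: `toy_embed` is a hypothetical hashed bag-of-words stand-in for a real embedding model, and `InMemoryVectorStore` is a hypothetical stand-in for a vector database such as Pinecone. Everything interesting in the rest of this post happens before `add` is ever called.

```python
import math
import re
import zlib


def toy_embed(text: str, dim: int = 256) -> list[float]:
    """Hypothetical stand-in for an embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # Unlike Python's hash(), crc32 is stable across runs.
        vec[zlib.crc32(token.encode()) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database: a list of (id, vector, text) rows."""

    def __init__(self) -> None:
        self.rows: list[tuple[str, list[float], str]] = []

    def add(self, doc_id: str, text: str) -> None:
        # Embed once at ingest time; keep the raw text to put in the prompt later.
        self.rows.append((doc_id, toy_embed(text), text))

    def query(self, question: str, top_k: int = 3) -> list[tuple[str, float]]:
        q = toy_embed(question)
        # Both vectors are unit-length, so the dot product is the cosine similarity.
        scored = [(doc_id, sum(a * b for a, b in zip(q, vec))) for doc_id, vec, _ in self.rows]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]
```

Swapping in a real model only changes `toy_embed`; the shape of the loop stays the same.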

The chunk overlap problem

Pinecone's quickstart uses chunk_size=500, chunk_overlap=50. For structured technical content, that overlap is almost always wrong. Too little, and context that runs across paragraph or chunk boundaries gets severed. Too much, and you pay to embed and store duplicate text with no improvement in recall.
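To make the cost side of that trade-off concrete, here is a minimal sketch of a plain character-window splitter. It is a simplification: splitters such as LangChain's RecursiveCharacterTextSplitter also try to break on separators like blank lines and spaces, so their overlap is not exactly character-for-character. `chunk_text` and `duplicated_chars` are hypothetical helpers written for this illustration.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that share `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def duplicated_chars(chunks: list[str], text: str) -> int:
    """Characters embedded more than once: total chunk length minus the original length."""
    return sum(len(c) for c in chunks) - len(text)
```

On a 1,400-character document with chunk_size=500, an overlap of 50 yields 3 chunks and 100 duplicated characters. Raising the overlap to 200 yields 4 chunks and 600 duplicated characters, roughly 43% more text to embed and store, without any guarantee that the extra overlap lands where the context actually breaks.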

What actually worked

  • Semantic chunking using sentence-transformers to split on topic boundaries, not character count
  • Metadata enrichment: attach document title, section header, and page number to each chunk
  • Hybrid search: BM25 and dense vector retrieval, with the combined results re-ranked by a cross-encoder
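The first two bullets combine naturally into one sketch: split text into sentences, embed each one, start a new chunk wherever similarity between neighboring sentences drops below a threshold, and attach the document metadata to every chunk. Assumptions, all mine rather than the post's: the caller has already split the document by section and page, the sentence regex is naive, the 0.2 threshold is arbitrary, and `toy_embed` is a hypothetical hashed bag-of-words stand-in so the example runs without a model download. With sentence-transformers you would load a model once, for example `model = SentenceTransformer("all-MiniLM-L6-v2")`, and pass `embed=lambda s: model.encode(s).tolist()`.

```python
import math
import re
import zlib
from dataclasses import dataclass
from typing import Callable

Vector = list[float]


@dataclass
class Chunk:
    text: str
    title: str    # document title
    section: str  # section header the chunk came from
    page: int     # page number

    def embedding_text(self) -> str:
        # Prepending title and header is one common enrichment option (an assumption here).
        return f"{self.title} > {self.section}\n{self.text}"


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def toy_embed(sentence: str, dim: int = 256) -> Vector:
    """Hypothetical stand-in for a sentence-transformers model: hashed bag of words."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", sentence.lower()):
        vec[zlib.crc32(token.encode()) % dim] += 1.0
    return vec


def semantic_chunks(
    text: str,
    title: str,
    section: str,
    page: int,
    embed: Callable[[str], Vector] = toy_embed,
    threshold: float = 0.2,
) -> list[Chunk]:
    """Start a new chunk wherever adjacent sentences are less similar than `threshold`."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return []
    vectors = [embed(s) for s in sentences]
    groups = [[sentences[0]]]
    for prev, cur, sentence in zip(vectors, vectors[1:], sentences[1:]):
        if cosine(prev, cur) < threshold:
            groups.append([sentence])  # similarity dropped: treat it as a topic boundary
        else:
            groups[-1].append(sentence)
    # Every chunk carries the same title, section, and page for filtering and citation.
    return [Chunk(" ".join(g), title, section, page) for g in groups]
```

`embedding_text()` folds the title and header into what gets embedded; storing the same fields as vector-store metadata is the complementary option, and it is what lets an answer cite the page it came from.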

The hybrid approach cut irrelevant retrievals by ~40% compared to dense-only on our test set.
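A minimal sketch of the hybrid setup follows. The post does not say how the BM25 and dense result lists were merged, so this uses reciprocal rank fusion (RRF, k=60) as an assumption. BM25 is implemented inline with common defaults (k1=1.5, b=0.75). `dense_ranking` is assumed to be a list of document ids already ranked by the vector store, and `cross_score` is a placeholder for a cross-encoder; with sentence-transformers that would typically be a `CrossEncoder` model's `predict` over (query, passage) pairs.

```python
import math
import re
from collections import Counter
from typing import Callable


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: dict[str, str], k1: float = 1.5, b: float = 0.75) -> dict[str, float]:
    """Okapi BM25 using the Lucene-style idf, which stays non-negative."""
    if not docs:
        return {}
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized.values()) / n_docs
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in set(tokenize(query)):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    return scores


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked id lists: each list contributes 1 / (k + rank) to a document's score."""
    fused: Counter = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return [doc_id for doc_id, _ in fused.most_common()]


def hybrid_search(
    query: str,
    docs: dict[str, str],
    dense_ranking: list[str],  # assumed: ids from the vector store, all keys of `docs`
    cross_score: Callable[[str, str], float],
    candidates: int = 20,
    top_k: int = 5,
) -> list[str]:
    bm25 = bm25_scores(query, docs)
    sparse_ranking = [d for d in sorted(bm25, key=bm25.get, reverse=True) if bm25[d] > 0]
    shortlist = reciprocal_rank_fusion([sparse_ranking, dense_ranking])[:candidates]
    # The cross-encoder reads query and passage together, so it only scores the shortlist.
    reranked = sorted(shortlist, key=lambda d: cross_score(query, docs[d]), reverse=True)
    return reranked[:top_k]
```

The cross-encoder is the expensive step, since it runs a full forward pass per (query, passage) pair, which is why it only sees the fused shortlist rather than the whole corpus. BM25 contributes the exact-keyword matches (identifiers, error strings, version numbers) that dense embeddings tend to blur.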