Building a RAG pipeline from scratch
What Pinecone doesn't tell you about chunk overlap.
Overview
Retrieval-Augmented Generation (RAG) sounds simple: embed documents, store the vectors, and retrieve the closest matches at query time. The devil is in the chunking strategy, and most tutorials skip right over it.
The chunk overlap problem
Pinecone’s quickstart uses chunk_size=500, chunk_overlap=50. That overlap number is almost always wrong for structured technical content. Too little, and context is severed at paragraph boundaries. Too much, and you pay for duplicate embeddings with no recall improvement.
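To make the trade-off concrete, here is a minimal sketch of a fixed-size sliding-window chunker. It assumes chunk_size and chunk_overlap are measured in characters (the quickstart values above do not say which unit they use), and the helper names chunk_text and duplication_ratio are made up for this example rather than taken from any library.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


def duplication_ratio(text: str, chunk_size: int, chunk_overlap: int) -> float:
    """Characters sent to the embedder divided by characters in the source text."""
    if not text:
        return 0.0
    chunks = chunk_text(text, chunk_size, chunk_overlap)
    return sum(len(c) for c in chunks) / len(text)
```

On a 2,000-character document, an overlap of 50 embeds roughly 10% more text than no overlap at all, while an overlap of 250 embeds 75% more. Whether that extra cost buys any recall is something to measure on your own data.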
What actually worked
- Semantic chunking using sentence-transformers to split on topic boundaries, not character count
- Metadata enrichment: attach document title, section header, and page number to each chunk
- Hybrid search: BM25 + dense vector retrieval re-ranked with a cross-encoder
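The first two items above can be sketched together. This is only an illustration under stated assumptions: toy_similarity is a bag-of-words stand-in so the example runs without downloading a model, and the Chunk fields, the 0.2 threshold, and the function names are hypothetical. In a real pipeline you would likely replace toy_similarity with cosine similarity between sentence-transformers embeddings.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    title: str    # document title
    section: str  # nearest section header
    page: int     # page number the chunk came from


def toy_similarity(a: str, b: str) -> float:
    """Stand-in for embedding similarity: cosine over word counts (NOT a real encoder)."""
    va = Counter(re.findall(r"[a-z0-9]+", a.lower()))
    vb = Counter(re.findall(r"[a-z0-9]+", b.lower()))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


def semantic_chunks(sentences, title, section, page,
                    similarity=toy_similarity, threshold=0.2):
    """Start a new chunk whenever adjacent sentences are less similar than threshold."""
    chunks, current = [], []
    for sentence in sentences:
        # Compare with the previous sentence; a low score suggests a topic boundary.
        if current and similarity(current[-1], sentence) < threshold:
            chunks.append(Chunk(" ".join(current), title, section, page))
            current = []
        current.append(sentence)
    if current:
        chunks.append(Chunk(" ".join(current), title, section, page))
    return chunks
```

Each chunk carries its title, section, and page with it, so that metadata can be stored alongside the vector and used for filtering or for showing the source at answer time.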
The hybrid approach cut irrelevant retrievals by ~40% compared to dense-only on our test set.
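For readers who want to see the shape of a hybrid retriever, here is a rough, dependency-free sketch. Several things in it are assumptions rather than the setup measured above: the two rankings are merged with reciprocal rank fusion (the post does not say how the BM25 and dense results were combined), the dense ranking is passed in precomputed, and toy_rerank_score is a token-overlap placeholder where a real cross-encoder would score each (query, document) pair.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Return document indices sorted by Okapi BM25 score, best first."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    avg_len = sum(len(t) for t in tokenized) / len(tokenized)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    n_docs = len(docs)
    scores = []
    for toks in tokenized:
        counts = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            tf = counts[term]
            if tf == 0:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(toks) / avg_len))
        scores.append(score)
    return sorted(range(n_docs), key=lambda i: scores[i], reverse=True)


def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: sum 1 / (k + rank) over every ranking a doc appears in."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=lambda d: fused[d], reverse=True)


def toy_rerank_score(query, doc):
    """Placeholder for a cross-encoder: counts query tokens that appear in the doc."""
    return len(set(tokenize(query)) & set(tokenize(doc)))


def hybrid_search(query, docs, dense_ranking, rerank_score=toy_rerank_score, top_k=3):
    """Fuse sparse and dense rankings, then re-rank the top_k candidates."""
    candidates = rrf_fuse([bm25_rank(query, docs), dense_ranking])[:top_k]
    return sorted(candidates, key=lambda i: rerank_score(query, docs[i]), reverse=True)
```

Rank fusion is convenient here because BM25 scores and vector similarities live on different scales, and working with ranks avoids having to normalize them. Re-ranking only the fused top_k keeps the expensive cross-encoder pass small.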