
RAG Implementation Examples

Step-by-step RAG implementation examples covering document chunking, embedding strategies, vector stores, retrieval techniques, and response generation for production systems.

Basic RAG Pipeline for Company Documentation

beginner

A straightforward RAG setup that ingests company documentation (Confluence, Notion, Google Docs), chunks it into passages, creates embeddings, stores them in a vector database, and retrieves relevant context to answer employee questions.

// Basic RAG pipeline
async function ragPipeline(query) {
  // 1. Embed the query
  const queryEmbedding = await embed(query);

  // 2. Retrieve relevant chunks
  const chunks = await vectorStore.search(queryEmbedding, { topK: 5 });

  // 3. Generate response with context
  const response = await llm.chat({
    messages: [
      { role: "system", content: "Answer based on the provided context. If the context doesn't contain the answer, say so." },
      { role: "user", content: `Context:\n${chunks.map(c => c.text).join("\n\n")}\n\nQuestion: ${query}` }
    ]
  });

  return { answer: response, sources: chunks.map(c => c.metadata.source) };
}

Key takeaway: Start with a simple RAG pipeline before optimising — chunking strategy and retrieval quality matter more than model choice.
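The `embed` and `vectorStore` calls above are stand-ins for whatever provider you use. As a rough sketch of what `vectorStore.search` does under the hood, a minimal in-memory store that ranks by cosine similarity might look like this (all names are illustrative, not a real client library):

```javascript
// Minimal in-memory vector store, for illustration only —
// production systems would use Pinecone, Qdrant, pgvector, etc.
class InMemoryVectorStore {
  constructor() {
    this.entries = []; // each entry: { embedding: number[], text, metadata }
  }

  upsert(entries) {
    this.entries.push(...entries);
  }

  static cosine(a, b) {
    let dot = 0, na = 0, nb = 0;
    for (let i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      na += a[i] * a[i];
      nb += b[i] * b[i];
    }
    return dot / (Math.sqrt(na) * Math.sqrt(nb));
  }

  search(queryEmbedding, { topK = 5 } = {}) {
    // Score every entry against the query, highest similarity first
    return this.entries
      .map(e => ({ ...e, score: InMemoryVectorStore.cosine(queryEmbedding, e.embedding) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, topK);
  }
}
```

Real stores replace the linear scan with an approximate-nearest-neighbour index, but the interface — upsert embeddings, search by query embedding, get back scored chunks — is the same shape the pipeline above relies on.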

Hybrid Search RAG with Keyword and Semantic Retrieval

intermediate

Combines traditional keyword search (BM25) with semantic vector search for retrieval. Uses reciprocal rank fusion to merge results, improving recall on queries that contain specific terms or product codes that pure semantic search misses.

// Hybrid search with reciprocal rank fusion
async function hybridSearch(query, topK = 10) {
  const [semanticResults, keywordResults] = await Promise.all([
    vectorStore.semanticSearch(query, topK * 2),
    searchIndex.bm25Search(query, topK * 2),
  ]);

  // Reciprocal rank fusion: each appearance contributes 1 / (k + rank)
  const scores = new Map();
  const docsById = new Map();
  const k = 60; // RRF constant, dampens the weight of top ranks

  for (const results of [semanticResults, keywordResults]) {
    results.forEach((doc, i) => {
      docsById.set(doc.id, doc);
      scores.set(doc.id, (scores.get(doc.id) || 0) + 1 / (k + i + 1));
    });
  }

  // Return the top-K documents themselves, highest fused score first
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, topK)
    .map(([id]) => docsById.get(id));
}

Key takeaway: Hybrid search (keyword + semantic) consistently outperforms either approach alone, especially for queries containing specific identifiers, codes, or proper nouns.
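The keyword side of the hybrid is BM25, which is just a term-frequency formula. A simplified single-field scorer with the standard defaults (k1 = 1.2, b = 0.75) — a sketch, not the tuned implementation a search engine ships — looks like:

```javascript
// Simplified BM25 scorer over tokenised documents.
// k1 controls term-frequency saturation; b controls length normalisation.
function bm25Score(queryTokens, docTokens, allDocs, k1 = 1.2, b = 0.75) {
  const N = allDocs.length;
  const avgdl = allDocs.reduce((sum, d) => sum + d.length, 0) / N;
  let score = 0;
  for (const term of queryTokens) {
    // Inverse document frequency: rare terms count for more
    const df = allDocs.filter(d => d.includes(term)).length;
    if (df === 0) continue;
    const idf = Math.log(1 + (N - df + 0.5) / (df + 0.5));
    // Term frequency in this document, saturated by k1
    const tf = docTokens.filter(t => t === term).length;
    score += idf * (tf * (k1 + 1)) /
      (tf + k1 * (1 - b + (b * docTokens.length) / avgdl));
  }
  return score;
}
```

This is why hybrid search handles product codes and identifiers well: a rare exact token gets a high IDF and dominates the keyword score, even when its embedding is indistinguishable from similar-looking strings.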

Agentic RAG with Query Decomposition

advanced

An advanced RAG system that decomposes complex queries into sub-questions, retrieves context for each, and synthesises a comprehensive answer. Handles follow-up questions by maintaining conversation context and re-retrieving as needed.

// Agentic RAG with query decomposition
async function agenticRAG(complexQuery, conversationHistory) {
  // Decompose into sub-questions (request a JSON array so the reply can be parsed)
  const decomposition = await llm.chat({
    messages: [{
      role: "system",
      content: "Break this question into 2-4 simpler sub-questions that can be answered independently. Reply with a JSON array of strings."
    }, { role: "user", content: complexQuery }]
  });
  const subQuestions = JSON.parse(decomposition);

  // Retrieve and answer each sub-question in parallel
  const subAnswers = await Promise.all(
    subQuestions.map(async (sq) => {
      const chunks = await hybridSearch(sq);
      return { question: sq, context: chunks, answer: await generateAnswer(sq, chunks) };
    })
  );

  // Synthesise final answer
  return await llm.chat({
    messages: [{
      role: "system",
      content: "Synthesise these sub-answers into a comprehensive response to the original question."
    }, { role: "user", content: JSON.stringify({ originalQuestion: complexQuery, subAnswers }) }]
  });
}

Key takeaway: Query decomposition dramatically improves RAG quality on complex questions that span multiple documents or topics.

RAG with Metadata Filtering and Access Control

intermediate

A RAG system that enforces document-level access control during retrieval. Filters results based on user permissions, department, and document classification level, ensuring users only see information they are authorised to access.

// RAG with access control
async function secureRAG(query, user) {
  const userPermissions = await getPermissions(user.id);
  const queryEmbedding = await embed(query);

  const chunks = await vectorStore.search(queryEmbedding, {
    topK: 10,
    filter: {
      department: { $in: userPermissions.departments },
      classification: { $lte: userPermissions.clearanceLevel },
      active: true,
    }
  });

  // Double-check no restricted content leaked through the metadata filter
  const filtered = chunks.filter(c =>
    userPermissions.departments.includes(c.metadata.department) &&
    c.metadata.classification <= userPermissions.clearanceLevel
  );

  return await generateResponse(query, filtered);
}

Key takeaway: Access control in RAG must happen at the retrieval layer, not the generation layer — filtering after retrieval risks leaking information in the LLM's reasoning.
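The post-retrieval double-check can be factored into a pure predicate that mirrors the vector-store filter, so the same rule is applied in both places (field names here are the hypothetical ones from the example above):

```javascript
// Re-applies the access-control filter in application code: a chunk is
// only returned if the user's permissions actually cover its metadata.
function isAuthorised(chunkMetadata, userPermissions) {
  return (
    userPermissions.departments.includes(chunkMetadata.department) &&
    chunkMetadata.classification <= userPermissions.clearanceLevel &&
    chunkMetadata.active === true
  );
}
```

Keeping this as a single testable function makes it harder for the database-side filter and the application-side check to drift apart.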

Multi-Modal RAG with Images and Tables

advanced

A RAG pipeline that processes documents containing images, charts, and tables alongside text. Extracts table data into structured format, generates image descriptions, and includes all modalities in the retrieval and generation steps.

// Multi-modal RAG pipeline
async function multiModalIngest(document) {
  const elements = await parseDocument(document); // Returns text, tables, images

  const chunks = [];
  for (const element of elements) {
    if (element.type === "text") {
      chunks.push({ content: element.text, type: "text", embedding: await embed(element.text) });
    } else if (element.type === "table") {
      const summary = await llm.summariseTable(element.data);
      chunks.push({ content: summary, rawTable: element.data, type: "table", embedding: await embed(summary) });
    } else if (element.type === "image") {
      const description = await llm.describeImage(element.data);
      chunks.push({ content: description, type: "image", embedding: await embed(description) });
    }
  }

  await vectorStore.upsert(chunks);
}

Key takeaway: Multi-modal RAG requires separate processing pipelines for text, tables, and images — but unified retrieval across all modalities yields the best results.
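Before calling something like `summariseTable`, it usually helps to serialise the table into a plain-text form the model can read. A hypothetical helper (assuming the parser yields `{ headers, rows }`) that renders rows as a markdown table:

```javascript
// Renders a table ({ headers, rows }) as a markdown string so it can be
// passed to an LLM for summarisation, or embedded directly.
function tableToMarkdown({ headers, rows }) {
  const line = cells => `| ${cells.join(" | ")} |`;
  return [
    line(headers),                      // header row
    line(headers.map(() => "---")),     // markdown separator row
    ...rows.map(r => line(r.map(String))),
  ].join("\n");
}
```

Storing both the summary (for retrieval) and this serialised form (for generation) lets the model quote exact cell values rather than paraphrase them.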

RAG Evaluation and Quality Monitoring

intermediate

A system for continuously evaluating RAG quality by measuring retrieval relevance, answer faithfulness, and completeness. Uses automated evaluation with LLM-as-judge alongside human review sampling for ongoing quality assurance.

// RAG evaluation framework
async function evaluateRAG(testSet) {
  const results = [];

  for (const { query, expectedAnswer, relevantDocs } of testSet) {
    const { answer, sources } = await ragPipeline(query);

    const metrics = {
      retrievalPrecision: sources.filter(s => relevantDocs.includes(s)).length / sources.length,
      retrievalRecall: sources.filter(s => relevantDocs.includes(s)).length / relevantDocs.length,
      faithfulness: await llmJudge("Is this answer supported by the sources?", answer, sources),
      relevance: await llmJudge("Does this answer the question?", answer, query),
      completeness: await llmJudge("Does this cover all aspects of the expected answer?", answer, expectedAnswer),
    };

    results.push({ query, metrics });
  }

  return aggregateMetrics(results);
}

Key takeaway: You cannot improve RAG quality without measuring it — implement evaluation from day one, not as an afterthought.
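`aggregateMetrics` is left undefined above; a minimal version that averages each metric across the test set could be:

```javascript
// Averages each numeric metric across all evaluated queries,
// e.g. [{ metrics: { faithfulness: 0.5 } }, ...] → { faithfulness: avg }.
function aggregateMetrics(results) {
  const totals = {};
  for (const { metrics } of results) {
    for (const [name, value] of Object.entries(metrics)) {
      totals[name] = (totals[name] || 0) + value;
    }
  }
  const averages = {};
  for (const [name, total] of Object.entries(totals)) {
    averages[name] = total / results.length;
  }
  return averages;
}
```

In practice you would also track per-query worst cases and trends over time, since a good average can hide a cluster of badly failing queries.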

Key patterns to follow

  • Chunking strategy has the biggest impact on RAG quality — experiment with chunk sizes, overlap, and semantic chunking
  • Hybrid search (keyword + semantic) outperforms pure semantic search for most real-world queries
  • Query decomposition and step-back prompting improve results on complex, multi-part questions
  • Access control must be implemented at the retrieval layer to prevent information leakage
  • Continuous evaluation with automated metrics is essential for maintaining RAG quality over time

Frequently asked questions

Which vector database should I use?

For getting started, Pinecone or Weaviate offer managed solutions. For self-hosted, Qdrant and Milvus are excellent choices. PostgreSQL with pgvector works well for smaller datasets (under 1M vectors) and simplifies your stack. Choose based on scale, latency requirements, and operational preferences.

What chunk size and overlap should I use?

Start with 500-1000 token chunks with 100-200 token overlap. Use semantic chunking (splitting at paragraph or section boundaries) rather than fixed-size chunks. For structured documents, chunk by section or heading. Test different strategies with your actual queries to find the optimal approach.
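As a sketch of the fixed-size-with-overlap baseline (using words as a rough proxy for tokens; a real pipeline would count with the embedding model's tokeniser):

```javascript
// Splits text into windows of `size` words, with `overlap` words shared
// between consecutive chunks so context isn't cut mid-thought.
function chunkText(text, size = 500, overlap = 100) {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  const step = size - overlap; // how far the window advances each time
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // last window reached the end
  }
  return chunks;
}
```

Semantic chunking replaces the fixed window with splits at paragraph or heading boundaries, but the overlap idea carries over either way.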

How do I reduce hallucinations in RAG responses?

Use strong system prompts instructing the model to only answer from provided context, implement confidence scoring, add citation requirements, use hybrid search for better retrieval, and validate answers against retrieved chunks before returning them.
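One cheap pre-return validation — a heuristic, not a substitute for LLM-as-judge evaluation — is to flag answer sentences with little lexical overlap with any retrieved chunk:

```javascript
// Flags answer sentences whose word overlap with every retrieved chunk
// falls below `threshold` — a crude signal of possible hallucination.
function unsupportedSentences(answer, chunkTexts, threshold = 0.3) {
  const words = s => new Set(s.toLowerCase().match(/[a-z0-9]+/g) || []);
  const chunkSets = chunkTexts.map(words);
  return answer
    .split(/(?<=[.!?])\s+/) // naive sentence split on terminal punctuation
    .filter(sentence => {
      const sw = words(sentence);
      if (sw.size === 0) return false;
      // Best overlap ratio against any single chunk
      const best = Math.max(...chunkSets.map(cs => {
        let hits = 0;
        for (const w of sw) if (cs.has(w)) hits++;
        return hits / sw.size;
      }));
      return best < threshold;
    });
}
```

Flagged sentences can trigger a regeneration, a lower confidence score, or routing to an LLM judge for a proper faithfulness check.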

Which embedding model should I choose?

OpenAI's text-embedding-3-large and Cohere's embed-v3 are strong commercial options. For open-source, BGE-large and E5-large perform well. Match your embedding model to your domain — fine-tuned embeddings on domain-specific data often outperform general-purpose models.

How do I keep the index in sync with changing source documents?

Implement incremental ingestion pipelines that detect changes in source documents, re-chunk and re-embed modified content, and remove stale entries. Set up automated sync schedules for each data source and monitor freshness metrics.
