GroveAI
Examples

AI Document Processing Examples

Practical examples of using AI to extract, classify, and process documents — from invoices and contracts to medical records and regulatory filings.

Invoice Data Extraction Pipeline

intermediate

An end-to-end pipeline that ingests invoices in PDF, image, or email format, extracts key fields (vendor, amount, line items, dates), validates against purchase orders, and pushes structured data to the accounting system.

// Invoice extraction pipeline
async function processInvoice(document) {
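  // Step 1: Convert to text. Scanned PDFs have no text layer, so fall back to OCR.
  let text = document.type === "pdf" ? await extractPdfText(document) : "";
  if (!text.trim()) {
    text = await runOCR(document);
  }

  // Step 2: Extract structured fields with LLM.
  // Assumes llm.chat resolves to the parsed JSON object when responseFormat is "json_object".
  const extracted = await llm.chat({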
    messages: [{
      role: "system",
      content: "Extract invoice fields as JSON: vendor, invoiceNumber, date, lineItems[], subtotal, tax, total, paymentTerms"
    }, { role: "user", content: text }],
    responseFormat: { type: "json_object" }
  });

  // Step 3: Validate against PO
  const po = await findMatchingPO(extracted.vendor, extracted.total);
  return { ...extracted, poMatch: po?.id, confidence: po ? "high" : "review" };
}

Key takeaway: Combining OCR with LLM extraction can reach 95%+ accuracy on structured documents, well beyond what traditional template-based approaches typically manage.
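
Before matching against a purchase order, it can help to check that the extracted amounts agree with each other. The sketch below is one possible arithmetic check, not part of the pipeline above: `validateInvoiceTotals` is a hypothetical helper, and the `quantity` and `unitPrice` fields on each line item are assumptions, since the original prompt does not specify the line item shape.

```javascript
// Hypothetical helper: checks that extracted invoice amounts are internally consistent.
// Assumes every line item has numeric `quantity` and `unitPrice` fields.
function validateInvoiceTotals(invoice, tolerance = 0.01) {
  const issues = [];

  // Sum of line items should equal the stated subtotal (within rounding tolerance).
  const lineSum = invoice.lineItems.reduce(
    (sum, item) => sum + item.quantity * item.unitPrice,
    0
  );
  if (Math.abs(lineSum - invoice.subtotal) > tolerance) {
    issues.push(
      `Line items sum to ${lineSum.toFixed(2)} but subtotal is ${invoice.subtotal.toFixed(2)}`
    );
  }

  // Subtotal plus tax should equal the stated total.
  if (Math.abs(invoice.subtotal + invoice.tax - invoice.total) > tolerance) {
    issues.push("subtotal + tax does not equal total");
  }

  return { valid: issues.length === 0, issues };
}
```

An invoice that fails this check is a good candidate for the "review" path even when a matching PO exists.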

Contract Clause Detection and Risk Scoring

advanced

A system that analyses legal contracts to identify key clauses (indemnity, liability caps, termination, IP assignment), flags risky language, and generates a risk summary for legal review. Configured with company-specific risk criteria.

// Contract risk analysis
const riskCriteria = {
  unlimitedLiability: { severity: "high", pattern: "unlimited liability" },
  autoRenewal: { severity: "medium", pattern: "automatic renewal" },
  ipAssignment: { severity: "high", pattern: "assign.*intellectual property" },
  nonCompete: { severity: "medium", pattern: "non-compete|restrictive covenant" },
};

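async function analyseContract(contractText) {
  // Assumes llm.chat resolves to the parsed JSON object when responseFormat is "json_object".
  const { clauses } = await llm.chat({
    messages: [{
      role: "system",
      content: `Identify and classify all clauses in this contract. Return JSON of the form {"clauses": [...]}. For each clause, provide: type, text, riskLevel (low/medium/high), and explanation. Apply these company risk criteria: ${JSON.stringify(riskCriteria)}`
    }, { role: "user", content: contractText }],
    responseFormat: { type: "json_object" }
  });

  // Illustrative weights and review threshold; tune both to your own risk policy.
  const riskWeights = { low: 1, medium: 3, high: 5 };
  const threshold = 10;
  const riskScore = clauses.reduce((sum, c) => sum + (riskWeights[c.riskLevel] ?? 0), 0);
  return { clauses, riskScore, requiresReview: riskScore > threshold };
}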

Key takeaway: AI contract review can cut first-pass review time by as much as 80%, but always keep human lawyers for final sign-off on high-value agreements.
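
The `pattern` entries in `riskCriteria` read like regular expressions, so one plausible use is a deterministic pre-scan that runs alongside the LLM classification. The sketch below assumes that reading. `flagRiskPatterns` is a hypothetical helper, and the criteria are repeated only so the block is self-contained.

```javascript
// Same criteria as in the example above, repeated so this sketch runs on its own.
const criteria = {
  unlimitedLiability: { severity: "high", pattern: "unlimited liability" },
  autoRenewal: { severity: "medium", pattern: "automatic renewal" },
  ipAssignment: { severity: "high", pattern: "assign.*intellectual property" },
  nonCompete: { severity: "medium", pattern: "non-compete|restrictive covenant" },
};

// Hypothetical helper: returns the first case-insensitive match for each criterion.
// Note that "." does not match newlines, so "assign.*intellectual property"
// only matches when both phrases appear on the same line.
function flagRiskPatterns(contractText, riskCriteria) {
  const flags = [];
  for (const [name, { severity, pattern }] of Object.entries(riskCriteria)) {
    const match = new RegExp(pattern, "i").exec(contractText);
    if (match) {
      flags.push({ name, severity, matchedText: match[0], index: match.index });
    }
  }
  return flags;
}
```

A keyword scan like this only catches the literal phrasings it is given, so it complements the LLM pass and does not replace it. Its value is that a match is reproducible and easy to audit.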

Medical Record Summarisation

advanced

Processes patient medical records to extract diagnoses, medications, lab results, and care history into a structured timeline. Handles handwritten notes via OCR and reconciles information across multiple document sources.

// Medical record summarisation
async function summarisePatientRecord(documents) {
  const timeline = [];

  for (const doc of documents) {
    const text = doc.handwritten ? await medicalOCR(doc) : await extractText(doc);

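    // Request JSON so the result can be spread into the timeline entry.
    // Assumes llm.chat resolves to the parsed JSON object when responseFormat is "json_object".
    const extracted = await llm.chat({
      messages: [{
        role: "system",
        content: "Extract as JSON: date, provider, diagnoses[], medications[], labResults[], procedures[], notes. Use SNOMED codes where possible."
      }, { role: "user", content: text }],
      responseFormat: { type: "json_object" }
    });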

    timeline.push({ ...extracted, sourceDoc: doc.id });
  }

  // Reconcile and deduplicate across sources
  return reconcileTimeline(timeline);
}

Key takeaway: Medical document processing requires strict data governance. Process records on-premises or in a healthcare-compliant cloud environment, and never send patient data to public APIs.
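
The example leaves `reconcileTimeline` undefined. One minimal way to sketch it is shown below. It assumes that entries sharing a date and provider describe the same encounter, and that `date` is an ISO 8601 string such as "2024-03-02". Both are assumptions made for illustration, and a real system would need clinically reviewed matching rules.

```javascript
// Hypothetical reconciliation: merges entries with the same date and provider,
// removes duplicate list values, and sorts the result chronologically.
// Free-text `notes` are not merged in this sketch.
function reconcileTimeline(entries) {
  const listFields = ["diagnoses", "medications", "labResults", "procedures"];
  const merged = new Map();

  for (const entry of entries) {
    const key = `${entry.date}|${entry.provider}`;
    if (!merged.has(key)) {
      merged.set(key, {
        date: entry.date,
        provider: entry.provider,
        sourceDocs: [],
        ...Object.fromEntries(listFields.map((field) => [field, []])),
      });
    }
    const target = merged.get(key);
    target.sourceDocs.push(entry.sourceDoc);

    for (const field of listFields) {
      for (const value of entry[field] ?? []) {
        // Compare via JSON so both strings and plain objects deduplicate.
        const serialised = JSON.stringify(value);
        if (!target[field].some((v) => JSON.stringify(v) === serialised)) {
          target[field].push(value);
        }
      }
    }
  }

  // ISO 8601 date strings sort correctly with plain string comparison.
  return [...merged.values()].sort((a, b) =>
    a.date < b.date ? -1 : a.date > b.date ? 1 : 0
  );
}
```

Keeping `sourceDocs` on every merged entry preserves the link back to the original documents, which matters when a reviewer needs to verify a merged record.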

Automated Expense Report Processing

beginner

Processes expense receipts submitted via email or mobile app, categorises spending, checks against company policy limits, and routes for approval. Handles multiple currencies and receipt formats.

// Expense receipt processing
async function processExpenseReceipt(receipt) {
  const extracted = await llm.extractStructured(receipt.image, {
    fields: ["merchant", "date", "amount", "currency", "category", "items"]
  });

  // Check against policy (assumes extracted.amount is already in the policy's currency)
  const policy = await getPolicyLimits(receipt.submitterId);
  const violations = [];
  if (extracted.amount > policy.maxSingleExpense) {
    violations.push("Exceeds single expense limit");
  }
  if (!policy.allowedCategories.includes(extracted.category)) {
    violations.push("Category not in approved list");
  }

  return {
    ...extracted,
    policyCompliant: violations.length === 0,
    violations,
    approvalRoute: violations.length > 0 ? "manager" : "auto"
  };
}

Key takeaway: Automated expense processing tends to pay for itself quickly, often within a few months, through reduced manual processing time and faster reimbursements.
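
The description mentions multiple currencies, but the policy check compares `extracted.amount` to the limit directly. A minimal sketch of normalising the amount first is shown below. The helper names are hypothetical, and the exchange rates are made-up placeholder values, not real rates. A production system would look up a dated rate from its own source.

```javascript
// Placeholder rates into the policy currency (GBP here). Illustrative values only.
const illustrativeRates = { GBP: 1, USD: 0.8, EUR: 0.85 };

// Hypothetical helper: converts an amount into the policy currency, rounded to cents.
function toPolicyCurrency(amount, currency, rates) {
  if (!Object.prototype.hasOwnProperty.call(rates, currency)) {
    throw new Error(`No exchange rate for currency: ${currency}`);
  }
  return Math.round(amount * rates[currency] * 100) / 100;
}

// Hypothetical helper: the single-expense check, applied after conversion.
function exceedsSingleExpenseLimit(extracted, policy, rates) {
  const normalised = toPolicyCurrency(extracted.amount, extracted.currency, rates);
  return normalised > policy.maxSingleExpense;
}
```

Throwing on an unknown currency is a deliberate choice in this sketch: a receipt that cannot be converted should go to a person and should not silently pass the limit check.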

Regulatory Filing Document Assembly

advanced

Automates the assembly of regulatory filings by pulling data from multiple internal systems, populating templates, cross-referencing prior submissions for consistency, and flagging discrepancies for human review.

// Regulatory filing assembly
async function assembleRegFiling(filingType, period) {
  // Gather data from multiple sources
  const [financials, riskData, complianceNotes] = await Promise.all([
    fetchFinancialData(period),
    fetchRiskAssessments(period),
    fetchComplianceNotes(period),
  ]);

  // Check consistency with prior filings
  // getPreviousPeriod is a placeholder for your own period arithmetic.
  const priorFiling = await getPriorFiling(filingType, getPreviousPeriod(period));
  // Assumes llm.chat resolves to the parsed JSON object when responseFormat is "json_object".
  const { inconsistencies } = await llm.chat({
    messages: [{
      role: "system",
      content: "Compare current data with prior filing. Flag any material changes or inconsistencies that need explanation. Return JSON of the form {\"inconsistencies\": [...]}."
    }, { role: "user", content: JSON.stringify({ current: financials, prior: priorFiling }) }],
    responseFormat: { type: "json_object" }
  });

  // Populate template
  const filing = await populateTemplate(filingType, { financials, riskData, complianceNotes });
  return { filing, inconsistencies, requiresReview: inconsistencies.length > 0 };
}

Key takeaway: Document assembly is a high-value AI use case because errors in regulatory filings carry real penalties, and AI can catch inconsistencies that humans miss under time pressure.
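
For numeric fields, a deterministic comparison can run before the LLM check so that large period-on-period movements are never missed. The sketch below is one possible version. `flagMaterialChanges` is a hypothetical helper, and the 10% default threshold is an illustrative assumption, not a regulatory definition of materiality.

```javascript
// Hypothetical helper: flags numeric fields whose value moved by at least
// `thresholdPct` percent relative to the prior filing.
function flagMaterialChanges(current, prior, thresholdPct = 10) {
  const changes = [];
  for (const [field, value] of Object.entries(current)) {
    const previous = prior[field];
    // Only compare fields that are numeric in both filings.
    if (typeof value !== "number" || typeof previous !== "number") continue;

    if (previous === 0) {
      // Percentage change is undefined from a zero base; flag any non-zero value.
      if (value !== 0) changes.push({ field, previous, current: value, changePct: null });
      continue;
    }

    const changePct = ((value - previous) / Math.abs(previous)) * 100;
    if (Math.abs(changePct) >= thresholdPct) {
      changes.push({
        field,
        previous,
        current: value,
        changePct: Math.round(changePct * 10) / 10,
      });
    }
  }
  return changes;
}
```

The flagged fields could then be passed to the LLM with a narrower prompt, such as asking for a draft explanation of each change, which is easier to review than an open-ended comparison.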

Patterns

Key patterns to follow

  • Combine OCR with LLM extraction for best results: OCR alone misses context, and text-only LLMs cannot read images
  • Always include a human review step for high-stakes documents (legal, medical, financial)
  • Structured output formats (JSON schemas) dramatically improve extraction reliability
  • Validation against existing data (POs, policies, prior filings) catches errors that pure extraction misses
  • On-premises or private cloud processing is essential for sensitive documents
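
The structured-output pattern pairs well with a check on what the model actually returned. The sketch below is a deliberately small stand-in for a full JSON Schema validator. `parseAndCheck` and `invoiceSpec` are hypothetical, and the field types are assumptions based on the invoice example.

```javascript
// Hypothetical field spec for the invoice example; the types are assumptions.
const invoiceSpec = {
  vendor: "string",
  invoiceNumber: "string",
  date: "string",
  lineItems: "array",
  subtotal: "number",
  tax: "number",
  total: "number",
};

// Parses raw model output and checks that each expected field exists with the expected type.
function parseAndCheck(rawJson, spec) {
  let data;
  try {
    data = JSON.parse(rawJson);
  } catch (err) {
    return { ok: false, errors: [`Invalid JSON: ${err.message}`] };
  }
  if (data === null || typeof data !== "object" || Array.isArray(data)) {
    return { ok: false, errors: ["Expected a JSON object"] };
  }

  const errors = [];
  for (const [field, expected] of Object.entries(spec)) {
    const value = data[field];
    if (value === undefined || value === null) {
      errors.push(`Missing field: ${field}`);
      continue;
    }
    const actual = Array.isArray(value) ? "array" : typeof value;
    if (actual !== expected) {
      errors.push(`Field ${field} should be ${expected}, got ${actual}`);
    }
  }
  return { ok: errors.length === 0, data, errors };
}
```

When the check fails, the error list can be fed back to the model in a retry prompt or used to route the document to human review.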

FAQ

Frequently asked questions

How accurate is AI document extraction?

Modern AI extraction achieves 90-98% accuracy on structured documents like invoices and receipts. Accuracy drops for handwritten text (80-90%) and complex layouts. Always implement confidence scoring and human review for low-confidence extractions.
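
Confidence-based routing can be sketched in a few lines. The example below assumes each extracted field carries a `confidence` score between 0 and 1, which not every extraction API provides. The 0.9 threshold and the `routeByConfidence` name are likewise illustrative.

```javascript
// Hypothetical helper: sends a document to human review when any field's
// confidence falls below the threshold, and reports which fields caused it.
function routeByConfidence(fields, threshold = 0.9) {
  const lowConfidence = Object.entries(fields)
    .filter(([, field]) => field.confidence < threshold)
    .map(([name]) => name);

  return {
    route: lowConfidence.length === 0 ? "auto" : "human_review",
    lowConfidence,
  };
}

// Usage: one uncertain field is enough to trigger review.
const decision = routeByConfidence({
  vendor: { value: "Acme Ltd", confidence: 0.98 },
  total: { value: 120.5, confidence: 0.72 },
});
```

Returning the list of low-confidence fields lets the review screen highlight only what needs checking, so the reviewer does not have to re-key the whole document.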

Can AI process handwritten documents?

Yes, but with lower accuracy than printed text. Specialised medical and legal OCR models handle domain-specific handwriting better than general-purpose OCR. Expect 80-90% accuracy on clear handwriting, less for poor handwriting.

What document formats can be processed?

Most systems handle PDF, images (JPEG, PNG, TIFF), Microsoft Office formats (DOCX, XLSX), and email (EML, MSG). Some handle scanned documents via OCR. The key is building a robust ingestion pipeline that normalises formats before processing.
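
The first step of such an ingestion pipeline is usually deciding which converter a file needs. A minimal sketch follows, assuming files arrive with reliable extensions. The handler names are hypothetical labels, and a production pipeline would also inspect file contents and not trust the name alone.

```javascript
// Maps file extensions to a hypothetical converter label.
const handlersByExtension = new Map([
  ["pdf", "pdfText"], // scanned PDFs with no text layer still need OCR afterwards
  ["jpg", "ocr"], ["jpeg", "ocr"], ["png", "ocr"], ["tif", "ocr"], ["tiff", "ocr"],
  ["docx", "office"], ["xlsx", "office"],
  ["eml", "email"], ["msg", "email"],
]);

// Picks a converter from the file extension, case-insensitively.
function chooseHandler(filename) {
  const dot = filename.lastIndexOf(".");
  const extension = dot === -1 ? "" : filename.slice(dot + 1).toLowerCase();
  return handlersByExtension.get(extension) ?? "unsupported";
}
```

Returning an explicit "unsupported" label, and not throwing, lets the pipeline log and quarantine unknown files without halting a batch.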

How do I keep sensitive documents secure?

Use on-premises or private cloud deployment, implement data encryption at rest and in transit, apply access controls, maintain audit logs, and ensure compliance with relevant regulations (GDPR, HIPAA, SOX). Never send sensitive documents to public AI APIs without proper data processing agreements.

What ROI can I expect from AI document processing?

Typical ROI ranges from 200-500% in the first year, driven by reduced manual processing time (60-80% reduction), fewer errors, faster turnaround, and improved compliance. Most organisations see payback within 3-6 months.
