GroveAI
Technical Free Template

Prompt Engineering Playbook Template

A practical playbook for designing, testing, and managing prompts for LLM-powered applications. Covers prompt patterns, structured prompt design, testing methodology, version control, and continuous optimisation strategies.

Overview

What's included

  • Prompt design framework with examples
  • Common prompt patterns and when to use them
  • Prompt testing and evaluation methodology
  • Version control and change management process
  • Optimisation strategies for quality and cost
  • Prompt library management approach

Prompt Design Framework

The CRAFT Method

Use this structure when designing a new prompt:

C — Context: Set the stage. Who is the AI? What domain is it operating in?

You are a senior financial analyst at a UK investment firm.
You help clients understand their portfolio performance.

R — Role & Rules: Define constraints and guidelines.

Rules:
- Only use data provided in the context. Do not make up figures.
- Always express returns as percentages.
- If uncertain, say so rather than guessing.
- Respond in British English.

A — Action: Specify what the AI should do.

Analyse the provided portfolio data and produce a quarterly
performance summary highlighting top and bottom performers.

F — Format: Define the output structure.

Format your response as:
## Portfolio Summary
[2-3 sentence overview]

## Top Performers
[Table: Fund Name | Return % | Benchmark Delta]

## Underperformers
[Table: Fund Name | Return % | Recommendation]

T — Tone: Set the communication style.

Tone: Professional and clear. Avoid jargon.
Audience: Clients with moderate financial literacy.
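The five components above can be kept as separate strings and assembled into one system prompt, which makes each part independently editable and reviewable. A minimal Python sketch (the function name and joining order are illustrative, not part of the framework):

```python
def build_craft_prompt(context: str, rules: list[str], action: str,
                       output_format: str, tone: str) -> str:
    """Assemble a system prompt from the five CRAFT components."""
    rules_block = "Rules:\n" + "\n".join(f"- {r}" for r in rules)
    return "\n\n".join([
        context,
        rules_block,
        action,
        "Format your response as:\n" + output_format,
        tone,
    ])

prompt = build_craft_prompt(
    context="You are a senior financial analyst at a UK investment firm.",
    rules=["Only use data provided in the context.",
           "Respond in British English."],
    action="Produce a quarterly performance summary.",
    output_format="## Portfolio Summary\n[2-3 sentence overview]",
    tone="Tone: Professional and clear. Avoid jargon.",
)
```

Storing the components separately also lets you diff and A/B test one component (say, the rules) while holding the others constant.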

Common Prompt Patterns

1. Chain of Thought

When to use: Complex reasoning, multi-step problems, calculations

Think through this step by step:
1. First, identify the key variables
2. Then, calculate the intermediate result
3. Finally, derive the answer
Show your working.

2. Few-Shot Examples

When to use: When the AI needs to match a specific format or style

Classify the following customer messages.

Example 1:
Input: "My order hasn't arrived yet"
Category: Delivery Issue
Urgency: Medium

Example 2:
Input: "I'd like to cancel my subscription"
Category: Cancellation
Urgency: High

Now classify:
Input: "{user_message}"

3. Structured Output

When to use: When you need parseable, consistent output

Respond ONLY with valid JSON matching this schema:
{
  "summary": "string (max 100 words)",
  "sentiment": "positive | negative | neutral",
  "confidence": number (0.0 to 1.0),
  "key_topics": ["string"]
}
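Because the model is instructed to emit JSON, the application side should parse and validate the reply rather than trust it. A minimal validator for the schema above, using only the standard library (raising ValueError is one choice; a retry loop that re-prompts the model is another):

```python
import json

REQUIRED_KEYS = {"summary", "sentiment", "confidence", "key_topics"}
ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}

def parse_structured_reply(raw: str) -> dict:
    """Parse the model's JSON reply and enforce the schema."""
    data = json.loads(raw)  # raises if the reply is not valid JSON
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if data["sentiment"] not in ALLOWED_SENTIMENTS:
        raise ValueError(f"bad sentiment: {data['sentiment']!r}")
    if not 0.0 <= float(data["confidence"]) <= 1.0:
        raise ValueError("confidence out of range")
    return data

reply = parse_structured_reply(
    '{"summary": "Positive review of delivery speed.", '
    '"sentiment": "positive", "confidence": 0.92, '
    '"key_topics": ["delivery"]}'
)
```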

4. Self-Critique

When to use: When accuracy is critical

After generating your answer:
1. Review it for factual accuracy
2. Check it answers all parts of the question
3. Verify any numbers or calculations
4. If you find errors, correct them before responding

5. Persona + Audience

When to use: When tone and expertise level matter

You are explaining {topic} to a {audience}.
Adjust your language complexity accordingly.
Use analogies from their domain where helpful.

Prompt Testing & Versioning

Prompt Version Tracking

Version | Date | Author | Change Description      | Test Result     | Status
v1.0    |      |        | Initial prompt          |  /5 avg quality | Production
v1.1    |      |        | Added few-shot examples |  /5 avg quality | Testing
v1.2    |      |        | Refined output format   |  /5 avg quality | Draft

A/B Testing Framework

Test | Prompt A (Control)  | Prompt B (Variant) | Metric        | Result
     | v1.0 (current prod) | v1.1 (new)         | Quality score | A:  /5 vs B:  /5
     |                     |                    | Latency       | A:  ms vs B:  ms
     |                     |                    | Token cost    | A:  vs B:
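The A/B comparison can be computed from per-request logs. A sketch, assuming each logged result records a quality score (1 to 5), latency, and token count (the field names are illustrative):

```python
from statistics import mean

def compare_variants(results_a: list[dict], results_b: list[dict]) -> dict:
    """Average quality, latency, and token usage for two prompt variants."""
    def summary(results: list[dict]) -> dict:
        return {
            "quality": mean(r["quality"] for r in results),
            "latency_ms": mean(r["latency_ms"] for r in results),
            "tokens": mean(r["tokens"] for r in results),
        }
    return {"A": summary(results_a), "B": summary(results_b)}

report = compare_variants(
    results_a=[{"quality": 3, "latency_ms": 800, "tokens": 450},
               {"quality": 4, "latency_ms": 760, "tokens": 430}],
    results_b=[{"quality": 4, "latency_ms": 900, "tokens": 520},
               {"quality": 5, "latency_ms": 880, "tokens": 510}],
)
```

With the summaries in hand, the decision is a trade-off: here variant B scores higher on quality but costs more tokens and latency per request.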

Prompt Test Checklist

Before promoting a prompt to production:

  • Tested on __+ evaluation examples
  • Quality score meets minimum threshold: __ /5
  • No regression on previously passing test cases
  • Adversarial inputs handled correctly
  • Output format is consistent and parseable
  • Token usage is within budget: < __ tokens average
  • Reviewed by a second team member
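The quantitative parts of this checklist can be enforced mechanically as a promotion gate in CI. A sketch with illustrative metric names and thresholds (set them per project):

```python
def passes_promotion_gate(metrics: dict, min_quality: float = 4.0,
                          max_avg_tokens: int = 600) -> bool:
    """Check a candidate prompt's evaluation metrics against the checklist.

    Thresholds and metric names here are illustrative, not prescriptive.
    """
    return (
        metrics["avg_quality"] >= min_quality
        and metrics["regressions"] == 0
        and metrics["adversarial_failures"] == 0
        and metrics["avg_tokens"] <= max_avg_tokens
    )

ok = passes_promotion_gate({"avg_quality": 4.3, "regressions": 0,
                            "adversarial_failures": 0, "avg_tokens": 480})
```

The human review step stays manual; the gate only blocks candidates that fail the measurable criteria.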

Prompt Library Structure

prompts/
  ├── classification/
  │   ├── v1.0.txt
  │   ├── v1.1.txt
  │   └── README.md (changelog)
  ├── summarisation/
  │   ├── v1.0.txt
  │   └── README.md
  └── extraction/
      ├── v1.0.txt
      └── README.md

Each prompt file includes:

  • The full prompt template with variable placeholders
  • CRAFT metadata (context, rules, action, format, tone)
  • Test results summary
  • Known limitations
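A loader for this layout can resolve a task and version to a file and substitute the variable placeholders. A sketch using Python's str.format for {placeholder} variables, matching the {user_message} style used earlier (note that literal braces in a prompt, e.g. embedded JSON, would need escaping as {{ }}):

```python
from pathlib import Path
import tempfile

def load_prompt(library_root, task: str, version: str, **variables) -> str:
    """Read prompts/<task>/<version>.txt and fill {placeholder} variables."""
    path = Path(library_root) / task / f"{version}.txt"
    return path.read_text(encoding="utf-8").format(**variables)

# Demo: build a throwaway library entry, then load and render it.
root = Path(tempfile.mkdtemp())
(root / "classification").mkdir()
(root / "classification" / "v1.0.txt").write_text(
    'Classify this message: "{user_message}"', encoding="utf-8")
rendered = load_prompt(root, "classification", "v1.0",
                       user_message="Where is my order?")
```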

Instructions

How to use this template

1. Use the CRAFT framework for new prompts

Start every prompt with Context, Role/Rules, Action, Format, and Tone. This structure ensures completeness and consistency.

2. Select appropriate patterns

Choose from the common patterns based on your task type. Chain of thought for reasoning, few-shot for formatting, structured output for parsing.

3. Test before deploying

Run every prompt change against your evaluation dataset. Compare quality, cost, and latency against the current production version.

4. Version control all prompts

Store prompts in your code repository alongside the application code. Track changes, review as code, and link to test results.

5. Iterate based on production feedback

Monitor user feedback and quality metrics. Use low-scoring outputs as new test cases and iterate on the prompt.

Watch Out

Common mistakes to avoid

  • Writing vague instructions — be as specific as possible about what the AI should and should not do.
  • Not testing prompt changes — even small wording changes can significantly affect output quality.
  • Overloading a single prompt — if a prompt tries to do too much, split it into multiple focused prompts.
  • Ignoring token cost — long system prompts consume tokens on every request; optimise for both quality and efficiency.
  • Not using examples — few-shot examples are one of the most reliable ways to improve output consistency.

FAQ

Frequently asked questions

How long should a system prompt be?

As short as possible while achieving the desired output quality. Most effective system prompts are 200-500 tokens. If your prompt exceeds 1000 tokens, consider whether all instructions are necessary or if some can be moved to few-shot examples.

When should I use few-shot examples versus detailed instructions?

Use few-shot examples when the output format or style is hard to describe in words. Use detailed instructions when the rules are clear and logical. Often a combination works best: clear rules plus 1-2 examples.

How can I reduce token costs?

Cache responses for repeated queries, use shorter prompts where possible, move static context to system messages (which can be cached by some providers), and consider using smaller models for simpler tasks.
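The caching idea can be sketched as a thin wrapper keyed on a hash of the full prompt. Here `call_model` is a stand-in for whatever client function sends the request; note that exact-match caching only helps when prompts repeat verbatim:

```python
import hashlib

_cache: dict[str, str] = {}

def cached_completion(prompt: str, call_model) -> str:
    """Return a cached reply for an identical prompt, else call the model."""
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = call_model(prompt)
    return _cache[key]

calls = []
def fake_model(prompt: str) -> str:  # stub standing in for a real API call
    calls.append(prompt)
    return "stub reply"

first = cached_completion("Summarise Q3 results.", fake_model)
second = cached_completion("Summarise Q3 results.", fake_model)  # cache hit
```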

Who should manage prompts on the team?

Centralise prompt management with 1-2 prompt engineers who understand the patterns and testing methodology. Other team members can propose changes, but a prompt engineer should review and test them before deployment.

Will the same prompt work across different models?

Different models respond differently to prompts. If you need to support multiple models, maintain model-specific prompt variants and test each variant against the target model. Common differences include: structured output handling, instruction following, and reasoning capabilities.
