AI Monitoring Dashboard Template
A template for designing your AI production monitoring dashboard. It defines the key metrics, alert thresholds, and visualisations needed to keep your AI system running reliably, accurately, and cost-effectively in production.
Overview
What's included
Core Metrics
System name:
Dashboard tool: Grafana / Datadog / CloudWatch / New Relic
Data source:
Performance Metrics
| Metric | Description | Collection Method | Granularity |
|---|---|---|---|
| Request latency (p50) | Median response time | API instrumentation | Per request |
| Request latency (p95) | 95th percentile response time | API instrumentation | Per request |
| Request latency (p99) | 99th percentile response time | API instrumentation | Per request |
| Time to first token | Latency until first token streamed | Client-side timing | Per request |
| Throughput | Requests per second | Counter | 1-minute average |
| Error rate | % of requests returning errors | Error counter / total | 5-minute average |
| Token usage (input) | Tokens consumed per request (input) | API response metadata | Per request |
| Token usage (output) | Tokens generated per request (output) | API response metadata | Per request |
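As an illustrative sketch of the latency rows above (not tied to any specific tool), percentiles can be computed from per-request timings collected by your API instrumentation; the `percentile` helper here is a simple linear-interpolation implementation:

```python
import random

def percentile(samples, p):
    """p-th percentile (0-100) of a list of samples, with linear interpolation."""
    ordered = sorted(samples)
    k = (len(ordered) - 1) * p / 100
    lower = int(k)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (k - lower)

# Simulated per-request latencies in milliseconds (stand-in for real instrumentation)
latencies_ms = [random.uniform(100, 2000) for _ in range(1000)]

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
assert p50 <= p95 <= p99  # percentiles are monotonically non-decreasing
```

In production you would export these as gauges to your dashboard tool rather than computing them ad hoc; most observability backends also compute percentiles server-side from histograms.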
Quality Metrics
| Metric | Description | Collection Method | Granularity |
|---|---|---|---|
| User satisfaction | Thumbs up/down or 1-5 rating | User feedback widget | Per interaction |
| Hallucination rate | % of flagged hallucinations | Automated check + user reports | Daily sample |
| Retrieval relevance | Average relevance of retrieved chunks (RAG) | Automated scoring | Per request |
| Task completion rate | % of queries resulting in successful outcomes | Outcome tracking | Daily |
| Escalation rate | % of queries escalated to human | Routing logic | Daily |
Cost Metrics
| Metric | Description | Collection Method | Granularity |
|---|---|---|---|
| Cost per request | Average £ per API call | Token count x pricing | Per request |
| Daily spend | Total AI API spend per day | Billing API / calculation | Daily |
| Monthly spend (projected) | Projected monthly cost based on current rate | Extrapolation | Daily |
| Cost per user/session | Average AI cost per user interaction | Session tracking | Per session |
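The cost rows above reduce to simple arithmetic over token counts. A minimal sketch, where the per-1K-token prices are placeholder assumptions rather than real provider rates:

```python
# Placeholder per-1K-token prices in GBP -- substitute your provider's actual rates
INPUT_PRICE_PER_1K_GBP = 0.002
OUTPUT_PRICE_PER_1K_GBP = 0.008

def cost_per_request(input_tokens, output_tokens):
    """Cost per request in GBP: token count x pricing."""
    return (input_tokens / 1000) * INPUT_PRICE_PER_1K_GBP + \
           (output_tokens / 1000) * OUTPUT_PRICE_PER_1K_GBP

def projected_monthly_spend(spend_to_date, day_of_month, days_in_month=30):
    """Extrapolate month-to-date spend to a full-month projection."""
    return spend_to_date / day_of_month * days_in_month
```

A daily job can compare the projection against the monthly budget threshold used by Alert 5 below.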
Alert Configuration
Alert Severity Levels
| Level | Response Time | Notification Channel | Escalation |
|---|---|---|---|
| Critical | < 5 minutes | PagerDuty / Phone | On-call engineer → Team lead → CTO |
| High | < 30 minutes | Slack #ai-alerts + Email | On-call engineer → Team lead |
| Medium | < 4 hours | Slack #ai-alerts | On-call engineer |
| Low | Next business day | | AI team |
Alert Definitions
| # | Alert Name | Condition | Severity | Action |
|---|---|---|---|---|
| 1 | High error rate | Error rate > % for 5 minutes | Critical | Check AI provider status; activate circuit breaker |
| 2 | Latency spike | p95 latency > ms for 10 minutes | High | Check for API throttling; scale infrastructure |
| 3 | Quality degradation | User satisfaction < /5 (rolling 24hr) | High | Review recent outputs; check for model changes |
| 4 | Cost spike | Daily spend > x normal | Medium | Investigate traffic spike; check for abuse |
| 5 | Token budget exceeded | Monthly projected spend > £ | Medium | Review usage patterns; consider optimisation |
| 6 | Model drift detected | Quality metrics shifted > % | Medium | Compare recent outputs to baseline; investigate root cause |
| 7 | Low throughput | Requests/min < for 15 minutes | Low | Check upstream systems; verify no outage |
| 8 | API key expiry | Key expires in < days | Low | Rotate API key before expiry |
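Alert 1's condition (error rate above a threshold for a sustained window) can be sketched as a rolling-window check. The threshold and window values below are illustrative, since the template leaves them blank for you to fill in:

```python
import time
from collections import deque

class ErrorRateAlert:
    """Fires when the error rate over a rolling window exceeds a threshold.

    threshold=0.05 and window_seconds=300 are illustrative placeholders.
    """
    def __init__(self, threshold=0.05, window_seconds=300):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, is_error) pairs

    def record(self, is_error, now=None):
        """Record one request outcome and evict events outside the window."""
        now = now if now is not None else time.time()
        self.events.append((now, is_error))
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] < cutoff:
            self.events.popleft()

    def firing(self):
        """True when the windowed error rate is strictly above the threshold."""
        if not self.events:
            return False
        errors = sum(1 for _, is_error in self.events if is_error)
        return errors / len(self.events) > self.threshold
```

In practice this logic usually lives in your monitoring backend's alert rules rather than application code; the sketch just makes the condition concrete.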
Silencing Policy
- Alerts may be silenced during planned maintenance windows
- Maximum silence duration: hours
- Silencing requires approval from:
- All silenced alerts must be documented
Dashboard Layout
Overview Dashboard (Single screen)
Row 1: Health Status
- Overall system status indicator (Green/Amber/Red)
- Current error rate gauge
- Active alerts count
- Current request throughput
Row 2: Performance
- Latency time series (p50, p95, p99) — last 24 hours
- Throughput time series — last 24 hours
- Error rate time series — last 24 hours
Row 3: Quality
- User satisfaction score trend — last 7 days
- Hallucination rate trend — last 7 days
- Task completion rate — last 7 days
- Escalation rate — last 7 days
Row 4: Cost
- Daily spend bar chart — last 30 days
- Current month spend vs budget gauge
- Cost per request trend — last 30 days
- Token usage breakdown (input vs output)
Deep-Dive Dashboard
- Per-endpoint latency breakdown
- Error distribution by type and code
- Token usage distribution histogram
- Top error messages table
- Slowest requests table
- User feedback log (recent thumbs down with details)
Access Control
| Role | Dashboard Access | Alert Configuration | Data Export |
|---|---|---|---|
| AI Team | Full access | Edit | Yes |
| Engineering | Full access | View only | Yes |
| Product | Overview only | View only | No |
| Leadership | Overview only | View only | No |
Instructions
How to use this template
Instrument your AI system
Add logging and metrics collection at every stage: input processing, AI API calls, output formatting, and user feedback.
Configure alerts before going live
Set up all critical and high-severity alerts before production launch. You need to know about problems faster than your users notice them.
Build the overview dashboard first
Start with the single-screen overview. Add deep-dive dashboards as you learn which metrics need investigation most often.
Establish baselines in the first 2 weeks
Run for 2 weeks to establish normal performance baselines before fine-tuning alert thresholds. Initial thresholds should be generous to avoid alert fatigue.
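Once two weeks of baseline data exist, generous initial thresholds can be derived mechanically, for example mean plus three standard deviations (a common starting heuristic, not a rule prescribed by this template):

```python
import statistics

def baseline_threshold(samples, sigmas=3.0):
    """Generous alert threshold from baseline data: mean + N standard deviations."""
    mean = statistics.fmean(samples)
    spread = statistics.stdev(samples)
    return mean + sigmas * spread

# Two weeks of daily p95 latency observations in ms (illustrative values)
baseline_p95 = [820, 790, 845, 810, 905, 780, 830, 860, 795, 840, 815, 870, 800, 825]
threshold_ms = baseline_threshold(baseline_p95)
```

Tighten the `sigmas` multiplier gradually as you gain confidence in the baselines, rather than starting strict and training the team to ignore alerts.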
Watch Out
Common mistakes to avoid
FAQ
Frequently asked questions
Which tools should we use for AI monitoring?
General observability tools like Datadog, Grafana, and New Relic work well for infrastructure metrics. For AI-specific monitoring (quality, hallucinations, drift), consider LangSmith, Braintrust, Arize, or Helicone. Many teams use a combination.
How do we detect model drift?
Track quality metrics over time and compare against your baseline. Statistical process control (SPC) charts can detect when metrics move outside normal bounds. Also monitor input distribution changes, which often precede output quality changes.
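The SPC approach mentioned here can be sketched with Shewhart-style control limits computed from a baseline window; this stdlib-only example is illustrative, not a full SPC implementation:

```python
import statistics

def control_limits(baseline, sigmas=3.0):
    """Shewhart-style lower/upper control limits from baseline quality scores."""
    mean = statistics.fmean(baseline)
    spread = statistics.stdev(baseline)
    return mean - sigmas * spread, mean + sigmas * spread

def out_of_control(value, limits):
    """True when a new observation falls outside the control limits."""
    lower, upper = limits
    return value < lower or value > upper
```

Recompute the limits periodically from a trusted baseline window, and investigate (rather than silently re-baseline) whenever points fall out of control.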
How often should we review the dashboard?
Rely on alerts for real-time issues. Review the dashboard manually at least once daily during the first month, then weekly once the system stabilises. Do a deep review whenever alerts fire or user complaints increase.
Should we log all AI inputs and outputs?
Yes, with appropriate data handling. Logging enables debugging, quality analysis, and evaluation dataset building. Ensure you redact PII, comply with data retention policies, and restrict access to logs containing sensitive information.
How do we keep AI costs under control?
Track token usage per request, calculate cost per request, and project monthly spend daily. Set budget alerts at 80% and 100% of your monthly budget. Investigate any day where spend is more than 2x the average.
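The budget rules in this answer reduce to a few comparisons; a minimal sketch:

```python
def budget_alerts(projected_spend, monthly_budget):
    """Return which budget alert levels (80%, 100%) the projection has crossed."""
    crossed = []
    if projected_spend >= 0.8 * monthly_budget:
        crossed.append("80%")
    if projected_spend >= monthly_budget:
        crossed.append("100%")
    return crossed

def is_spend_spike(today_spend, recent_daily_spend, factor=2.0):
    """Flag a day whose spend exceeds `factor` x the recent daily average."""
    average = sum(recent_daily_spend) / len(recent_daily_spend)
    return today_spend > factor * average
```

Run these once daily against your billing data; the 2x factor matches the investigation rule above and can be tuned after baselining.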