Whisper: OpenAI's Speech Recognition Model
Whisper is OpenAI's open-source automatic speech recognition model, supporting 99 languages with robust accuracy across accents, background noise, and technical content.
Specifications
At a glance
Parameters: 39M (tiny) to 1.55B (large-v3)
Languages supported: 99
Release date: September 2022 (large-v3: November 2023)
Licence: MIT (open source)
Architecture: encoder-decoder Transformer
Pricing: free self-hosted / $0.006 per minute via OpenAI API
Overview
About Whisper
Whisper is OpenAI's automatic speech recognition (ASR) model that has become the de facto standard for AI transcription. Trained on 680,000 hours of multilingual audio data, it delivers robust transcription accuracy across a wide range of conditions including accents, background noise, and technical vocabulary. Available in multiple sizes from 39M (tiny) to 1.55B (large-v3) parameters, Whisper can be deployed on everything from edge devices to cloud servers.

The large-v3 model approaches human-level accuracy for many languages and significantly outperforms previous open-source ASR models. Whisper's MIT licence has made it the foundation of countless transcription products, podcast tools, meeting recorders, and accessibility applications. The model also supports translation, automatically converting speech in any supported language to English text.
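Getting started is straightforward. A minimal sketch, assuming the open-source `openai-whisper` package is installed (`pip install openai-whisper`); the file path and model name below are placeholders:

```python
def transcribe_file(path: str, model_name: str = "base", translate: bool = False) -> str:
    """Transcribe an audio file; translate=True outputs English text instead."""
    import whisper  # lazy import so the sketch reads without the package installed

    model = whisper.load_model(model_name)      # "tiny", "base", ..., "large-v3"
    task = "translate" if translate else "transcribe"
    result = model.transcribe(path, task=task)  # source language is auto-detected
    return result["text"]
```

For example, `transcribe_file("meeting.mp3", model_name="large-v3")` downloads the checkpoint on first use and returns the transcript as a string.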
Strengths
Capabilities
- 99-language speech recognition and transcription
- Robust to accents, background noise, and technical jargon
- Multiple model sizes from 39M to 1.55B parameters
- Automatic language detection
- Speech-to-English translation for all supported languages
- Timestamp generation for subtitle creation
- MIT licence enabling unrestricted commercial use
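The timestamped segments map directly onto subtitle formats. A short sketch converting Whisper-style segment dicts (each with `start`, `end`, and `text` keys, the shape the open-source package returns) into SRT text:

```python
def _ts(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Render a list of {start, end, text} segments as an SRT subtitle file."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{_ts(seg['start'])} --> {_ts(seg['end'])}\n{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)
```

Feeding in `result["segments"]` from a transcription call yields a file ready to load alongside a video.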
Considerations
Limitations
- Real-time transcription requires capable hardware for larger models
- Accuracy drops for low-resource languages
- No speaker diarisation (identifying who said what) built in
- Can hallucinate repeated phrases on silent or unclear audio
- No streaming support in the base model architecture
Best For
Ideal use cases
- Meeting transcription and note-taking applications
- Podcast and video subtitle generation
- Multilingual customer support transcription
- Accessibility tools for hearing-impaired users
- Voice-to-text input for applications and workflows
Pricing
Free under MIT licence for self-hosting. OpenAI API: $0.006/minute. Various cloud providers offer hosted Whisper at competitive rates.
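For the hosted option, a hedged sketch using the official `openai` Python SDK (`pip install openai`), which reads the `OPENAI_API_KEY` environment variable:

```python
def transcribe_via_api(path: str) -> str:
    """Send an audio file to OpenAI's hosted Whisper endpoint, return the text."""
    from openai import OpenAI  # lazy import; requires OPENAI_API_KEY to be set

    client = OpenAI()
    with open(path, "rb") as audio:
        response = client.audio.transcriptions.create(model="whisper-1", file=audio)
    return response.text
```

At $0.006/minute, a one-hour meeting costs about $0.36 to transcribe.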
FAQ
Frequently asked questions
How accurate is Whisper?
Whisper large-v3 achieves word error rates under 5% for English, approaching human-level accuracy. Performance varies by language, accent, and audio quality. For well-recorded English speech, accuracy is typically 95-98%.
Can Whisper transcribe in real time?
The smaller models (tiny, base) can transcribe in near-real-time on modern hardware, while the large model is slower than real-time on most consumer GPUs. Community projects like faster-whisper use optimisations to achieve real-time performance with larger models.
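A sketch using faster-whisper (`pip install faster-whisper`); the model name and `int8` quantisation setting are example choices, not requirements:

```python
def transcribe_fast(path: str, model_name: str = "large-v3"):
    """Transcribe with faster-whisper; returns (segments, detected language)."""
    from faster_whisper import WhisperModel  # lazy import of the community package

    model = WhisperModel(model_name, compute_type="int8")  # quantised for speed
    segments, info = model.transcribe(path)                # segments is a generator
    return [(s.start, s.end, s.text) for s in segments], info.language
```

The `int8` compute type trades a small amount of accuracy for substantially lower memory use and faster inference.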
Does Whisper identify who is speaking?
No. Whisper does not include speaker diarisation. For multi-speaker transcription, combine Whisper with a separate diarisation model like pyannote-audio.
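The merge step of such a pipeline can be sketched in plain Python: label each Whisper segment with the diarisation turn it overlaps most. The segment and turn shapes below are illustrative (pyannote-audio's actual output objects differ):

```python
def assign_speakers(segments, turns):
    """Label each {start, end, text} segment with the speaker whose
    (start, end, speaker) turn overlaps it the most."""
    labelled = []
    for seg in segments:
        best, best_overlap = "unknown", 0.0
        for start, end, speaker in turns:
            overlap = min(seg["end"], end) - max(seg["start"], start)
            if overlap > best_overlap:
                best, best_overlap = speaker, overlap
        labelled.append({**seg, "speaker": best})
    return labelled
```

Maximum-overlap assignment is a simple heuristic; segments that straddle a speaker change get the dominant speaker's label.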
Which model size should I use?
For English transcription, medium offers the best accuracy/speed balance. For multilingual work, use large-v3. For edge or mobile deployment, use tiny or base. For maximum accuracy regardless of speed, use large-v3.
How does Whisper compare to Google Speech-to-Text?
The two are competitive. Whisper offers free self-hosting, 99-language support, and no per-minute costs. Google offers streaming, speaker diarisation, and better real-time performance out of the box.
Need help with Whisper?
Our team can help you evaluate and implement the right AI tools. Book a free strategy call.