DeepSeek V3: Efficient Frontier Performance
DeepSeek V3 is a 671B-parameter Mixture of Experts model with open weights that achieves frontier-competitive performance while being trained at a fraction of typical frontier costs.
Specifications
At a glance
Parameters
671B total (37B active per token)
Context Window
128,000 tokens
Training Data Cutoff
2024
Release Date
December 2024
Licence
MIT Licence (Open Source)
Training Cost
~$5.6M (reported cost of the final training run)
Architecture
Mixture of Experts (MoE)
Overview
About DeepSeek V3
DeepSeek V3 is a groundbreaking open-weight model from Chinese AI lab DeepSeek, demonstrating that frontier-level performance can be achieved at dramatically lower training costs. With 671B total parameters but only 37B active per token (via its Mixture of Experts architecture), DeepSeek V3 delivers exceptional efficiency.

The final training run reportedly cost approximately $5.6 million in GPU time, a fraction of the hundreds of millions spent on comparable frontier models; the figure excludes earlier research and ablation experiments. Despite this cost efficiency, DeepSeek V3 matches or exceeds GPT-4o and Claude 3.5 Sonnet on many benchmarks, particularly in coding, mathematics, and Chinese language tasks.

Released under the MIT licence, DeepSeek V3 has generated significant interest in the open-source community. Its success has challenged assumptions about the compute requirements for frontier AI and demonstrated the potential of MoE architectures combined with training innovations such as FP8 mixed-precision training and Multi-Head Latent Attention.
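To make the architecture claim concrete, here is a minimal sketch of top-k expert routing, the core Mixture of Experts mechanism. It is illustrative only: the experts below are toy linear maps, and DeepSeek V3's real router (reportedly selecting 8 of 256 fine-grained routed experts per token, plus a shared expert, with additional load-balancing machinery) is considerably more involved.

```python
import numpy as np

def moe_layer(x, router_w, experts, k=2):
    """Minimal top-k Mixture of Experts routing for one token.

    x        : (d,) token hidden state
    router_w : (n_experts, d) router projection
    experts  : list of callables standing in for expert feed-forward nets

    Only the k selected experts run, so per-token compute scales with k
    rather than with the total number of experts."""
    logits = router_w @ x                        # router score for every expert
    top_k = np.argsort(logits)[-k:]              # indices of the k highest-scoring experts
    weights = np.exp(logits[top_k] - logits[top_k].max())
    weights /= weights.sum()                     # softmax over the selected experts only
    return sum(w * experts[i](x) for w, i in zip(weights, top_k))

# Toy usage: 4 experts, each a random linear map standing in for an FFN.
rng = np.random.default_rng(0)
d, n_experts = 8, 4
experts = [lambda v, W=rng.normal(size=(d, d)) / d: W @ v for _ in range(n_experts)]
router_w = rng.normal(size=(n_experts, d))
token = rng.normal(size=d)
print(moe_layer(token, router_w, experts, k=2))
```

The design point is that per-token compute scales with the number of selected experts rather than the total, which is how 671B parameters of capacity can cost roughly 37B parameters of compute per token.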
Strengths
Capabilities
- Frontier-competitive performance at a fraction of typical frontier training costs
- 671B total parameters with efficient 37B active inference
- 128K context window
- Exceptional coding and mathematical reasoning
- Strong Chinese and English bilingual capabilities
- MIT licence enabling unrestricted commercial use
- Highly efficient MoE architecture for cost-effective inference
Considerations
Limitations
- Large model requiring significant GPU memory despite MoE efficiency
- Newer model with a smaller ecosystem and fewer integrations
- English performance slightly trails leading proprietary models on some nuanced tasks
- Limited cloud provider availability compared to Llama 3
- Self-hosting MoE models requires specialised multi-GPU infrastructure (see the rough memory estimate after this list)
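A rough sizing sketch of why self-hosting is demanding: all 671B parameters must be resident in GPU memory even though only 37B are active per token. The arithmetic below is back-of-envelope and ignores KV cache, activations, and serving overhead.

```python
# Back-of-envelope weight memory for self-hosting DeepSeek V3.
# Only the active 37B parameters are computed per token, but the
# full 671B must still be loaded. Figures are approximate.
total_params = 671e9
for name, bytes_per_param in [("FP8", 1), ("FP16/BF16", 2)]:
    weights_gb = total_params * bytes_per_param / 1e9
    print(f"{name}: ~{weights_gb:,.0f} GB of weights "
          f"(~{weights_gb / 80:.0f}+ H100-80GB class GPUs, before KV cache)")
```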
Best For
Ideal use cases
- Coding and software engineering automation
- Mathematical and scientific reasoning tasks
- Chinese-English bilingual applications
- Cost-conscious organisations wanting open frontier-class performance
- Research into efficient AI training methodologies
Pricing
Free to use under the MIT licence. Hosted inference is available via the DeepSeek API (notably low per-token pricing), Together AI, Fireworks AI, and other inference providers.
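For orientation, here is a minimal sketch of calling DeepSeek V3 through DeepSeek's OpenAI-compatible API. The base URL and the deepseek-chat model name reflect DeepSeek's documentation at the time of writing; check the current docs before relying on them. The same pattern works with Together AI or Fireworks AI by swapping the base URL and model name.

```python
# Hedged sketch: DeepSeek V3 via DeepSeek's OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",        # placeholder credential
    base_url="https://api.deepseek.com",    # DeepSeek's OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="deepseek-chat",                  # DeepSeek V3 chat model name at time of writing
    messages=[{"role": "user", "content": "Write a Python function to merge two sorted lists."}],
)
print(response.choices[0].message.content)
```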
FAQ
Frequently asked questions
How was DeepSeek V3 trained so cheaply?
DeepSeek V3 uses several training innovations including FP8 mixed-precision training, an efficient MoE architecture, and Multi-Head Latent Attention. These optimisations reduced training costs to roughly $5.6M, compared to hundreds of millions for comparable models.
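As a toy illustration of the scaled low-precision idea behind FP8 training (not DeepSeek's actual kernels, which run in hardware tensor cores), the snippet below simulates FP8 E4M3's 3-bit mantissa with per-tensor scaling and measures the rounding error introduced:

```python
import numpy as np

def fake_fp8_e4m3(x: np.ndarray) -> np.ndarray:
    """Crude simulation of FP8 E4M3 rounding: keep ~4 significant bits
    (1 implicit + 3 stored mantissa bits) and clip to E4M3's max value, 448."""
    mantissa, exponent = np.frexp(x)            # x = mantissa * 2**exponent
    mantissa = np.round(mantissa * 16) / 16     # round mantissa to 4 bits
    return np.clip(np.ldexp(mantissa, exponent), -448.0, 448.0)

def quantize_dequantize(w: np.ndarray) -> np.ndarray:
    """Per-tensor scaling: map into FP8 range, round, then rescale back.
    This is the generic scaled low-precision recipe, not DeepSeek's exact scheme."""
    scale = 448.0 / np.max(np.abs(w))
    return fake_fp8_e4m3(w * scale) / scale

rng = np.random.default_rng(0)
w = rng.normal(scale=0.02, size=(4096,)).astype(np.float32)
w_q = quantize_dequantize(w)
print("max abs error:", np.max(np.abs(w - w_q)))
print("relative RMS error:", np.linalg.norm(w - w_q) / np.linalg.norm(w))
```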
What is a Mixture of Experts (MoE) model?
MoE models contain many 'expert' sub-networks but only activate a subset for each input token. DeepSeek V3 has 671B total parameters but activates only 37B per token, delivering large-model quality at small-model inference costs.
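A quick back-of-envelope of what "activates only 37B per token" buys, using the standard approximation of roughly 2 FLOPs per active parameter for a forward pass; the numbers are rough:

```python
# Per-token inference FLOPs scale with active parameters, not total size.
total_params, active_params = 671e9, 37e9
print(f"active fraction: {active_params / total_params:.1%}")    # ~5.5%
print(f"approx forward FLOPs/token: {2 * active_params:.2e}")    # ~7.4e10
print(f"dense 671B would need:      {2 * total_params:.2e}")     # ~1.3e12
```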
Can I use DeepSeek V3 commercially?
Yes. DeepSeek V3 is released under the MIT licence, which places no restrictions on commercial use, modification, or distribution.
How does DeepSeek V3 compare to Llama 3 405B?
DeepSeek V3 generally outperforms Llama 3 405B on coding and mathematical benchmarks while being more efficient at inference due to its MoE architecture. Llama 3 405B has a larger Western ecosystem and community support.
Is DeepSeek V3 safe for production use?
DeepSeek V3 is a capable model suitable for production use. As with any open model, organisations should implement their own safety layers, content filtering, and monitoring appropriate to their use case.
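As a purely hypothetical illustration of what a minimal safety layer might look like (the deny-list, function names, and wiring below are invented for this sketch; real deployments typically use dedicated moderation models or classifier APIs, plus logging and human review):

```python
# Hypothetical sketch: wrap any generation callable with a crude content check.
DENY_LIST = {"credit card dump", "make a weapon"}   # illustrative terms only

def moderated_generate(generate_fn, prompt: str) -> str:
    lowered = prompt.lower()
    if any(term in lowered for term in DENY_LIST):
        return "Request declined by content policy."
    output = generate_fn(prompt)
    if any(term in output.lower() for term in DENY_LIST):
        return "Response withheld by content policy."
    return output

# Usage: wrap any generation callable, e.g. the API client sketched above.
print(moderated_generate(lambda p: f"echo: {p}", "Hello DeepSeek"))
```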
Need help with DeepSeek V3?
Our team can help you evaluate and implement the right AI tools. Book a free strategy call.