How long does an AI implementation take?

Most single workflow implementations take 2-6 weeks from kickoff to production. Full AI transformation programmes run 6-12 weeks.

Do you work with specific AI models?

We are model-agnostic and work with all major providers including Anthropic Claude, OpenAI GPT, Google Gemini, Meta Llama, Mistral, and more.

Can you deploy AI on our own servers?

Yes. Our Local & Private AI service deploys models on your own infrastructure or private cloud.

AI Profile

DeepSeek R1: Open-Source Reasoning Pioneer

DeepSeek R1 is an open-source reasoning model that uses extended chain-of-thought to excel at mathematics, coding, and complex analytical tasks, competing with proprietary reasoning models at a fraction of the cost.

Specifications

At a glance

Parameters: 671B total (37B active per token, MoE)
Context Window: 128,000 tokens
Training Data Cutoff: 2024
Release Date: January 2025
Licence: MIT Licence (Open Source)
Architecture: Mixture of Experts (MoE)
Pricing (DeepSeek API): $0.55/1M input, $2.19/1M output

Overview

About DeepSeek R1

DeepSeek R1 is a reasoning-focused model from Chinese AI lab DeepSeek, designed to compete with OpenAI's o1 and o3 reasoning models. It uses extended chain-of-thought reasoning, working through a problem step by step before producing a final answer, to achieve strong performance on mathematics, coding, science, and complex analytical tasks. R1 is also fully open-source under the MIT licence, making it the first open reasoning model to approach the performance of proprietary alternatives.

The model shares the 671B MoE architecture of DeepSeek V3, with 37B parameters active per token, and has been further trained with reinforcement learning to develop its reasoning capabilities. DeepSeek also released distilled versions of R1, built on Qwen and Llama base models in sizes from 1.5B to 70B, which bring reasoning capabilities to consumer hardware.

The combination of frontier-level reasoning, open weights, and low API pricing has made R1 one of the most impactful recent model releases and has challenged assumptions about the cost of building reasoning models.
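To make the "37B active of 671B total" figure more concrete, here is a toy sketch of top-k expert routing, the general idea behind a Mixture of Experts layer. It is not DeepSeek's actual routing code: the four scalar "experts", the gate scores, and the function names are all hypothetical and chosen only for illustration.

```python
import math

def top_k_route(gate_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    ranked = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)
    chosen = ranked[:k]
    # Softmax over the chosen logits only, so the returned weights sum to 1.
    peak = max(gate_logits[i] for i in chosen)
    exps = [math.exp(gate_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]

def moe_forward(x, experts, gate_logits, k):
    """Run only the selected experts and blend their outputs by gate weight."""
    return sum(weight * experts[i](x) for i, weight in top_k_route(gate_logits, k))

# Hypothetical toy experts: each one just scales its input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
gate_logits = [0.1, 2.0, -1.0, 2.0]  # made-up router scores for one token

routes = top_k_route(gate_logits, k=2)            # experts 1 and 3, weight 0.5 each
output = moe_forward(1.0, experts, gate_logits, k=2)  # 0.5 * 2.0 + 0.5 * 4.0 = 3.0

# Share of R1's parameters used per token, from the figures quoted above.
active_fraction = 37 / 671  # roughly 5.5%
```

Only the selected experts run for a given token, which is why compute per token tracks the active parameter count rather than the total.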

Strengths

Capabilities

Extended chain-of-thought reasoning for complex problem solving
Exceptional mathematical and scientific reasoning
Strong coding performance across multiple languages
Open-source under MIT licence with full weights available
Efficient MoE architecture (37B active of 671B total)
Distilled variants from 1.5B to 70B for resource-constrained deployment
Extremely competitive API pricing via DeepSeek

Considerations

Limitations

Reasoning mode significantly increases latency and token usage
Full model requires substantial GPU infrastructure to self-host
Chain-of-thought can be verbose and increase output costs
General conversation and creative tasks are not its primary strength
Newer model with a smaller ecosystem than established alternatives
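One common way to manage the verbose chain-of-thought is to separate the reasoning from the final answer before showing or storing a response. The sketch below assumes the raw completion wraps its reasoning in `<think>...</think>` tags, as the open-weight R1 releases do; some hosted APIs return the reasoning in a separate field instead, in which case this step is unnecessary. The function name and sample text are hypothetical.

```python
import re

# Assumption: reasoning is delimited by <think> ... </think> in the raw text.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(raw):
    """Return (reasoning, answer) extracted from a raw completion string."""
    reasoning = "\n".join(part.strip() for part in THINK_BLOCK.findall(raw))
    answer = THINK_BLOCK.sub("", raw).strip()
    return reasoning, answer

sample = "<think>\n17 * 3 = 51, and 51 + 4 = 55.\n</think>\nThe result is 55."
reasoning, answer = split_reasoning(sample)
# reasoning == "17 * 3 = 51, and 51 + 4 = 55."
# answer == "The result is 55."
```

Keeping the reasoning separate lets an application log it for debugging while returning only the answer to end users.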

Best For

Ideal use cases

Mathematical problem solving and proof verification
Complex coding challenges and algorithm design
Scientific reasoning and data analysis
Tasks requiring step-by-step logical deduction
Cost-effective alternative to proprietary reasoning models

Pricing

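DeepSeek API: $0.55/1M input tokens, $2.19/1M output tokens; input served from cache is billed at a lower rate. The weights are free to self-host under the MIT licence, so the only cost is your own infrastructure. R1 is also available via Together AI, Fireworks AI, and other inference providers.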
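As a rough worked example of what these rates mean per request, the sketch below estimates cost from token counts. It assumes reasoning tokens are billed at the output rate and ignores cache discounts; the token counts are hypothetical, and the rates are the ones quoted above, so check current pricing before relying on the numbers.

```python
INPUT_USD_PER_M = 0.55   # DeepSeek API input rate quoted above
OUTPUT_USD_PER_M = 2.19  # DeepSeek API output rate quoted above

def estimate_cost_usd(input_tokens, answer_tokens, reasoning_tokens=0):
    """Estimate one request's cost; reasoning is assumed to bill as output."""
    output_tokens = answer_tokens + reasoning_tokens
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000

# Hypothetical request: 2,000 input tokens and a 500-token answer.
without_reasoning = estimate_cost_usd(2_000, 500)                      # about $0.0022
with_reasoning = estimate_cost_usd(2_000, 500, reasoning_tokens=4_000)  # about $0.0110
```

In this example, 4,000 tokens of chain-of-thought raise the request cost roughly fivefold, which is why reasoning length matters for budgeting.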

FAQ

Frequently asked questions

How does DeepSeek R1 compare to OpenAI o3?

DeepSeek R1 competes with o3 on mathematical and coding benchmarks at a fraction of the cost. o3 tends to lead on the hardest reasoning tasks, but R1's open-source nature and low pricing make it an attractive alternative for many use cases.

What are the distilled R1 models?

DeepSeek released smaller versions of R1's reasoning capabilities distilled into Qwen and Llama base models (1.5B to 70B parameters). These run on consumer hardware while retaining meaningful reasoning improvements over their base models.

Should I use R1 for general-purpose tasks?

R1 is optimised for reasoning tasks. For general conversation, creative writing, or tasks not requiring step-by-step deduction, DeepSeek V3 or a general-purpose model like Claude Sonnet 4.6 will typically provide better results with lower latency.

Can I self-host DeepSeek R1?

Yes. R1 is MIT-licensed with full weights available. The full 671B model requires multi-GPU infrastructure, but distilled variants (7B, 14B, 32B, 70B) are much more accessible. Quantised versions further reduce hardware requirements.
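The hardware gap between the full model and the distilled variants can be roughed out with a back-of-the-envelope calculation: parameter count times bytes per parameter. The sketch below covers weights only and leaves out the KV cache, activations, and runtime overhead, so real requirements will be higher. The function name and the chosen precisions are illustrative assumptions.

```python
def weight_memory_gb(params_billions, bits_per_param):
    """Approximate memory for model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    # (params_billions * 1e9 params) * (bits / 8 bytes) / 1e9 bytes per GB
    return params_billions * bits_per_param / 8

# An MoE model must hold all experts in memory, so the full 671B count applies
# even though only 37B parameters are active per token.
full_8bit = weight_memory_gb(671, 8)         # 671.0 GB
full_4bit = weight_memory_gb(671, 4)         # 335.5 GB
distill_70b_16bit = weight_memory_gb(70, 16)  # 140.0 GB
distill_70b_4bit = weight_memory_gb(70, 4)    # 35.0 GB
distill_7b_4bit = weight_memory_gb(7, 4)      # 3.5 GB
```

By this estimate a 4-bit quantised 7B variant fits within a single consumer GPU's memory, while the full model needs several hundred gigabytes spread across multiple accelerators even when quantised.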

Need help with DeepSeek R1?

Our team can help you evaluate and implement the right AI tools. Book a free strategy call.
