GroveAI
AI Profile

Llama 4: Meta's Open-Weight Frontier

Llama 4 is Meta's latest open-weight model family featuring Maverick (400B MoE, up to 1M context) and Scout (109B MoE, 10M context), bringing multimodal capabilities and frontier performance to the open-source ecosystem.

Specifications

At a glance

Parameters

Scout: 109B MoE (17B active) / Maverick: 400B MoE (17B active)

Context Window

Scout: 10M tokens / Maverick: 128K–1M tokens

Training Data Cutoff

August 2024

Release Date

April 2025

Licence

Llama 4 Community Licence (Open Weight)

Pricing

Free (self-hosted) or via cloud providers

Modalities

Text, Images

Overview

About Llama 4

Llama 4 is Meta's fourth-generation open-weight model family and represents a significant architectural shift. Both released models, Scout and Maverick, use a Mixture of Experts (MoE) architecture, activating only a fraction of their total parameters per token for efficiency. Critically, Llama 4 is natively multimodal, accepting both text and image inputs out of the box.

Llama 4 Scout is the more remarkable of the two for its unprecedented 10 million token context window, by far the largest of any open model. With 109B total parameters (17B active), Scout is designed for tasks requiring massive context: processing entire codebases, books, or extensive document collections.

Llama 4 Maverick, with 400B total parameters (17B active), targets frontier performance and competes directly with GPT-4o and Gemini 2.0 Flash on quality benchmarks.

Both models continue Meta's commitment to open weights under the Llama 4 Community Licence, enabling self-hosting, fine-tuning, and full data control. The shift to MoE makes Llama 4 more efficient at inference than the dense Llama 3 models, though hosting MoE models still requires careful infrastructure planning.
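The efficiency claim is easy to make concrete: per token, an MoE model only runs its routed experts, so per-token compute tracks active parameters rather than total size. A quick sketch using the figures from the table above:

```python
# Per-token parameter usage for the two Llama 4 MoE variants.
# Figures (in billions) from the specifications above.
MODELS = {
    "Scout":    {"total_b": 109, "active_b": 17},
    "Maverick": {"total_b": 400, "active_b": 17},
}

for name, p in MODELS.items():
    ratio = p["active_b"] / p["total_b"]
    print(f"{name}: {p['active_b']}B of {p['total_b']}B parameters "
          f"active per token ({ratio:.1%})")
```

Note that both variants use the same 17B active parameters, so per-token inference cost is similar; Maverick's extra experts buy quality, not extra per-token compute.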

Strengths

Capabilities

  • Fully open weights with MoE architecture for efficient inference
  • Scout variant with industry-leading 10M token context window
  • Native multimodal capabilities accepting text and image inputs
  • Maverick variant competing at frontier performance levels
  • Fine-tuning and domain adaptation support
  • Strong multilingual capabilities
  • Active open-source ecosystem with extensive tooling

Considerations

Limitations

  • MoE architecture requires specialised infrastructure for self-hosting
  • Commercial licence retains restrictions for apps with 700M+ monthly users
  • Image input only — no image generation or audio support
  • Scout's 10M context may not be fully utilised without careful implementation
  • Newer release with a still-maturing ecosystem compared to Llama 3

Best For

Ideal use cases

  • Organisations with strict data sovereignty and privacy requirements
  • Processing extremely long documents with Scout's 10M context
  • Fine-tuning for domain-specific applications (legal, medical, finance)
  • High-throughput applications where API costs would be prohibitive
  • Multimodal applications needing text and image understanding

Pricing

Free to download and self-host. Also available via cloud providers: AWS Bedrock, Azure, Together AI, Fireworks AI, and others at competitive per-token rates.
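Most of these providers expose OpenAI-compatible chat endpoints, so a multimodal Llama 4 request is just a JSON payload mixing text and image parts. A minimal sketch; the model ID and image URL are illustrative placeholders, so check your provider's model list for the exact name:

```python
import json

# Hypothetical OpenAI-compatible chat payload for a hosted Llama 4 model.
# Substitute your provider's model ID and a real image URL.
payload = {
    "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this chart."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/chart.png"}},
            ],
        }
    ],
    "max_tokens": 512,
}

body = json.dumps(payload)  # POST this to the provider's /v1/chat/completions
```

Because the request shape matches the OpenAI API, existing client libraries typically work against these endpoints by changing only the base URL and model name.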

FAQ

Frequently asked questions

What is the difference between Scout and Maverick?

Scout (109B MoE, 17B active) is optimised for long-context tasks with a 10M token window. Maverick (400B MoE, 17B active) targets frontier quality with stronger reasoning and generation. Both share the same MoE architecture and multimodal capabilities.

How does Llama 4 differ from Llama 3?

Llama 4 introduces an MoE architecture (more efficient inference), native multimodal support (text + images), and massively expanded context windows. It represents a generational leap in architecture, though Llama 3 remains well-supported with a mature ecosystem.

Can Llama 4 be fine-tuned?

Yes. Llama 4's open weights enable full fine-tuning, LoRA fine-tuning, and other customisation approaches. The MoE architecture does add complexity to fine-tuning compared to dense models, so tooling support is still catching up.

What hardware is needed to self-host Llama 4?

Maverick (400B total, 17B active) requires multi-GPU setups despite its efficient MoE inference. Scout (109B total, 17B active) is more manageable. Quantised versions reduce requirements significantly. Cloud providers offer hosted options for teams without GPU infrastructure.
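A back-of-envelope estimate makes the hardware question concrete: weight memory is roughly total parameters times bytes per parameter, so quantisation cuts it proportionally. A sketch covering weights only, ignoring KV cache and activation memory, which add substantially on top:

```python
# Approximate weight memory (GB) at common precisions. All of an MoE
# model's total parameters must be resident in memory, even though
# only 17B are active per token.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_gb(total_params_b: float, precision: str) -> float:
    """Weights-only memory in GB for a model of total_params_b billion params."""
    return total_params_b * BYTES_PER_PARAM[precision]

for model, total in [("Scout", 109), ("Maverick", 400)]:
    sizes = {p: f"{weight_gb(total, p):.0f} GB" for p in BYTES_PER_PARAM}
    print(model, sizes)
```

By this estimate, Maverick at fp16 needs on the order of 800 GB for weights alone, which is why multi-GPU nodes are unavoidable, while a 4-bit Scout (roughly 55 GB) can fit on a single large-memory GPU.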

Is Llama 4 open source?

Llama 4 is open-weight under the Llama 4 Community Licence. The model weights are freely downloadable for commercial use, but the training code and data are not fully open. Applications with over 700 million monthly active users must request a special licence from Meta.

Need help with Llama 4?

Our team can help you evaluate and implement the right AI tools. Book a free strategy call.