Qwen 3: Alibaba's Open-Weight Powerhouse
Qwen 3 is Alibaba Cloud's latest open-weight model family featuring hybrid thinking/non-thinking modes, sizes from 0.6B to 235B (MoE), and strong multilingual, coding, and reasoning capabilities.
Specifications
At a glance
Parameters
0.6B / 1.7B / 4B / 8B / 14B / 30B / 32B / 235B-A22B (MoE)
Context Window
Up to 128,000 tokens (varies by model size)
Training Data Cutoff
Early 2025
Release Date
April 2025
Licence
Apache 2.0
Pricing
Free (self-hosted) or via Alibaba Cloud
Overview
About Qwen 3
Qwen 3 is Alibaba Cloud's third-generation large language model family, marking a significant leap forward with hybrid thinking modes, a broader range of model sizes, and improved performance across the board. The flagship Qwen 3 235B-A22B uses a Mixture of Experts (MoE) architecture, activating only 22B of its 235B total parameters per token, which delivers frontier-competitive performance with efficient inference.

The standout feature of Qwen 3 is its hybrid thinking mode: models can dynamically switch between a fast 'non-thinking' mode for straightforward queries and a slower 'thinking' mode with extended chain-of-thought reasoning for complex problems. This makes Qwen 3 competitive with dedicated reasoning models while remaining practical for general-purpose use.

Released under the permissive Apache 2.0 licence across all variants, Qwen 3 has become one of the most popular open-weight model families globally. It supports over 100 languages, with particular strength in Chinese, English, and other Asian languages. The range of sizes, from 0.6B for edge deployment to the 235B MoE for frontier tasks, makes it one of the most versatile open model families available.
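In practice, the mode switch is exposed as an `enable_thinking` flag on the chat template. Below is a minimal sketch assuming the Hugging Face transformers library and the Qwen/Qwen3-0.6B checkpoint; `generate` is an illustrative helper, not a library function.

```python
# Sketch: toggling Qwen 3's thinking mode, assuming the Hugging Face
# transformers library and the Qwen/Qwen3-0.6B checkpoint are available.
def generate(prompt: str, thinking: bool = True) -> str:
    # Imported lazily so the sketch can be read/imported without the
    # (large) model dependencies installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
    model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B")
    messages = [{"role": "user", "content": prompt}]
    # enable_thinking=True emits a chain-of-thought block before the
    # answer; False yields a fast, direct reply.
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=thinking,
    )
    inputs = tokenizer(text, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=512)
    # Return only the newly generated tokens.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )
```

The same flag works across all Qwen 3 sizes, so a deployment can choose reasoning depth per request rather than per model.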
Strengths
Capabilities
- Hybrid thinking/non-thinking modes for adaptive reasoning
- Wide range of sizes from 0.6B to 235B (MoE) parameters
- 100+ language support with broad multilingual capabilities
- Context window up to 128K tokens on larger variants
- Apache 2.0 licence for all variants enabling unrestricted commercial use
- Strong mathematical, coding, and scientific reasoning
- MoE flagship for efficient frontier-class inference
Considerations
Limitations
- MoE flagship model requires significant infrastructure to self-host
- Ecosystem still smaller than Llama's in Western markets
- Performance on Western-language tasks can trail Llama 4 in some cases
- Documentation quality varies between Chinese and English
Best For
Ideal use cases
- Bilingual Chinese-English applications
- Cost-effective self-hosted deployments with permissive licensing
- Tasks requiring adaptive reasoning depth (hybrid thinking)
- Mathematical reasoning and code generation tasks
- Edge and mobile deployment with smaller model variants
Pricing
Free under Apache 2.0 for self-hosting. Also available through Alibaba Cloud Model Studio and various inference providers at competitive rates.
FAQ
Frequently asked questions
What is Qwen 3's hybrid thinking mode?
Qwen 3 models can switch between 'thinking' mode (extended chain-of-thought reasoning for complex problems) and 'non-thinking' mode (fast direct answers for simple queries). This allows a single model to handle both simple and complex tasks efficiently.
How does Qwen 3 compare to Llama 4?
Both are competitive open-weight model families with MoE flagship architectures. Qwen 3 offers hybrid thinking modes and Apache 2.0 licensing across all sizes; Llama 4 offers native multimodal support and, in the Scout variant, a 10M-token context window. Both are strong choices depending on your specific requirements.
Can Qwen 3 be used commercially?
Yes. All Qwen 3 variants are released under the Apache 2.0 licence, one of the most permissive open-source licences. There are no usage restrictions, user count limits, or commercial use restrictions.
What hardware does Qwen 3 require?
The 235B MoE model activates only 22B parameters per token, making inference cheaper than a dense 235B model; however, all 235B parameters must still be resident in memory, so self-hosting requires a multi-GPU setup (e.g., 4x A100 80GB). Smaller variants such as the 8B and 14B models run comfortably on consumer hardware.
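A back-of-envelope weight-memory estimate makes the hardware picture concrete (weights only; KV cache and runtime overhead come on top):

```python
# Rough weight-memory arithmetic for a few Qwen 3 variants.
# Figures are approximate: weights only, no KV cache or overhead.
def weight_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return params_billion * bytes_per_param

# All 235B parameters must be resident even though only 22B are active
# per token, so MoE saves compute, not weight memory.
print(weight_gb(235, 2))   # FP16/BF16: ~470 GB -> multi-GPU territory
print(weight_gb(8, 0.5))   # 8B at 4-bit: ~4 GB -> consumer GPU
print(weight_gb(14, 0.5))  # 14B at 4-bit: ~7 GB
```

This is why the MoE flagship stays in data-centre territory while the 8B and 14B variants fit on a single consumer GPU once quantised.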
Who develops Qwen 3?
Qwen 3 is developed by Alibaba Cloud's Qwen team, based in China. Alibaba's continued investment in open AI research has made Qwen one of the world's most popular open-weight model families.
Need help with Qwen 3?
Our team can help you evaluate and implement the right AI tools. Book a free strategy call.