Comprehensive LLM Comparison
Compare 25 major LLMs including GPT-5.1, Kimi K2, Llama 4, Mistral, Claude, Gemini, DeepSeek, Qwen3, and GLM 4.6. From $0.05 to $75 per million tokens.
Updated November 12, 2025 with GPT-5.1 Instant & Thinking, plus the latest models from Moonshot AI, Meta, Anthropic, Google, Mistral, DeepSeek, Alibaba, and Zhipu AI.
| Feature | o3-pro (OpenAI) | Kimi K2 Thinking (Moonshot AI) | Claude Sonnet 4.5 (Anthropic) | Claude Opus 4.1 (Anthropic) | GPT-5.1 Instant (OpenAI) | GPT-5.1 Thinking (OpenAI) | GPT-5 (OpenAI) | Gemini 2.5 Pro (Google) | o4-mini (OpenAI) | Claude Haiku 4.5 (Anthropic) | Grok 3 (xAI) | DeepSeek R1 (DeepSeek) | Mistral Medium 3 (Mistral) | Gemini 2.0 Flash (Google) | DeepSeek V3.1 (DeepSeek) | GPT-5 Mini (OpenAI) | Llama 4 Maverick (Meta) | Qwen3-235B (Alibaba) | Qwen3-32B (Alibaba) | GLM 4.6 (Zhipu AI) | GPT-4o (OpenAI) | Llama 4 Scout (Meta) | Mistral Small 3 (Mistral) | GPT-5 Nano (OpenAI) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Tier | Premium | Premium | Mid-tier | Premium | Premium | Premium | Premium | Premium | Mid-tier | Mid-tier | Mid-tier | Cost-effective | Cost-effective | Cost-effective | Cost-effective | Cost-effective | Cost-effective | Cost-effective | Cost-effective | Cost-effective | Premium | Ultra-cheap | Cost-effective | Ultra-cheap |
| Context Window | 200K tokens | 256K tokens | 1M tokens | 1M tokens | 272K tokens | 272K tokens | 272K tokens | 1M tokens | 200K tokens | 200K tokens | 128K tokens | 128K tokens | 128K tokens | 1M tokens | 128K tokens | 272K tokens | 1M tokens | 128K tokens | 128K tokens | 128K tokens | 128K tokens | 10M tokens | 128K tokens | 272K tokens |
| Max Output | 100K tokens | 32K tokens | 8K tokens | 8K tokens | 128K tokens | 128K tokens | 128K tokens | 8K tokens | 100K tokens | 8K tokens | 32K tokens | 32K tokens | 32K tokens | 8K tokens | 32K tokens | 64K tokens | 32K tokens | 32K tokens | 32K tokens | 32K tokens | 16K tokens | 32K tokens | 32K tokens | 32K tokens |
| Input Cost (USD per 1M tokens) | $15.00 | $0.80 | $3.00 | $15.00 | $1.25 | $1.25 | $1.25 | $1.25* | $1.10 | $1.00 | $3.00 | $0.55 | $0.40 | $0.10 | $0.56 | $0.25 | $0.50 | $0.35 | $0.20 | $0.30 | $2.50 | $0.11 | $0.20 | $0.05 |
| Output Cost (USD per 1M tokens) | $60.00 | $3.20 | $15.00 | $75.00 | $10.00 | $10.00 | $10.00 | $10.00* | $4.40 | $5.00 | $15.00 | $2.19 | $2.00 | $0.40 | $1.68 | $2.00 | $0.77 | $0.60 | $0.40 | $0.55 | $10.00 | $0.34 | $0.60 | $0.40 |
| Cached Input (USD per 1M tokens) | N/A | N/A | $0.30 | N/A | $0.125 | $0.125 | $0.125 | $0.31 | N/A | $0.10 | N/A | $0.14 | N/A | N/A | $0.07 | $0.025 | N/A | N/A | N/A | N/A | N/A | N/A | N/A | $0.005 |
| Multimodal | ||||||||||||||||||||||||
| Streaming | ||||||||||||||||||||||||
| Function Calling | ||||||||||||||||||||||||
| Prompt Caching | ||||||||||||||||||||||||
| Latency | Slow (reasoning) | Medium | Fast | Medium | Fast | Adaptive | Fast | Fast | Medium (reasoning) | Very Fast | Fast | Medium (reasoning) | Fast | Very Fast | Fast | Fast | Fast | Fast | Very Fast | Fast | Fast | Fast | Very Fast | Very Fast |
| Key Strengths | ||||||||||||||||||||||||
| Notes | Released June 2025. Highest reasoning capability | Released Nov 6, 2025. 1T params open-source, trained for only $4.6M | Released Sep 2025. Anthropic's best coding model | Released Aug 2025. Works continuously for hours on complex tasks | Released Nov 12, 2025. Most-used model with adaptive reasoning capability | Released Nov 12, 2025. Advanced reasoning model, easier to understand | Released Aug 2025. Legacy model, replaced by GPT-5.1 | *Tiered pricing. Released March 2025. Google's most expensive model | Released April 2025. Replaces o3-mini | Released Oct 2025. Within 5% of Sonnet at 1/3 cost | API launched April 2025. Compatible with OpenAI SDK | Released Jan 2025. Comparable to o1 at 3.6% of cost | Released Jan 2025. Best value for production workloads | Next-gen features with superior speed | Released Aug 2025. MIT license. Beats GPT-4 at 17% of cost | Released Aug 2025. Smaller, faster GPT-5 variant | Released April 2025. 17B active, 128 experts | MoE architecture, open-source under Apache 2.0 | Latest Qwen model, excellent performance-to-cost ratio | Strong Chinese/English performance, open-source | Best for majority of tasks across industries | Released April 2025. 10M = ~7,500 pages. 17B active, 109B total | Released Jan 2025. Compact and efficient | Released Aug 2025. Excels at classification and simple instructions |
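
The per-million-token prices in the table can be turned into a rough per-request estimate. The sketch below copies a handful of rows from the table and assumes, for illustration only, that cached tokens are a subset of the input tokens and are billed at the Cached Input rate (or at the full input rate where the table says N/A). The names `Pricing`, `PRICES`, and `request_cost` are hypothetical, and real provider billing may differ, for example through cache-write fees or tiered pricing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pricing:
    """USD per one million tokens, copied from the comparison table."""
    input: float
    output: float
    cached_input: Optional[float] = None  # None where the table says N/A


# A few rows from the table; extend with other models as needed.
PRICES = {
    "o3-pro": Pricing(15.00, 60.00),
    "GPT-5.1 Instant": Pricing(1.25, 10.00, 0.125),
    "Claude Haiku 4.5": Pricing(1.00, 5.00, 0.10),
    "DeepSeek V3.1": Pricing(0.56, 1.68, 0.07),
    "GPT-5 Nano": Pricing(0.05, 0.40, 0.005),
}


def request_cost(model: str, input_tokens: int, output_tokens: int,
                 cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request.

    Assumption: cached_tokens counts input tokens served from a prompt
    cache; they are billed at the cached rate when one is listed.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    p = PRICES[model]
    cached_rate = p.cached_input if p.cached_input is not None else p.input
    fresh_tokens = input_tokens - cached_tokens
    total = (fresh_tokens * p.input
             + cached_tokens * cached_rate
             + output_tokens * p.output)
    return total / 1_000_000


if __name__ == "__main__":
    # 10K-token prompt (8K of it cached) with a 1K-token reply.
    for name in PRICES:
        cost = request_cost(name, 10_000, 1_000, cached_tokens=8_000)
        print(f"{name}: ${cost:.6f}")
```

Because output tokens cost several times more than input tokens for most models listed, the output length usually dominates the estimate for short prompts.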
Quick Picks by Use Case
Best for Coding
Latest from OpenAI with adaptive reasoning. More accurate and conversational than GPT-5.
1M context for entire codebases. Best for agents and computer use.
Fast coding at 1/3 the cost of Sonnet. Great for agent tasks.
Best for Long Context
10M tokens! Process 7,500 pages. Open weights. Multimodal.
1M context with best reasoning quality and prompt caching.
1M context with thinking mode. Flash is the cost-effective option.
Cheapest Options
Cheapest with 272K context. GPT-5 family efficiency.
3x faster than Llama 3.3. Very low latency.
10M context at ultra-low price. Open weights.
Multimodal, production-ready. Widely supported.
Open Source Leaders
10M context, 109B params. Meta's flagship.
400B total, 17B active. Multimodal MoE.
671B params. Ultra-cheap reasoning and chat.
Apache 2.0. 29 languages. 72.7B params.
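
Picks like the ones above can be approximated by filtering the table on context window and ranking by price. The following is a minimal sketch under stated assumptions: it uses a few rows copied from the table, reads "K" as 1,000 and "M" as 1,000,000 tokens, and ranks by a blended price that assumes one output token for every three input tokens. The names `MODELS` and `cheapest_for` and the `output_ratio` parameter are hypothetical, and the ranking ignores quality, latency, and licensing.

```python
# (name, context window in tokens, input USD per 1M, output USD per 1M),
# copied from the comparison table.
MODELS = [
    ("GPT-5 Nano", 272_000, 0.05, 0.40),
    ("Llama 4 Scout", 10_000_000, 0.11, 0.34),
    ("Gemini 2.0 Flash", 1_000_000, 0.10, 0.40),
    ("Mistral Small 3", 128_000, 0.20, 0.60),
    ("Gemini 2.5 Pro", 1_000_000, 1.25, 10.00),
    ("Claude Sonnet 4.5", 1_000_000, 3.00, 15.00),
]


def cheapest_for(min_context: int, output_ratio: float = 0.25, top: int = 3):
    """Return up to `top` model names with at least `min_context` tokens
    of context, ordered by blended price (cheapest first).

    output_ratio is the assumed share of output tokens in total traffic.
    """
    def blended(model):
        _, _, input_price, output_price = model
        return (1 - output_ratio) * input_price + output_ratio * output_price

    eligible = [m for m in MODELS if m[1] >= min_context]
    return [m[0] for m in sorted(eligible, key=blended)][:top]


if __name__ == "__main__":
    print(cheapest_for(100_000))
    print(cheapest_for(1_000_000))
```

Changing `output_ratio` shifts the ranking toward models with cheaper output tokens, which matters for generation-heavy workloads.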