Comprehensive LLM Comparison
Compare 32 major LLMs including Claude Opus 4.5, Gemini 3 Pro, GPT-5.2, GPT-5.1, Grok 4.1, Kimi K2, Llama 4, Mistral, DeepSeek, Qwen3, and GLM 4.6. Pricing ranges from $0.05 to $168 per million tokens.
Updated December 19, 2025 with GPT-5.2 Pro, Thinking, Instant & Codex, plus Claude Opus 4.5, Gemini 3 Pro, GPT-5.1 Instant & Thinking, and the latest models from Moonshot AI, Meta, Anthropic, Google, Mistral, DeepSeek, Alibaba, and Zhipu AI.
| Feature | Gemini 3 Pro (Google) | o3-pro (OpenAI) | Kimi K2 Thinking (Moonshot AI) | Claude Opus 4.5 (Anthropic) | Claude Sonnet 4.5 (Anthropic) | Claude Opus 4.1 (Anthropic) | GPT-5.2 Pro (OpenAI) | GPT-5.2 Thinking (OpenAI) | GPT-5.2 Instant (OpenAI) | GPT-5.2 Codex (OpenAI) | GPT-5.1 Instant (OpenAI) | GPT-5.1 Thinking (OpenAI) | GPT-5 (OpenAI) | Gemini 2.5 Pro (Google) | Grok 4.1 (xAI) | Grok 4 (xAI) | o4-mini (OpenAI) | Claude Haiku 4.5 (Anthropic) | Grok 3 (xAI) | DeepSeek R1 (DeepSeek) | Mistral Medium 3 (Mistral) | Gemini 2.0 Flash (Google) | DeepSeek V3.1 (DeepSeek) | GPT-5 Mini (OpenAI) | Llama 4 Maverick (Meta) | Qwen3-235B (Alibaba) | Qwen3-32B (Alibaba) | GLM 4.6 (Zhipu AI) | GPT-4o (OpenAI) | Llama 4 Scout (Meta) | Mistral Small 3 (Mistral) | GPT-5 Nano (OpenAI) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Tier | Premium | Premium | Premium | Premium | Mid-tier | Premium | Premium | Premium | Premium | Premium | Premium | Premium | Premium | Premium | Premium | Premium | Mid-tier | Mid-tier | Mid-tier | Cost-effective | Cost-effective | Cost-effective | Cost-effective | Cost-effective | Cost-effective | Cost-effective | Cost-effective | Cost-effective | Premium | Ultra-cheap | Cost-effective | Ultra-cheap |
| Context Window | 1M tokens | 200K tokens | 256K tokens | 1M tokens | 1M tokens | 1M tokens | 400K tokens | 400K tokens | 400K tokens | 400K tokens | 272K tokens | 272K tokens | 272K tokens | 1M tokens | 2M tokens | 128K tokens | 200K tokens | 200K tokens | 128K tokens | 128K tokens | 128K tokens | 1M tokens | 128K tokens | 272K tokens | 1M tokens | 128K tokens | 128K tokens | 128K tokens | 128K tokens | 10M tokens | 128K tokens | 272K tokens |
| Max Output | 64K tokens | 100K tokens | 32K tokens | 8K tokens | 8K tokens | 8K tokens | 128K tokens | 128K tokens | 128K tokens | 128K tokens | 128K tokens | 128K tokens | 128K tokens | 8K tokens | 32K tokens | 32K tokens | 100K tokens | 8K tokens | 32K tokens | 32K tokens | 32K tokens | 8K tokens | 32K tokens | 64K tokens | 32K tokens | 32K tokens | 32K tokens | 32K tokens | 16K tokens | 32K tokens | 32K tokens | 32K tokens |
| Input Cost | $2.00* | $15.00 | $0.80 | $5.00 | $3.00 | $15.00 | $21.00 | $1.75 | $1.75 | $1.75 | $1.25 | $1.25 | $1.25 | $1.25* | $0.20 | $3.00 | $1.10 | $1.00 | $3.00 | $0.55 | $0.40 | $0.10 | $0.56 | $0.25 | $0.50 | $0.35 | $0.20 | $0.30 | $2.50 | $0.11 | $0.20 | $0.05 |
| Output Cost | $12.00* | $60.00 | $3.20 | $25.00 | $15.00 | $75.00 | $168.00 | $14.00 | $14.00 | $14.00 | $10.00 | $10.00 | $10.00 | $10.00* | $0.50 | $15.00 | $4.40 | $5.00 | $15.00 | $2.19 | $2.00 | $0.40 | $1.68 | $2.00 | $0.77 | $0.60 | $0.40 | $0.55 | $10.00 | $0.34 | $0.60 | $0.40 |
| Cached Input | N/A | N/A | N/A | $0.50 | $0.30 | N/A | $2.10 | $0.175 | $0.175 | $0.175 | $0.125 | $0.125 | $0.125 | $0.31 | $0.05 | $0.75 | N/A | $0.10 | N/A | $0.14 | N/A | N/A | $0.07 | $0.025 | N/A | N/A | N/A | N/A | N/A | N/A | N/A | $0.005 |
| Latency | Fast | Slow (reasoning) | Medium | Medium | Fast | Medium | Medium | Medium (reasoning) | Fast | Adaptive | Fast | Adaptive | Fast | Fast | Fast | Fast | Medium (reasoning) | Very Fast | Fast | Medium (reasoning) | Fast | Very Fast | Fast | Fast | Fast | Fast | Very Fast | Fast | Fast | Fast | Very Fast | Very Fast |
| Notes | *Tiered pricing. Released Nov 18, 2025. Google's most intelligent model, with agentic capabilities | Released June 2025. Highest reasoning capability | Released Nov 6, 2025. 1T-parameter open-source model, trained for only $4.6M | Released Nov 24, 2025. Best-in-class for coding, agents, and autonomous tasks | Released Sep 2025. Anthropic's best coding model | Released Aug 2025. Works continuously for hours on complex tasks | Released Dec 11, 2025. Most capable for professional tasks across 44 occupations | Released Dec 11, 2025. 40% higher than GPT-5.1, advanced reasoning with 400K context | Released Dec 11, 2025. Fast workhorse for info-seeking, technical writing, translation | Released Dec 18, 2025. Advanced agentic coding for professional software engineering and defensive cybersecurity | Released Nov 12, 2025. Most-used model with adaptive reasoning capability | Released Nov 12, 2025. Advanced reasoning model, easier to understand | Released Aug 2025. Legacy model, replaced by GPT-5.1 | *Tiered pricing. Released March 2025. Google's most expensive model | Released Nov 2025. 3x less likely to hallucinate. #1 in LMArena Text Arena (Thinking variant). 2M-token context with consistent performance | Released Aug 2025. Knowledge cutoff: Nov 2024 | Released April 2025. Replaces o3-mini | Released Oct 2025. Within 5% of Sonnet at 1/3 cost | API launched April 2025. Compatible with OpenAI SDK | Released Jan 2025. Comparable to o1 at 3.6% of cost | Released Jan 2025. Best value for production workloads | Next-gen features with superior speed | Released Aug 2025. MIT license. Beats GPT-4 at 17% of cost | Released Aug 2025. Smaller, faster GPT-5 variant | Released April 2025. 17B active, 128 experts | MoE architecture, open-source under Apache 2.0 | Latest Qwen model, excellent performance-to-cost ratio | Strong Chinese/English performance, open-source | Best for the majority of tasks across industries | Released April 2025. 10M tokens ≈ 7,500 pages. 17B active, 109B total | Released Jan 2025. Compact and efficient | Released Aug 2025. Excels at classification and simple instructions |
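All prices in the table are USD per million tokens, so a request's cost is simply tokens × rate ÷ 1,000,000, with any cached portion of the prompt billed at the cached-input rate where one is listed. The sketch below illustrates that arithmetic; the hardcoded rates are copied from the table for a handful of models and will drift as providers reprice, and the dictionary keys are the table's display labels, not provider API model IDs.

```python
# Rough per-request cost estimator based on the per-million-token prices above.
# Rates are examples copied from the comparison table (USD per 1M tokens) and
# will go stale; they are not authoritative pricing, and the keys are table
# labels rather than provider API model IDs.
RATES = {
    # model label           (input, cached_input, output)
    "GPT-5.1 Instant":      (1.25, 0.125, 10.00),
    "Claude Opus 4.5":      (5.00, 0.50, 25.00),
    "Gemini 2.0 Flash":     (0.10, None, 0.40),
    "DeepSeek V3.1":        (0.56, 0.07, 1.68),
    "GPT-5 Nano":           (0.05, 0.005, 0.40),
}

def request_cost(model: str, input_tokens: int, output_tokens: int,
                 cached_input_tokens: int = 0) -> float:
    """Return the estimated USD cost of one request.

    cached_input_tokens is the portion of the prompt served from the
    provider's prompt cache; it is billed at the cached rate when the
    table lists one, otherwise at the normal input rate.
    """
    input_rate, cached_rate, output_rate = RATES[model]
    if cached_rate is None:
        cached_rate = input_rate
    fresh_input = input_tokens - cached_input_tokens
    return (
        fresh_input * input_rate
        + cached_input_tokens * cached_rate
        + output_tokens * output_rate
    ) / 1_000_000

# Example: a 20K-token prompt (15K of it cached) with a 2K-token reply.
for model in RATES:
    print(f"{model:18s} ${request_cost(model, 20_000, 2_000, 15_000):.4f}")
```

For GPT-5.1 Instant, that example request works out to 5,000 × $1.25/M + 15,000 × $0.125/M + 2,000 × $10/M ≈ $0.028, which is why cached-input discounts matter so much for long, repeated prompts.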
Quick Picks by Use Case
Best for Coding
Claude Opus 4.5: Tops SWE-bench Verified, 10.6% better than Sonnet. Best for agents & autonomous tasks.
Gemini 3 Pro: #1 on WebDev Arena. 76.2% on SWE-bench Verified. 64K output with Gemini Agent.
GPT-5.1: Adaptive reasoning from OpenAI. More accurate and conversational than GPT-5.
Claude Sonnet 4.5: 1M context for entire codebases. Best for agents and computer use.
Best for Long Context
Llama 4 Scout: 10M tokens! Process roughly 7,500 pages (rough token math sketched after this list). Open weights. Multimodal.
1M context with best reasoning quality and prompt caching.
Gemini 2.5 Pro: 1M context with thinking mode. Gemini 2.0 Flash is the cost-effective option.
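The "10M tokens ≈ 7,500 pages" figure above implies roughly 1,300 tokens per page; a common rule of thumb for English text is about 4 characters per token. The snippet below is a rough fit check under that heuristic; the per-model windows come from the table, the 4-characters-per-token ratio is an approximation, and real counts vary by tokenizer and language.

```python
# Rough check of whether a document fits a model's context window.
# Uses the ~4-characters-per-token heuristic for English text; actual
# token counts depend on the tokenizer and can differ noticeably.
CONTEXT_WINDOWS = {          # tokens, from the comparison table
    "Llama 4 Scout": 10_000_000,
    "Gemini 3 Pro": 1_000_000,
    "GPT-5.1 Thinking": 272_000,
    "DeepSeek V3.1": 128_000,
}

def estimated_tokens(text: str) -> int:
    return max(1, len(text) // 4)   # ~4 chars per token

def fits(text: str, model: str, reserve_for_output: int = 32_000) -> bool:
    """Leave reserve_for_output tokens of headroom for the model's reply."""
    return estimated_tokens(text) + reserve_for_output <= CONTEXT_WINDOWS[model]
```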
Cheapest Options
GPT-5 Nano: Cheapest option in the table, with a 272K context. GPT-5 family efficiency.
Mistral Small 3: 3x faster than Llama 3.3. Very low latency.
Llama 4 Scout: 10M context at an ultra-low price. Open weights.
Gemini 2.0 Flash: Multimodal, production-ready. Widely supported.
Open Source Leaders
Llama 4 Scout: 10M context, 109B params. Part of Meta's flagship Llama 4 family.
Llama 4 Maverick: 400B total params, 17B active. Multimodal MoE.
DeepSeek R1 / V3.1: 671B params. Ultra-cheap reasoning and chat.
Apache 2.0. 29 languages. 72.7B params.
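Many of the providers above, including xAI (noted in the table as compatible with the OpenAI SDK) and several hosts of the open-weight models, expose OpenAI-compatible chat-completions endpoints, so switching models is often just a matter of changing the base URL, API key, and model name. Below is a minimal sketch using the openai Python SDK; the base URL and model identifier are illustrative placeholders, and you should take the real values from the provider's own documentation.

```python
# Minimal sketch: calling an OpenAI-compatible provider with the openai SDK.
# The base_url and model name below are illustrative placeholders; use the
# endpoint, model ID, and API key from the provider you actually deploy against.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-provider.com/v1",  # provider's OpenAI-compatible endpoint
    api_key="YOUR_PROVIDER_API_KEY",
)

response = client.chat.completions.create(
    model="provider-model-name",  # the provider's ID for its chat model
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize the trade-off between context window and cost."},
    ],
    max_tokens=512,
    stream=False,
)
print(response.choices[0].message.content)
```

Because the request shape is the same across these endpoints, the cost and context-fit sketches earlier in this page can be reused unchanged when benchmarking one model against another.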