
Comprehensive LLM Comparison

Experimental

Compare 32 major LLMs, including Claude Opus 4.5, Gemini 3 Pro, GPT-5.2, GPT-5.1, Grok 4.1, Kimi K2, Llama 4, Mistral, DeepSeek, Qwen3, and GLM 4.6, with pricing that ranges from $0.05 to $75 per million tokens.

Updated December 19, 2025 with GPT-5.2 Pro, Thinking, Instant & Codex, plus Claude Opus 4.5, Gemini 3 Pro, GPT-5.1 Instant & Thinking, and latest models from Moonshot AI, Meta, Anthropic, Google, Mistral, DeepSeek, Alibaba, and Zhipu AI

| Model | Provider | Tier | Context Window | Max Output | Input ($/M) | Output ($/M) | Cached Input ($/M) | Latency |
|---|---|---|---|---|---|---|---|---|
| Gemini 3 Pro | Google | Premium | 1M | 64K | $2.00* | $12.00* | N/A | Fast |
| o3-pro | OpenAI | Premium | 200K | 100K | $15.00 | $60.00 | N/A | Slow (reasoning) |
| Kimi K2 Thinking | Moonshot AI | Premium | 256K | 32K | $0.80 | $3.20 | N/A | Medium |
| Claude Opus 4.5 | Anthropic | Premium | 1M | 8K | $5.00 | $25.00 | $0.50 | Medium |
| Claude Sonnet 4.5 | Anthropic | Mid-tier | 1M | 8K | $3.00 | $15.00 | $0.30 | Fast |
| Claude Opus 4.1 | Anthropic | Premium | 1M | 8K | $15.00 | $75.00 | N/A | Medium |
| GPT-5.2 Pro | OpenAI | Premium | 400K | 128K | $21.00 | $168.00 | $2.10 | Medium |
| GPT-5.2 Thinking | OpenAI | Premium | 400K | 128K | $1.75 | $14.00 | $0.175 | Medium (reasoning) |
| GPT-5.2 Instant | OpenAI | Premium | 400K | 128K | $1.75 | $14.00 | $0.175 | Fast |
| GPT-5.2 Codex | OpenAI | Premium | 400K | 128K | $1.75 | $14.00 | $0.175 | Adaptive |
| GPT-5.1 Instant | OpenAI | Premium | 272K | 128K | $1.25 | $10.00 | $0.125 | Fast |
| GPT-5.1 Thinking | OpenAI | Premium | 272K | 128K | $1.25 | $10.00 | $0.125 | Adaptive |
| GPT-5 | OpenAI | Premium | 272K | 128K | $1.25 | $10.00 | $0.125 | Fast |
| Gemini 2.5 Pro | Google | Premium | 1M | 8K | $1.25* | $10.00* | $0.31 | Fast |
| Grok 4.1 | xAI | Premium | 2M | 32K | $0.20 | $0.50 | $0.05 | Fast |
| Grok 4 | xAI | Premium | 128K | 32K | $3.00 | $15.00 | $0.75 | Fast |
| o4-mini | OpenAI | Mid-tier | 200K | 100K | $1.10 | $4.40 | N/A | Medium (reasoning) |
| Claude Haiku 4.5 | Anthropic | Mid-tier | 200K | 8K | $1.00 | $5.00 | $0.10 | Very Fast |
| Grok 3 | xAI | Mid-tier | 128K | 32K | $3.00 | $15.00 | N/A | Fast |
| DeepSeek R1 | DeepSeek | Cost-effective | 128K | 32K | $0.55 | $2.19 | $0.14 | Medium (reasoning) |
| Mistral Medium 3 | Mistral | Cost-effective | 128K | 32K | $0.40 | $2.00 | N/A | Fast |
| Gemini 2.0 Flash | Google | Cost-effective | 1M | 8K | $0.10 | $0.40 | N/A | Very Fast |
| DeepSeek V3.1 | DeepSeek | Cost-effective | 128K | 32K | $0.56 | $1.68 | $0.07 | Fast |
| GPT-5 Mini | OpenAI | Cost-effective | 272K | 64K | $0.25 | $2.00 | $0.025 | Fast |
| Llama 4 Maverick | Meta | Cost-effective | 1M | 32K | $0.50 | $0.77 | N/A | Fast |
| Qwen3-235B | Alibaba | Cost-effective | 128K | 32K | $0.35 | $0.60 | N/A | Fast |
| Qwen3-32B | Alibaba | Cost-effective | 128K | 32K | $0.20 | $0.40 | N/A | Very Fast |
| GLM 4.6 | Zhipu AI | Cost-effective | 128K | 32K | $0.30 | $0.55 | N/A | Fast |
| GPT-4o | OpenAI | Premium | 128K | 16K | $2.50 | $10.00 | N/A | Fast |
| Llama 4 Scout | Meta | Ultra-cheap | 10M | 32K | $0.11 | $0.34 | N/A | Fast |
| Mistral Small 3 | Mistral | Cost-effective | 128K | 32K | $0.20 | $0.60 | N/A | Very Fast |
| GPT-5 Nano | OpenAI | Ultra-cheap | 272K | 32K | $0.05 | $0.40 | $0.005 | Very Fast |

*Tiered pricing (see Notes). Context window and max output are in tokens; costs are per million tokens.
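The per-million-token prices above translate into per-request costs with simple arithmetic. The sketch below is illustrative only: the `PRICES` dictionary and `request_cost` helper are made up for this example, with prices copied from the table (verify against each provider's current pricing page before relying on them).

```python
# Illustrative cost math for per-million-token list prices.
# Prices copied from the comparison table above; not an official API.

PRICES = {  # model: (input $/M tokens, output $/M tokens)
    "claude-opus-4.5": (5.00, 25.00),
    "gpt-5-nano": (0.05, 0.40),
    "gemini-3-pro": (2.00, 12.00),  # tiered pricing; base tier shown
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request at list prices."""
    in_price, out_price = PRICES[model]
    return input_tokens / 1_000_000 * in_price + output_tokens / 1_000_000 * out_price

# A 10K-token prompt with a 1K-token reply:
opus = request_cost("claude-opus-4.5", 10_000, 1_000)  # 0.05 + 0.025 = $0.075
nano = request_cost("gpt-5-nano", 10_000, 1_000)       # 0.0005 + 0.0004 = $0.0009
print(f"Opus 4.5: ${opus:.4f}  GPT-5 Nano: ${nano:.4f}")
```

At these list prices the same request costs roughly 80x more on the premium pick than on the cheapest one, which is why the tier column matters for high-volume workloads.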
Key Strengths
  • Gemini 3 Pro: #1 on WebDev Arena (1487 ELO); 76.2% SWE-bench Verified; 64K output; Gemini Agent
  • o3-pro: Most capable reasoning; math/science; PhD-level tasks
  • Kimi K2 Thinking: Beats GPT-5; 71.3% SWE-bench; 200-300 tool calls autonomy
  • Claude Opus 4.5: Tops SWE-bench Verified; 10.6% better than Sonnet 4.5; best for agents and coding; most robustly aligned
  • Claude Sonnet 4.5: Best coding/agents; 72.7% SWE-bench; 90% cache savings
  • Claude Opus 4.1: Best coding (72.5% SWE-bench); long-running tasks; agent workflows
  • GPT-5.2 Pro: Human expert level; GDPval SOTA; professional knowledge work
  • GPT-5.2 Thinking: 400K context; enhanced reasoning; 5x GPT-4 context; long-context tasks; 90% cache discount
  • GPT-5.2 Instant: 400K context; speed optimized; everyday work; same pricing as Thinking
  • GPT-5.2 Codex: 56.4% SWE-Bench Pro; agentic coding; cybersecurity; context compaction; Windows optimization
  • GPT-5.1 Instant: Adaptive reasoning; warmer and more conversational; more accurate
  • GPT-5.1 Thinking: Advanced reasoning; faster on simple tasks; more persistent on complex ones
  • GPT-5: Software-on-demand; multimodal; 88.4% GPQA
  • Gemini 2.5 Pro: #1 on LMArena; 86.4 GPQA reasoning; Deep Think mode
  • Grok 4.1: 2M context window; 3x lower hallucination rate; top EQ-Bench3 scores; real-time web search
  • Grok 4: Real-time web search; trained on Colossus; multimodal
  • o4-mini: Fast reasoning; best on AIME 2024/2025; math/coding
  • Claude Haiku 4.5: Fastest; near-Sonnet quality; 90% cache savings
  • Grok 3: Real-time search; function calling; fast inference
  • DeepSeek R1: Reasoning model; 27x cheaper than o1; MIT license
  • Mistral Medium 3: 8x cheaper than competitors; EU hosting; function calling
  • Gemini 2.0 Flash: 1M context; native tool use; multimodal
  • DeepSeek V3.1: Hybrid thinking/non-thinking; 671B params; 82.6% HumanEval
  • GPT-5 Mini: GPT-5 quality; 272K context; multimodal
  • Llama 4 Maverick: 400B-param MoE; multimodal; open weights
  • Qwen3-235B: 235B params (22B active); hybrid thinking; Apache 2.0
  • Qwen3-32B: Outperforms o1-mini; strong reasoning; Apache 2.0
  • GLM 4.6: 355B MoE; bilingual (CN/EN); MIT license
  • GPT-4o: Flagship general-purpose; multimodal; versatile
  • Llama 4 Scout: 10M context; multimodal; open weights
  • Mistral Small 3: 24B params; fast inference; EU compliant
  • GPT-5 Nano: High throughput; simple tasks; 272K context
Notes
  • Gemini 3 Pro: Tiered pricing. Released Nov 18, 2025. Google's most intelligent model, with agentic capabilities.
  • o3-pro: Released June 2025. Highest reasoning capability.
  • Kimi K2 Thinking: Released Nov 6, 2025. 1T-param open-source model, trained for only $4.6M.
  • Claude Opus 4.5: Released Nov 24, 2025. Best-in-class for coding, agents, and autonomous tasks.
  • Claude Sonnet 4.5: Released Sep 2025. Anthropic's best coding model.
  • Claude Opus 4.1: Released Aug 2025. Works continuously for hours on complex tasks.
  • GPT-5.2 Pro: Released Dec 11, 2025. Most capable for professional tasks across 44 occupations.
  • GPT-5.2 Thinking: Released Dec 11, 2025. Scores 40% higher than GPT-5.1; advanced reasoning with 400K context.
  • GPT-5.2 Instant: Released Dec 11, 2025. Fast workhorse for info-seeking, technical writing, and translation.
  • GPT-5.2 Codex: Released Dec 18, 2025. Advanced agentic coding for professional software engineering and defensive cybersecurity.
  • GPT-5.1 Instant: Released Nov 12, 2025. Most-used model, with adaptive reasoning capability.
  • GPT-5.1 Thinking: Released Nov 12, 2025. Advanced reasoning model, easier to understand.
  • GPT-5: Released Aug 2025. Legacy model, replaced by GPT-5.1.
  • Gemini 2.5 Pro: Tiered pricing. Released March 2025. Google's most expensive model.
  • Grok 4.1: Released Nov 2025. 3x less likely to hallucinate; #1 in LMArena Text Arena (Thinking variant); 2M-token context with consistent performance.
  • Grok 4: Released Aug 2025. Knowledge cutoff: Nov 2024.
  • o4-mini: Released April 2025. Replaces o3-mini.
  • Claude Haiku 4.5: Released Oct 2025. Within 5% of Sonnet at 1/3 the cost.
  • Grok 3: API launched April 2025. Compatible with the OpenAI SDK.
  • DeepSeek R1: Released Jan 2025. Comparable to o1 at 3.6% of the cost.
  • Mistral Medium 3: Released Jan 2025. Best value for production workloads.
  • Gemini 2.0 Flash: Next-gen features with superior speed.
  • DeepSeek V3.1: Released Aug 2025. MIT license. Beats GPT-4 at 17% of the cost.
  • GPT-5 Mini: Released Aug 2025. Smaller, faster GPT-5 variant.
  • Llama 4 Maverick: Released April 2025. 17B active params, 128 experts.
  • Qwen3-235B: MoE architecture, open source under Apache 2.0.
  • Qwen3-32B: Latest Qwen model; excellent performance-to-cost ratio.
  • GLM 4.6: Strong Chinese/English performance; open source.
  • GPT-4o: Best for the majority of tasks across industries.
  • Llama 4 Scout: Released April 2025. 10M tokens ≈ 7,500 pages. 17B active, 109B total params.
  • Mistral Small 3: Released Jan 2025. Compact and efficient.
  • GPT-5 Nano: Released Aug 2025. Excels at classification and simple instructions.
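Several models above advertise roughly 90% cache savings, and the Cached Input column shows why: cached tokens are billed at a deep discount. The sketch below is illustrative; `effective_input_price` is a hypothetical helper, the $3.00/$0.30 prices are Claude Sonnet 4.5's from the table, and the 80% cache-hit rate is an assumed workload figure, not a quoted one.

```python
# Sketch of how the "Cached Input" column changes the blended input price.
# Prices from the table (Claude Sonnet 4.5: $3.00 fresh, $0.30 cached);
# the cache-hit rate is workload-dependent and assumed here.

def effective_input_price(fresh: float, cached: float, hit_rate: float) -> float:
    """Blended $/M input tokens when a fraction of tokens hit the prompt cache."""
    return hit_rate * cached + (1.0 - hit_rate) * fresh

price = effective_input_price(3.00, 0.30, hit_rate=0.8)
print(f"${price:.2f}/M input tokens")  # $0.84/M, a 72% reduction from $3.00
```

The same arithmetic applies to any model with a cached-input price; workloads that resend a large fixed system prompt benefit the most.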

Quick Picks by Use Case

Best for Coding

Claude Opus 4.5 ($5/$25) 🏆

Tops SWE-bench Verified. 10.6% better than Sonnet. Best for agents & autonomous tasks.

Gemini 3 Pro ($2/$12)

#1 on WebDev Arena. 76.2% SWE-bench Verified. 64K output with Gemini Agent.

GPT-5.1 Instant ($1.25/$10)

OpenAI's most-used model, with adaptive reasoning. More accurate and conversational than GPT-5.

Claude Sonnet 4.5 ($3/$15)

1M context for entire codebases. Best for agents and computer use.

Best for Long Context

Llama 4 Scout ($0.15/$0.50) 🏆

10M tokens! Process 7,500 pages. Open weights. Multimodal.

Claude Sonnet 4.5 ($3/$15)

1M context with best reasoning quality and prompt caching.

Gemini 2.5 Pro/Flash ($1.25/$10)

1M context with thinking mode; Flash is the cost-effective option.
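The Llama 4 Scout note's "10M tokens ≈ 7,500 pages" heuristic gives a quick way to size any of the context windows above. The helper below is hypothetical, and the 8K-token reserve for the model's reply is an assumption for illustration.

```python
# Rough capacity check using the heuristic that 10M tokens ≈ 7,500 pages
# (about 1,333 tokens per page). Integer math keeps the result exact.

def pages_that_fit(context_tokens: int, reserve_for_output: int = 8_000) -> int:
    """Approximate pages that fit in a context window, leaving room for the reply."""
    return (context_tokens - reserve_for_output) * 7_500 // 10_000_000

print(pages_that_fit(1_000_000))   # a 1M window holds roughly 744 pages
print(pages_that_fit(10_000_000))  # Llama 4 Scout's 10M window: about 7,494 pages
```

Real token-per-page counts vary with formatting and tokenizer, so treat these as order-of-magnitude estimates.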

Cheapest Options

GPT-5 Nano ($0.05/$0.40)

Cheapest with 272K context. GPT-5 family efficiency.

Mistral Small 3 ($0.10/$0.30)

3x faster than Llama 3.3. Very low latency.

Llama 4 Scout ($0.15/$0.50)

10M context at ultra-low price. Open weights.

GPT-4o mini ($0.15/$0.60)

Multimodal, production-ready. Widely supported.

Open Source Leaders

Llama 4 Scout

10M context, 109B params. Meta's flagship.

Llama 4 Maverick

400B total, 17B active. Multimodal MoE.

DeepSeek V3/R1

671B params. Ultra-cheap reasoning and chat.

Qwen 2.5 72B

Apache 2.0. 29 languages. 72.7B params.
