LLM Model Comparison

Compare 44 major language models across pricing, context windows, capabilities, and performance metrics.

44 total models | 12 companies | Price range: $0.05-$75 per million tokens | Max context: 10M tokens

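All prices on this page are per million (M) tokens, billed separately for input and output. As a quick orientation, here is a minimal cost-estimate sketch in Python; the prices come from the cards below and the token counts are illustrative:

```python
# Estimate the dollar cost of one request from per-million-token prices.

def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Prices are dollars per 1M tokens; returns dollars per request."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Example: a 10,000-token prompt that yields a 1,000-token reply.
print(f"GPT-5:           ${request_cost(10_000, 1_000, 1.25, 10.00):.4f}")   # $0.0225
print(f"Claude Opus 4.1: ${request_cost(10_000, 1_000, 15.00, 75.00):.4f}")  # $0.2250
print(f"GPT-5 Nano:      ${request_cost(10_000, 1_000, 0.05, 0.40):.4f}")    # $0.0009
```
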
Kimi K2 Thinking (Moonshot AI) | Premium
Pricing: $0.80/M input, $3.20/M output
Context: 256K tokens (max output: 32K)
Latency: Medium
Capabilities: Streaming, Functions
Best for: agentic tasks (beats GPT-5 on some benchmarks); 71.3% SWE-bench; sustains 200-300 sequential tool calls autonomously

Claude Opus 4.1 (Anthropic) | Premium
Pricing: $15.00/M input, $75.00/M output
Context: 200K tokens (max output: 32K)
Latency: Medium
Capabilities: Multimodal, Streaming, Functions, Caching
Best for: top-tier coding (72.5% SWE-bench); long-running tasks; agent workflows

Gemini 2.5 Pro (Google) | Premium
Pricing: $1.25/M input, $10.00/M output (prompts up to 200K tokens; higher rates above), $0.31/M cached
Context: 1M tokens (max output: 64K)
Latency: Fast
Capabilities: Multimodal, Streaming, Functions, Caching
Best for: #1 on LMArena; 86.4 GPQA reasoning score; Deep Think mode

GPT-5.1 Instant (OpenAI) | Premium
Pricing: $1.25/M input, $10.00/M output, $0.125/M cached
Context: 272K tokens (max output: 128K)
Latency: Fast
Capabilities: Multimodal, Streaming, Functions, Caching
Best for: adaptive reasoning; warmer, more conversational tone; more accurate responses

GPT-5.1 Thinking (OpenAI) | Premium
Pricing: $1.25/M input, $10.00/M output, $0.125/M cached
Context: 272K tokens (max output: 128K)
Latency: Adaptive
Capabilities: Multimodal, Streaming, Functions, Caching
Best for: advanced reasoning; faster on simple tasks; more persistent on complex ones

GPT-5 (OpenAI) | Premium
Pricing: $1.25/M input, $10.00/M output, $0.125/M cached
Context: 272K tokens (max output: 128K)
Latency: Fast
Capabilities: Multimodal, Streaming, Functions, Caching
Best for: software-on-demand; multimodal work; 88.4% GPQA

GPT-5 Codex (OpenAI) | Premium
Pricing: $1.25/M input, $10.00/M output, $0.125/M cached
Context: 272K tokens (max output: 128K)
Latency: Adaptive
Capabilities: Streaming, Functions, Caching
Best for: agentic coding; 7+ hours of autonomous operation; code review

GPT-4o (OpenAI) | Premium
Pricing: $2.50/M input, $10.00/M output
Context: 128K tokens (max output: 16K)
Latency: Fast
Capabilities: Multimodal, Streaming, Functions
Best for: flagship general-purpose use; multimodal; versatile

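Models tagged Streaming return tokens incrementally instead of as a single final payload, which matters for interactive UIs. A minimal sketch using the official OpenAI Python SDK (assumes `pip install openai` and an `OPENAI_API_KEY` environment variable; any streaming-capable model above works the same way):

```python
# Stream a chat completion token-by-token with the OpenAI Python SDK.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain context windows in one paragraph."}],
    stream=True,  # ask for incremental chunks
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:  # some chunks carry no text (e.g. role headers)
        print(delta, end="", flush=True)
print()
```
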
o3-pro (OpenAI) | Premium
Pricing: $15.00/M input, $60.00/M output
Context: 200K tokens (max output: 100K)
Latency: Slow (reasoning)
Capabilities: Functions
Best for: most capable reasoning; math and science; PhD-level tasks

Grok 4 (xAI) | Premium
Pricing: $3.00/M input, $15.00/M output, $0.75/M cached
Context: 128K tokens (max output: 32K)
Latency: Fast
Capabilities: Multimodal, Streaming, Functions, Caching
Best for: real-time web search; trained on xAI's Colossus supercomputer; multimodal

Mistral Large (Mistral) | Premium
Pricing: $2.00/M input, $6.00/M output
Context: 128K tokens (max output: 32K)
Latency: Fast
Capabilities: Streaming, Functions
Best for: European AI leader; 123B parameters; strong multilingual performance

Gemini 1.5 Pro (Google) | Premium
Pricing: $1.25/M input, $5.00/M output, $0.31/M cached
Context: 2M tokens (max output: 8K)
Latency: Fast
Capabilities: Multimodal, Streaming, Functions, Caching
Best for: 2M-token context; multimodal; long documents

Claude Sonnet 4.5 (Anthropic) | Mid-tier
Pricing: $3.00/M input, $15.00/M output, $0.30/M cached
Context: 1M tokens (max output: 8K)
Latency: Fast
Capabilities: Multimodal, Streaming, Functions, Caching
Best for: best-in-class coding and agents; 72.7% SWE-bench; 90% cache savings

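The "90% cache savings" on the Claude cards follows directly from the listed prices: cached input at $0.30/M is 90% below the standard $3.00/M rate. A worked sketch of what that means for a large system prompt reused across many calls (the numbers are illustrative, and it ignores the small cache-write surcharge some providers add on the first call):

```python
# Savings from prompt caching: a 50K-token system prompt reused over 100 calls,
# priced at the Claude Sonnet 4.5 card's rates ($3.00/M input, $0.30/M cached).
PROMPT_TOKENS = 50_000
CALLS = 100
INPUT = 3.00 / 1_000_000    # dollars per token, standard input
CACHED = 0.30 / 1_000_000   # dollars per token, cache hit

uncached_total = PROMPT_TOKENS * CALLS * INPUT
cached_total = (PROMPT_TOKENS * INPUT                      # first call fills the cache
                + PROMPT_TOKENS * (CALLS - 1) * CACHED)    # later calls hit it

print(f"Without caching: ${uncached_total:.2f}")  # $15.00
print(f"With caching:    ${cached_total:.2f}")    # $1.64
```
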
GPT-4.1 (OpenAI) | Mid-tier
Pricing: $2.00/M input, $8.00/M output, $0.50/M cached
Context: 1M tokens (max output: 64K)
Latency: Fast
Capabilities: Multimodal, Streaming, Functions, Caching
Best for: 1M-token context; long conversations; improved reasoning

Grok 3 (xAI) | Mid-tier
Pricing: $3.00/M input, $15.00/M output
Context: 128K tokens (max output: 32K)
Latency: Fast
Capabilities: Multimodal, Streaming, Functions
Best for: real-time search; function calling; fast inference

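Models tagged Functions accept machine-readable tool definitions and can reply with a structured call instead of prose. Most providers on this page, xAI included, accept the OpenAI-style JSON-schema `tools` format; a minimal sketch (the `get_weather` tool is hypothetical):

```python
# One tool definition in the OpenAI-style function-calling format.
# The get_weather tool is hypothetical, purely for illustration.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}]

# Passed alongside the messages, e.g.:
#   client.chat.completions.create(model=..., messages=..., tools=tools)
# The model may then return a tool_call naming get_weather with JSON
# arguments, which your code executes before sending the result back.
```
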
o4-mini (OpenAI) | Mid-tier
Pricing: $1.10/M input, $4.40/M output
Context: 200K tokens (max output: 100K)
Latency: Medium (reasoning)
Capabilities: Multimodal, Functions
Best for: fast reasoning; best-in-class on AIME 2024/2025; math and coding

Claude Haiku 4.5 (Anthropic) | Mid-tier
Pricing: $1.00/M input, $5.00/M output, $0.10/M cached
Context: 200K tokens (max output: 8K)
Latency: Very Fast
Capabilities: Multimodal, Streaming, Functions, Caching
Best for: fastest Claude model; near-Sonnet quality; 90% cache savings

Qwen 2.5 72B (Alibaba) | Mid-tier
Pricing: $1.00/M input, $1.00/M output
Context: 128K tokens (max output: 32K)
Latency: Fast
Capabilities: Streaming, Functions
Best for: 72.7B parameters; 29 languages; Apache 2.0 license

DeepSeek V3.1 (DeepSeek) | Cost-effective
Pricing: $0.56/M input, $1.68/M output, $0.07/M cached
Context: 128K tokens (max output: 32K)
Latency: Fast
Capabilities: Streaming, Functions, Caching
Best for: hybrid thinking/non-thinking modes; 671B parameters; 82.6% HumanEval

GPT-5 Mini (OpenAI) | Cost-effective
Pricing: $0.25/M input, $2.00/M output, $0.025/M cached
Context: 272K tokens (max output: 64K)
Latency: Fast
Capabilities: Multimodal, Streaming, Functions, Caching
Best for: GPT-5 quality at a lower price; 272K context; multimodal

DeepSeek R1 (DeepSeek) | Cost-effective
Pricing: $0.55/M input, $2.19/M output, $0.14/M cached
Context: 128K tokens (max output: 32K)
Latency: Medium (reasoning)
Capabilities: Streaming, Caching
Best for: reasoning model; roughly 27x cheaper than o1; MIT license

Qwen3-235B (Alibaba) | Cost-effective
Pricing: $0.35/M input, $0.60/M output
Context: 128K tokens (max output: 32K)
Latency: Fast
Capabilities: Streaming, Functions
Best for: 235B parameters (22B active); hybrid thinking; Apache 2.0 license

Qwen3-32B (Alibaba) | Cost-effective
Pricing: $0.20/M input, $0.40/M output
Context: 128K tokens (max output: 32K)
Latency: Very Fast
Capabilities: Streaming, Functions
Best for: outperforms o1-mini; strong reasoning; Apache 2.0 license

Mistral Medium 3 (Mistral) | Cost-effective
Pricing: $0.40/M input, $2.00/M output
Context: 128K tokens (max output: 32K)
Latency: Fast
Capabilities: Streaming, Functions
Best for: 8x cheaper than comparable competitors; EU hosting; function calling

Mistral Small 3 (Mistral) | Cost-effective
Pricing: $0.20/M input, $0.60/M output
Context: 128K tokens (max output: 32K)
Latency: Very Fast
Capabilities: Streaming, Functions
Best for: 24B parameters; fast inference; EU compliant

GPT-OSS-120B (OpenAI) | Cost-effective
Pricing: self-hosted (open weights; cost depends on your hardware)
Context: 128K tokens (max output: 32K)
Latency: Fast
Capabilities: Streaming, Functions
Best for: matches o4-mini; 117B total parameters (5.1B active); open weights

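"Self-hosted" pricing means there is no per-token API fee; you pay for the hardware running the open weights. A common pattern is to serve the model behind a local OpenAI-compatible endpoint (vLLM, llama.cpp, and Ollama all provide one) and point the standard SDK at it. A sketch, assuming such a server is already listening on localhost:8000:

```python
# Talk to a locally hosted open-weights model through an OpenAI-compatible
# server (vLLM, llama.cpp, Ollama, ...) assumed to run on localhost:8000.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # local server instead of api.openai.com
    api_key="not-needed",                 # most local servers ignore the key
)

reply = client.chat.completions.create(
    model="gpt-oss-120b",  # whatever name your server registered the weights under
    messages=[{"role": "user", "content": "Summarize MoE routing in two sentences."}],
)
print(reply.choices[0].message.content)
```
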
Llama 4 Maverick (Meta) | Cost-effective
Pricing: $0.50/M input, $0.77/M output
Context: 1M tokens (max output: 32K)
Latency: Fast
Capabilities: Multimodal, Streaming, Functions
Best for: 400B-parameter MoE; multimodal; open weights

Grok 3 Mini (xAI) | Cost-effective
Pricing: $0.30/M input, $0.50/M output
Context: 128K tokens (max output: 16K)
Latency: Very Fast
Capabilities: Streaming, Functions
Best for: fast; cost-efficient; function calling

GLM 4.6 (Zhipu AI) | Cost-effective
Pricing: $0.30/M input, $0.55/M output
Context: 128K tokens (max output: 32K)
Latency: Fast
Capabilities: Streaming, Functions
Best for: 355B MoE; bilingual (Chinese/English); MIT license

Llama 3.3 70B (Meta) | Cost-effective
Pricing: $0.35/M input, $0.55/M output
Context: 128K tokens (max output: 32K)
Latency: Fast
Capabilities: Streaming, Functions
Best for: performance comparable to Llama 3.1 405B; open weights; cost-efficient

Gemini 2.0 Flash (Google) | Cost-effective
Pricing: $0.10/M input, $0.40/M output
Context: 1M tokens (max output: 8K)
Latency: Very Fast
Capabilities: Multimodal, Streaming, Functions
Best for: 1M-token context; native tool use; multimodal

Gemini 1.5 Flash (Google) | Cost-effective
Pricing: $0.075/M input, $0.30/M output, $0.01875/M cached
Context: 1M tokens (max output: 8K)
Latency: Very Fast
Capabilities: Multimodal, Streaming, Functions, Caching
Best for: 1M-token context; fast; prompt caching

Llama 4 Scout (Meta) | Ultra-cheap
Pricing: $0.11/M input, $0.34/M output
Context: 10M tokens (max output: 32K)
Latency: Fast
Capabilities: Multimodal, Streaming, Functions
Best for: 10M-token context (largest on this page); multimodal; open weights

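For scale: at the common rule of thumb of roughly 0.75 words per token, a 10M-token window is on the order of 7.5 million words. A quick way to check whether a document fits a given window is to count its tokens; a sketch using `tiktoken` (an OpenAI tokenizer, so counts for Llama models will differ somewhat, but the order of magnitude carries over):

```python
# Check whether a document fits a model's context window, leaving room
# for output. tiktoken is OpenAI's tokenizer; Llama's differs slightly.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def fits(text: str, context_window: int, reserved_output: int = 32_000) -> bool:
    """True if the document plus the output budget fits the window."""
    return len(enc.encode(text)) + reserved_output <= context_window

doc = open("big_document.txt").read()        # path is illustrative
print(fits(doc, context_window=10_000_000))  # Llama 4 Scout
print(fits(doc, context_window=128_000))     # a typical 128K model
```
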
GPT-5 Nano (OpenAI) | Ultra-cheap
Pricing: $0.05/M input, $0.40/M output, $0.005/M cached
Context: 272K tokens (max output: 32K)
Latency: Very Fast
Capabilities: Multimodal, Streaming, Functions, Caching
Best for: high throughput; simple tasks; 272K context

Llama 3.1 405B (Meta) | Ultra-cheap
Pricing: $0.30/M input, $0.50/M output
Context: 128K tokens (max output: 32K)
Latency: Fast
Capabilities: Streaming, Functions
Best for: largest open model; multilingual; 128K context

Llama 3.2 90B (Meta) | Ultra-cheap
Pricing: $0.20/M input, $0.40/M output
Context: 128K tokens (max output: 32K)
Latency: Fast
Capabilities: Multimodal, Streaming, Functions
Best for: first multimodal Llama; image + text input; open weights

Llama 3.1 70B (Meta) | Ultra-cheap
Pricing: $0.08/M input, $0.15/M output
Context: 128K tokens (max output: 32K)
Latency: Very Fast
Capabilities: Streaming, Functions
Best for: popular in production; open weights; strong reasoning

IBM Granite 4.0 Nano 1.5B (IBM) | Local/Edge
Pricing: self-hosted
Context: 128K tokens (max output: 32K)
Latency: Fast
Capabilities: Streaming, Functions
Best for: runs in the browser; 70% memory reduction; ISO 42001 certified

IBM Granite 4.0 Nano 350M (IBM) | Local/Edge
Pricing: self-hosted
Context: 128K tokens (max output: 32K)
Latency: Very Fast
Capabilities: Streaming, Functions
Best for: ultra-compact; browser-runnable; enterprise certified

Phi-4 (Microsoft) | Local/Edge
Pricing: self-hosted
Context: 16K tokens (max output: 4K)
Latency: Very Fast
Capabilities: Streaming, Functions
Best for: state-of-the-art small model; reasoning; runs on a laptop

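Local/Edge models trade capability for the ability to run on your own machine with no API at all. A minimal sketch with Hugging Face `transformers` (assumes `pip install transformers torch`; `microsoft/phi-4` is assumed to be the checkpoint ID, so substitute whichever model you actually deploy):

```python
# Run a small model locally with Hugging Face transformers.
# The checkpoint ID microsoft/phi-4 is an assumption; swap in your own.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="microsoft/phi-4",
    device_map="auto",  # GPU if available, otherwise CPU
)

out = generator(
    "Q: What is 17 * 24? Think step by step.\nA:",
    max_new_tokens=128,
)
print(out[0]["generated_text"])
```
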
Phi-4-multimodal (Microsoft) | Local/Edge
Pricing: self-hosted
Context: 16K tokens (max output: 4K)
Latency: Very Fast
Capabilities: Multimodal, Streaming, Functions
Best for: text, image, audio, and video input; 5.6B parameters; 20+ languages

Florence-2 (Microsoft) | Local/Edge
Pricing: self-hosted
Context: n/a
Latency: Fast
Capabilities: Multimodal
Best for: vision foundation model; OCR; object detection

Llama 3.2 3B (Meta) | Local/Edge
Pricing: self-hosted
Context: 128K tokens (max output: 32K)
Latency: Very Fast
Capabilities: Streaming, Functions
Best for: compact; mobile-friendly; open weights

Llama 3.2 1B (Meta) | Local/Edge
Pricing: self-hosted
Context: 128K tokens (max output: 32K)
Latency: Very Fast
Capabilities: Streaming
Best for: ultra-compact; edge deployment; open weights

Need Help Choosing?
Check out our detailed comparison table or explore individual model pages for in-depth specifications.