LLM Model Comparison

Compare 44 major language models across pricing, context windows, capabilities, and performance metrics.

44 total models | 12 companies | Price range: $0.05-$75 per million tokens | Max context: 10M tokens

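All prices on this page are per million (M) tokens, billed separately for input and output. As a quick orientation, here is a minimal cost-estimate sketch in Python; the prices come from the cards below and the token counts are illustrative:

```python
# Estimate the dollar cost of one request from per-million-token prices.

def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Prices are dollars per 1M tokens; returns dollars per request."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Example: a 10,000-token prompt that yields a 1,000-token reply.
print(f"GPT-5:           ${request_cost(10_000, 1_000, 1.25, 10.00):.4f}")   # $0.0225
print(f"Claude Opus 4.1: ${request_cost(10_000, 1_000, 15.00, 75.00):.4f}")  # $0.2250
print(f"GPT-5 Nano:      ${request_cost(10_000, 1_000, 0.05, 0.40):.4f}")    # $0.0009
```
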
Kimi K2 Thinking (Moonshot AI) | Premium
Pricing: $0.80/M input, $3.20/M output
Context: 256K tokens (max output: 32K)
Latency: Medium
Capabilities: Streaming, Functions
Best for: agentic tasks (beats GPT-5 on some benchmarks); 71.3% SWE-bench; sustains 200-300 sequential tool calls autonomously

Claude Opus 4.1 (Anthropic) | Premium
Pricing: $15.00/M input, $75.00/M output
Context: 200K tokens (max output: 32K)
Latency: Medium
Capabilities: Multimodal, Streaming, Functions, Caching
Best for: top-tier coding (72.5% SWE-bench); long-running tasks; agent workflows

Gemini 2.5 Pro (Google) | Premium
Pricing: $1.25/M input, $10.00/M output (prompts up to 200K tokens; higher rates above), $0.31/M cached
Context: 1M tokens (max output: 64K)
Latency: Fast
Capabilities: Multimodal, Streaming, Functions, Caching
Best for: #1 on LMArena; 86.4 GPQA reasoning score; Deep Think mode

GPT-5.1 Instant (OpenAI) | Premium
Pricing: $1.25/M input, $10.00/M output, $0.125/M cached
Context: 272K tokens (max output: 128K)
Latency: Fast
Capabilities: Multimodal, Streaming, Functions, Caching
Best for: adaptive reasoning; warmer, more conversational tone; more accurate responses

GPT-5.1 Thinking (OpenAI) | Premium
Pricing: $1.25/M input, $10.00/M output, $0.125/M cached
Context: 272K tokens (max output: 128K)
Latency: Adaptive
Capabilities: Multimodal, Streaming, Functions, Caching
Best for: advanced reasoning; faster on simple tasks; more persistent on complex ones

GPT-5 (OpenAI) | Premium
Pricing: $1.25/M input, $10.00/M output, $0.125/M cached
Context: 272K tokens (max output: 128K)
Latency: Fast
Capabilities: Multimodal, Streaming, Functions, Caching
Best for: software-on-demand; multimodal work; 88.4% GPQA

GPT-5 Codex (OpenAI) | Premium
Pricing: $1.25/M input, $10.00/M output, $0.125/M cached
Context: 272K tokens (max output: 128K)
Latency: Adaptive
Capabilities: Streaming, Functions, Caching
Best for: agentic coding; 7+ hours of autonomous operation; code review

GPT-4o (OpenAI) | Premium
Pricing: $2.50/M input, $10.00/M output
Context: 128K tokens (max output: 16K)
Latency: Fast
Capabilities: Multimodal, Streaming, Functions
Best for: flagship general-purpose use; multimodal; versatile

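Models tagged Streaming return tokens incrementally instead of as a single final payload, which matters for interactive UIs. A minimal sketch using the official OpenAI Python SDK (assumes `pip install openai` and an `OPENAI_API_KEY` environment variable; any streaming-capable model above works the same way):

```python
# Stream a chat completion token-by-token with the OpenAI Python SDK.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain context windows in one paragraph."}],
    stream=True,  # ask for incremental chunks
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:  # some chunks carry no text (e.g. role headers)
        print(delta, end="", flush=True)
print()
```
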
o3-pro (OpenAI) | Premium
Pricing: $15.00/M input, $60.00/M output
Context: 200K tokens (max output: 100K)
Latency: Slow (reasoning)
Capabilities: Functions
Best for: most capable reasoning; math and science; PhD-level tasks

Grok 4 (xAI) | Premium
Pricing: $3.00/M input, $15.00/M output, $0.75/M cached
Context: 128K tokens (max output: 32K)
Latency: Fast
Capabilities: Multimodal, Streaming, Functions, Caching
Best for: real-time web search; trained on xAI's Colossus supercomputer; multimodal

Mistral Large (Mistral) | Premium
Pricing: $2.00/M input, $6.00/M output
Context: 128K tokens (max output: 32K)
Latency: Fast
Capabilities: Streaming, Functions
Best for: European AI leader; 123B parameters; strong multilingual performance

Gemini 1.5 Pro (Google) | Premium
Pricing: $1.25/M input, $5.00/M output, $0.31/M cached
Context: 2M tokens (max output: 8K)
Latency: Fast
Capabilities: Multimodal, Streaming, Functions, Caching
Best for: 2M-token context; multimodal; long documents

Claude Sonnet 4.5 (Anthropic) | Mid-tier
Pricing: $3.00/M input, $15.00/M output, $0.30/M cached
Context: 1M tokens (max output: 8K)
Latency: Fast
Capabilities: Multimodal, Streaming, Functions, Caching
Best for: best-in-class coding and agents; 72.7% SWE-bench; 90% cache savings

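The "90% cache savings" on the Claude cards follows directly from the listed prices: cached input at $0.30/M is 90% below the standard $3.00/M rate. A worked sketch of what that means for a large system prompt reused across many calls (the numbers are illustrative, and it ignores the small cache-write surcharge some providers add on the first call):

```python
# Savings from prompt caching: a 50K-token system prompt reused over 100 calls,
# priced at the Claude Sonnet 4.5 card's rates ($3.00/M input, $0.30/M cached).
PROMPT_TOKENS = 50_000
CALLS = 100
INPUT = 3.00 / 1_000_000    # dollars per token, standard input
CACHED = 0.30 / 1_000_000   # dollars per token, cache hit

uncached_total = PROMPT_TOKENS * CALLS * INPUT
cached_total = (PROMPT_TOKENS * INPUT                      # first call fills the cache
                + PROMPT_TOKENS * (CALLS - 1) * CACHED)    # later calls hit it

print(f"Without caching: ${uncached_total:.2f}")  # $15.00
print(f"With caching:    ${cached_total:.2f}")    # $1.64
```
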
GPT-4.1 (OpenAI) | Mid-tier
Pricing: $2.00/M input, $8.00/M output, $0.50/M cached
Context: 1M tokens (max output: 64K)
Latency: Fast
Capabilities: Multimodal, Streaming, Functions, Caching
Best for: 1M-token context; long conversations; improved reasoning

Grok 3 (xAI) | Mid-tier
Pricing: $3.00/M input, $15.00/M output
Context: 128K tokens (max output: 32K)
Latency: Fast
Capabilities: Multimodal, Streaming, Functions
Best for: real-time search; function calling; fast inference

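Models tagged Functions accept machine-readable tool definitions and can reply with a structured call instead of prose. Most providers on this page, xAI included, accept the OpenAI-style JSON-schema `tools` format; a minimal sketch (the `get_weather` tool is hypothetical):

```python
# One tool definition in the OpenAI-style function-calling format.
# The get_weather tool is hypothetical, purely for illustration.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}]

# Passed alongside the messages, e.g.:
#   client.chat.completions.create(model=..., messages=..., tools=tools)
# The model may then return a tool_call naming get_weather with JSON
# arguments, which your code executes before sending the result back.
```
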
o4-mini (OpenAI) | Mid-tier
Pricing: $1.10/M input, $4.40/M output
Context: 200K tokens (max output: 100K)
Latency: Medium (reasoning)
Capabilities: Multimodal, Functions
Best for: fast reasoning; best-in-class on AIME 2024/2025; math and coding

Claude Haiku 4.5 (Anthropic) | Mid-tier
Pricing: $1.00/M input, $5.00/M output, $0.10/M cached
Context: 200K tokens (max output: 8K)
Latency: Very Fast
Capabilities: Multimodal, Streaming, Functions, Caching
Best for: fastest Claude model; near-Sonnet quality; 90% cache savings

Qwen 2.5 72B (Alibaba) | Mid-tier
Pricing: $1.00/M input, $1.00/M output
Context: 128K tokens (max output: 32K)
Latency: Fast
Capabilities: Streaming, Functions
Best for: 72.7B parameters; 29 languages; Apache 2.0 license

DeepSeek V3.1 (DeepSeek) | Cost-effective
Pricing: $0.56/M input, $1.68/M output, $0.07/M cached
Context: 128K tokens (max output: 32K)
Latency: Fast
Capabilities: Streaming, Functions, Caching
Best for: hybrid thinking/non-thinking modes; 671B parameters; 82.6% HumanEval

GPT-5 Mini (OpenAI) | Cost-effective
Pricing: $0.25/M input, $2.00/M output, $0.025/M cached
Context: 272K tokens (max output: 64K)
Latency: Fast
Capabilities: Multimodal, Streaming, Functions, Caching
Best for: GPT-5 quality at a lower price; 272K context; multimodal

DeepSeek R1 (DeepSeek) | Cost-effective
Pricing: $0.55/M input, $2.19/M output, $0.14/M cached
Context: 128K tokens (max output: 32K)
Latency: Medium (reasoning)
Capabilities: Streaming, Caching
Best for: reasoning model; roughly 27x cheaper than o1; MIT license

Qwen3-235B (Alibaba) | Cost-effective
Pricing: $0.35/M input, $0.60/M output
Context: 128K tokens (max output: 32K)
Latency: Fast
Capabilities: Streaming, Functions
Best for: 235B parameters (22B active); hybrid thinking; Apache 2.0 license

Qwen3-32B (Alibaba) | Cost-effective
Pricing: $0.20/M input, $0.40/M output
Context: 128K tokens (max output: 32K)
Latency: Very Fast
Capabilities: Streaming, Functions
Best for: outperforms o1-mini; strong reasoning; Apache 2.0 license

Mistral Medium 3 (Mistral) | Cost-effective
Pricing: $0.40/M input, $2.00/M output
Context: 128K tokens (max output: 32K)
Latency: Fast
Capabilities: Streaming, Functions
Best for: 8x cheaper than comparable competitors; EU hosting; function calling

Mistral Small 3 (Mistral) | Cost-effective
Pricing: $0.20/M input, $0.60/M output
Context: 128K tokens (max output: 32K)
Latency: Very Fast
Capabilities: Streaming, Functions
Best for: 24B parameters; fast inference; EU compliant

GPT-OSS-120B (OpenAI) | Cost-effective
Pricing: self-hosted (open weights; cost depends on your hardware)
Context: 128K tokens (max output: 32K)
Latency: Fast
Capabilities: Streaming, Functions
Best for: matches o4-mini; 117B total parameters (5.1B active); open weights

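"Self-hosted" pricing means there is no per-token API fee; you pay for the hardware running the open weights. A common pattern is to serve the model behind a local OpenAI-compatible endpoint (vLLM, llama.cpp, and Ollama all provide one) and point the standard SDK at it. A sketch, assuming such a server is already listening on localhost:8000:

```python
# Talk to a locally hosted open-weights model through an OpenAI-compatible
# server (vLLM, llama.cpp, Ollama, ...) assumed to run on localhost:8000.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # local server instead of api.openai.com
    api_key="not-needed",                 # most local servers ignore the key
)

reply = client.chat.completions.create(
    model="gpt-oss-120b",  # whatever name your server registered the weights under
    messages=[{"role": "user", "content": "Summarize MoE routing in two sentences."}],
)
print(reply.choices[0].message.content)
```
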
Llama 4 Maverick (Meta) | Cost-effective
Pricing: $0.50/M input, $0.77/M output
Context: 1M tokens (max output: 32K)
Latency: Fast
Capabilities: Multimodal, Streaming, Functions
Best for: 400B-parameter MoE; multimodal; open weights

Grok 3 Mini (xAI) | Cost-effective
Pricing: $0.30/M input, $0.50/M output
Context: 128K tokens (max output: 16K)
Latency: Very Fast
Capabilities: Streaming, Functions
Best for: fast; cost-efficient; function calling

GLM 4.6 (Zhipu AI) | Cost-effective
Pricing: $0.30/M input, $0.55/M output
Context: 128K tokens (max output: 32K)
Latency: Fast
Capabilities: Streaming, Functions
Best for: 355B MoE; bilingual (Chinese/English); MIT license

Llama 3.3 70B (Meta) | Cost-effective
Pricing: $0.35/M input, $0.55/M output
Context: 128K tokens (max output: 32K)
Latency: Fast
Capabilities: Streaming, Functions
Best for: performance comparable to Llama 3.1 405B; open weights; cost-efficient

Gemini 2.0 Flash (Google) | Cost-effective
Pricing: $0.10/M input, $0.40/M output
Context: 1M tokens (max output: 8K)
Latency: Very Fast
Capabilities: Multimodal, Streaming, Functions
Best for: 1M-token context; native tool use; multimodal

Gemini 1.5 Flash (Google) | Cost-effective
Pricing: $0.075/M input, $0.30/M output, $0.01875/M cached
Context: 1M tokens (max output: 8K)
Latency: Very Fast
Capabilities: Multimodal, Streaming, Functions, Caching
Best for: 1M-token context; fast; prompt caching

Llama 4 Scout (Meta) | Ultra-cheap
Pricing: $0.11/M input, $0.34/M output
Context: 10M tokens (max output: 32K)
Latency: Fast
Capabilities: Multimodal, Streaming, Functions
Best for: 10M-token context (largest on this page); multimodal; open weights

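For scale: at the common rule of thumb of roughly 0.75 words per token, a 10M-token window is on the order of 7.5 million words. A quick way to check whether a document fits a given window is to count its tokens; a sketch using `tiktoken` (an OpenAI tokenizer, so counts for Llama models will differ somewhat, but the order of magnitude carries over):

```python
# Check whether a document fits a model's context window, leaving room
# for output. tiktoken is OpenAI's tokenizer; Llama's differs slightly.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def fits(text: str, context_window: int, reserved_output: int = 32_000) -> bool:
    """True if the document plus the output budget fits the window."""
    return len(enc.encode(text)) + reserved_output <= context_window

doc = open("big_document.txt").read()        # path is illustrative
print(fits(doc, context_window=10_000_000))  # Llama 4 Scout
print(fits(doc, context_window=128_000))     # a typical 128K model
```
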
GPT-5 Nano (OpenAI) | Ultra-cheap
Pricing: $0.05/M input, $0.40/M output, $0.005/M cached
Context: 272K tokens (max output: 32K)
Latency: Very Fast
Capabilities: Multimodal, Streaming, Functions, Caching
Best for: high throughput; simple tasks; 272K context

Llama 3.1 405B (Meta) | Ultra-cheap
Pricing: $0.30/M input, $0.50/M output
Context: 128K tokens (max output: 32K)
Latency: Fast
Capabilities: Streaming, Functions
Best for: largest open model; multilingual; 128K context

Llama 3.2 90B (Meta) | Ultra-cheap
Pricing: $0.20/M input, $0.40/M output
Context: 128K tokens (max output: 32K)
Latency: Fast
Capabilities: Multimodal, Streaming, Functions
Best for: first multimodal Llama; image + text input; open weights

Llama 3.1 70B (Meta) | Ultra-cheap
Pricing: $0.08/M input, $0.15/M output
Context: 128K tokens (max output: 32K)
Latency: Very Fast
Capabilities: Streaming, Functions
Best for: popular in production; open weights; strong reasoning

IBM Granite 4.0 Nano 1.5B (IBM) | Local/Edge
Pricing: self-hosted
Context: 128K tokens (max output: 32K)
Latency: Fast
Capabilities: Streaming, Functions
Best for: runs in the browser; 70% memory reduction; ISO 42001 certified

IBM Granite 4.0 Nano 350M (IBM) | Local/Edge
Pricing: self-hosted
Context: 128K tokens (max output: 32K)
Latency: Very Fast
Capabilities: Streaming, Functions
Best for: ultra-compact; browser-runnable; enterprise certified

Phi-4 (Microsoft) | Local/Edge
Pricing: self-hosted
Context: 16K tokens (max output: 4K)
Latency: Very Fast
Capabilities: Streaming, Functions
Best for: state-of-the-art small model; reasoning; runs on a laptop

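Local/Edge models trade capability for the ability to run on your own machine with no API at all. A minimal sketch with Hugging Face `transformers` (assumes `pip install transformers torch`; `microsoft/phi-4` is assumed to be the checkpoint ID, so substitute whichever model you actually deploy):

```python
# Run a small model locally with Hugging Face transformers.
# The checkpoint ID microsoft/phi-4 is an assumption; swap in your own.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="microsoft/phi-4",
    device_map="auto",  # GPU if available, otherwise CPU
)

out = generator(
    "Q: What is 17 * 24? Think step by step.\nA:",
    max_new_tokens=128,
)
print(out[0]["generated_text"])
```
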
Phi-4-multimodal (Microsoft) | Local/Edge
Pricing: self-hosted
Context: 16K tokens (max output: 4K)
Latency: Very Fast
Capabilities: Multimodal, Streaming, Functions
Best for: text, image, audio, and video input; 5.6B parameters; 20+ languages

Florence-2 (Microsoft) | Local/Edge
Pricing: self-hosted
Context: n/a
Latency: Fast
Capabilities: Multimodal
Best for: vision foundation model; OCR; object detection

Llama 3.2 3B (Meta) | Local/Edge
Pricing: self-hosted
Context: 128K tokens (max output: 32K)
Latency: Very Fast
Capabilities: Streaming, Functions
Best for: compact; mobile-friendly; open weights

Llama 3.2 1B (Meta) | Local/Edge
Pricing: self-hosted
Context: 128K tokens (max output: 32K)
Latency: Very Fast
Capabilities: Streaming
Best for: ultra-compact; edge deployment; open weights

Need Help Choosing?
Check out our detailed comparison table or explore individual model pages for in-depth specifications.