
Llama 3.2 3B

by Meta

Local/Edge

Pricing (per 1M tokens)
Input: Self-hosted
Output: Self-hosted
Cached: N/A
Note: Smallest Llama 3.2 model, intended for edge devices
Context & Output
Context Window: 128K tokens
Max Output: 32K tokens
Latency: Very Fast
Capabilities
Multimodal
Streaming
Function Calling
Prompt Caching
Key Strengths
What makes this model stand out
Compact
Mobile-friendly
Open weights
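A quick sketch of why "Compact" and "Mobile-friendly" hold in practice: weight memory scales with parameter count times bits per parameter. The 3B parameter count is taken from the model name; the quantization levels shown (16/8/4-bit) are common assumptions, not figures from this card, and the estimate ignores KV cache and activation memory.

```python
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate memory for model weights only (no KV cache/activations)."""
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9  # decimal GB

for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{weight_memory_gb(3.0, bits):.1f} GB")
# 16-bit: ~6.0 GB
# 8-bit:  ~3.0 GB
# 4-bit:  ~1.5 GB
```

At 4-bit quantization the weights fit in roughly 1.5 GB, which is what makes a 3B model plausible on phones and other edge hardware.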
Similar Models in Local/Edge Tier
Other models with similar pricing and performance characteristics
IBM Granite 4.0 Nano 1.5B (IBM). Input: Self-hosted per 1M tokens. Context: 128K tokens.
IBM Granite 4.0 Nano 350M (IBM). Input: Self-hosted per 1M tokens. Context: 128K tokens.
Phi-4 (Microsoft). Input: Self-hosted per 1M tokens. Context: 16K tokens.