Phi-4

Local/Edge

Pricing

Input: Self-hosted (per 1M tokens)

Output: Self-hosted (per 1M tokens)

Cached: N/A (per 1M tokens)

Note: Latest Phi release, optimized for edge deployment

Context & Output

Context Window: 16K tokens

Max Output: 4K tokens
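
As a rough illustration of how the two limits above interact, the sketch below computes how many prompt tokens remain once output space is reserved. It rests on assumptions not stated on this page: that "16K" and "4K" mean 16,384 and 4,096 tokens, and that generated tokens count against the same context window. The names `max_prompt_tokens` and `fits` are hypothetical helpers, not part of any Phi-4 tooling.

```python
# Assumption: "16K" and "4K" are read as binary thousands (16 * 1024, 4 * 1024).
CONTEXT_WINDOW = 16 * 1024  # 16,384 tokens
MAX_OUTPUT = 4 * 1024       # 4,096 tokens


def max_prompt_tokens(reserved_output: int = MAX_OUTPUT) -> int:
    """Return the prompt budget left after reserving room for the reply.

    Assumes output tokens share the context window with the prompt.
    """
    if not 0 <= reserved_output <= MAX_OUTPUT:
        raise ValueError(f"reserved_output must be between 0 and {MAX_OUTPUT}")
    return CONTEXT_WINDOW - reserved_output


def fits(prompt_tokens: int, reserved_output: int = MAX_OUTPUT) -> bool:
    """Check whether a prompt of the given size leaves the reserved output room."""
    return 0 <= prompt_tokens <= max_prompt_tokens(reserved_output)


if __name__ == "__main__":
    print(max_prompt_tokens())      # budget when the full output limit is reserved
    print(fits(13_000))             # a 13,000-token prompt with full output reserved
    print(fits(13_000, 1024))       # same prompt with a shorter reply reserved
```

Token counts depend on the tokenizer in use, so measure the prompt with the model's own tokenizer before applying a check like this.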

Latency

Very Fast

Capabilities

Multimodal

Streaming

Function Calling

Prompt Caching

Key Strengths

What makes this model stand out

SOTA small model

Reasoning

Runs on laptop

Similar Models in Local/Edge Tier

Other models with similar pricing and performance characteristics

IBM Granite 4.0 Nano 1.5B

IBM

Input: Self-hosted (per 1M tokens)

Context: 128K tokens

IBM Granite 4.0 Nano 350M

IBM

Input: Self-hosted (per 1M tokens)

Context: 128K tokens

Phi-4-multimodal

Microsoft

Input: Self-hosted (per 1M tokens)

Context: 16K tokens