
Phi-4-multimodal

by Microsoft

Local/Edge

Pricing (per 1M tokens)
Input: Self-hosted
Output: Self-hosted
Cached: N/A
Note: Released February 2025. First multimodal Phi model, with a Florence vision encoder.
Context & Output
Context Window: 16K tokens
Max Output: 4K tokens
Latency: Very Fast
Capabilities
Multimodal
Streaming
Function Calling
Prompt Caching
Key Strengths
What makes this model stand out
Text, image, audio, and video input
5.6B parameters
20+ languages
Similar Models in Local/Edge Tier
Other models with similar pricing and performance characteristics
IBM Granite 4.0 Nano 1.5B
IBM
Input: Self-hosted (per 1M tokens)
Context: 128K tokens

IBM Granite 4.0 Nano 350M
IBM
Input: Self-hosted (per 1M tokens)
Context: 128K tokens

Phi-4
Microsoft
Input: Self-hosted (per 1M tokens)
Context: 16K tokens