This is our first general-availability realtime model, capable of responding to audio and text inputs in realtime over WebRTC, WebSocket, or SIP connections.
Specifications
Context: 32K
Maximum Output: 4.1K
Input: text, audio, image
Output: text, audio
Pricing
Input: $4.40/MTokens
Cached Input: $0.55/MTokens
Output: $17.60/MTokens
Input Audio: $35.20/MTokens
Cached Input Audio: $0.55/MTokens
Output Audio: $70.40/MTokens
Input Image: $5.50/MTokens
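
To show how these rates combine into a per-session cost, here is a minimal Python sketch. It assumes that "MTokens" means one million tokens and that usage is already broken down by category. The dictionary keys and the estimate_cost helper are hypothetical names chosen for illustration and are not part of any published API.

```python
from decimal import Decimal

# USD per one million tokens, copied from the pricing list above.
# The category keys are hypothetical labels, not official API field names.
PRICE_PER_MTOKENS = {
    "input": Decimal("4.40"),
    "cached_input": Decimal("0.55"),
    "output": Decimal("17.60"),
    "input_audio": Decimal("35.20"),
    "cached_input_audio": Decimal("0.55"),
    "output_audio": Decimal("70.40"),
    "input_image": Decimal("5.50"),
}

TOKENS_PER_MTOKENS = Decimal(1_000_000)


def estimate_cost(usage: dict[str, int]) -> Decimal:
    """Estimate the USD cost of token counts keyed by pricing category."""
    total = Decimal("0")
    for category, tokens in usage.items():
        if category not in PRICE_PER_MTOKENS:
            raise KeyError(f"unknown pricing category: {category}")
        total += PRICE_PER_MTOKENS[category] * tokens / TOKENS_PER_MTOKENS
    return total


if __name__ == "__main__":
    # Example: 10,000 text input tokens plus 2,000 audio output tokens.
    session = {"input": 10_000, "output_audio": 2_000}
    print(f"${estimate_cost(session)}")
```

Decimal is used instead of float so that fractional-cent amounts add up exactly. How a real session's tokens split between cached and uncached categories depends on the provider's usage reporting, which this sketch does not model.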
Similar Models
Price (input/output): $3.30/$4.40 per MTokens
Context: 16K, max output: 4K, availability: n/a, tps: n/a
GPT-3.5 Turbo variant with extended 16K token context window for longer conversations and documents.
Price (input/output): $4.40/$17.60 per MTokens
Context: 32K, max output: 4K, availability: n/a, tps: n/a
This is our first general-availability realtime model, capable of responding to audio and text inputs in realtime over WebRTC, WebSocket, or SIP connections.
Price (input/output): $2.20/$2.20 per MTokens
Context: 16K, max output: 4K, availability: n/a, tps: n/a
Base model for fine-tuning and legacy applications, replacing the original davinci base model.
Price (input/output): $2.20/$8.80 per MTokens
Context: 1.0M, max output: 33K, availability: n/a, tps: n/a
GPT-4.1 is an enhanced version of GPT-4 with improved instruction following and multimodal capabilities for text and image understanding.