Six months after GPT-5.4 dropped, OpenAI has followed up with two models built to change AI economics. GPT-5.4 Mini and Nano aren't just cheaper versions of the flagship. They're purpose-built for the "agentic economy": AI models delegating tasks to other AI models.
On March 17, 2026, OpenAI announced models that run fast, cost little, and sacrifice surprisingly little capability. GPT-5.4 Mini runs over 2x faster than its predecessor and scores 54.4% on SWE-Bench Pro, just 3.3 points below the flagship GPT-5.4. Nano costs only €0.20 per million input tokens and was built for the grunt work nobody wants: classification, extraction, ranking.

📖 Read more: Gemini 2.0 Flash: 2x Faster Than Pro with Native Multimodal
🚀 GPT-5.4 Mini: The Model That Didn't Compromise
Let's start with the numbers that actually matter. GPT-5.4 Mini isn't GPT-5 Mini with better branding. It's a new architectural approach targeting the sweet spot where speed meets capability. In coding, Mini scores 54.4% on SWE-Bench Pro. For reference, GPT-5 Mini hit 45.7% and the flagship GPT-5.4 reaches 57.7%. That puts Mini at roughly 94% of the flagship's coding capability, at a fraction of the cost and fast enough for production use.

🎯 Benchmark Performance That Counts
SWE-Bench Pro (Coding): 54.4% vs 45.7% for GPT-5 Mini
OSWorld-Verified (Computer Use): 72.1% vs 42.0% for GPT-5 Mini
GPQA Diamond (Reasoning): 88% vs 81.6% for GPT-5 Mini
Toolathlon (Tool Use): 42.9% vs 26.9% for GPT-5 Mini
Where Mini Drops Points
Not everything is perfect. In long-context needle retrieval tasks, Mini scores 47.7% versus the flagship GPT-5.4's 86%. If your work involves finding specific facts in very long documents, Mini will disappoint you. It's one of the few cases where the gap with the flagship is dramatic, and worth knowing before you migrate workloads.

💰 GPT-5.4 Nano: The €0.20 Model
Nano is something completely different. It's not "cheap Mini": it's a model designed from scratch for subagent workflows and bulk processing tasks. At €0.20 per million input tokens (with cached input at €0.02), Nano changes the economics of agentic pipelines. Simon Willison ran the math: describing 76,000 photos costs about €49. For media companies or e-commerce platforms, that's a workable production cost.

52.4% SWE-Bench Pro Score
€0.20 Per Million Input Tokens
39.0% OSWorld Computer Use
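To make the economics concrete, here is a back-of-envelope cost sketch. The prices come from the article; the tokens-per-photo figure is an assumption chosen purely to illustrate how an estimate like Willison's €49 for 76,000 photos could arise.

```python
# Back-of-envelope input-cost model for bulk work on GPT-5.4 Nano.
# Prices from the article: EUR 0.20 per 1M input tokens, EUR 0.02 cached.
PRICE_PER_M_INPUT = 0.20   # EUR per million fresh input tokens
PRICE_PER_M_CACHED = 0.02  # EUR per million cached input tokens

def bulk_cost(items: int, tokens_per_item: int, cached_fraction: float = 0.0) -> float:
    """Estimate input-token cost in EUR for a batch job."""
    total_tokens = items * tokens_per_item
    cached = total_tokens * cached_fraction
    fresh = total_tokens - cached
    return (fresh * PRICE_PER_M_INPUT + cached * PRICE_PER_M_CACHED) / 1_000_000

# ASSUMPTION: ~3,200 input tokens per photo (image + prompt), which lands
# near the article's figure: 76_000 * 3_200 = 243.2M tokens.
print(round(bulk_cost(76_000, 3_200), 2))  # → 48.64
```

The `cached_fraction` knob shows why the €0.02 cached rate matters: a pipeline that reuses a long shared prompt across items pays the 10x-cheaper rate on that portion.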
What Nano Can't Do
In OSWorld-Verified, Nano scores just 39%, slightly below even GPT-5 Mini's 42%. For computer-use tasks, Nano is off-limits. That distinction will determine when you use Mini versus Nano.

🔧 The Architecture of the Multi-Model Era
The real innovation isn't the models per se; it's the workflow pattern OpenAI is proposing. Orchestrator models (GPT-5.4) handle planning and complex decisions. Execution models (GPT-5.4 Mini) run mid-complexity subtasks with speed. Worker models (GPT-5.4 Nano) do the "dirty work": classification, extraction, ranking.

This architecture solves a fundamental economics problem in agentic AI. Expensive models were too costly to call at subagent frequency. Cheap models were too weak to be reliable. Mini and Nano together close that gap.

"Larger models handle planning and judgment, while GPT-5.4 Mini executes narrower subtasks"
OpenAI Documentation
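The three-tier pattern can be sketched in a few lines. This is an illustrative routing heuristic, not OpenAI's API: the model names match the article, but the `Task` shape and the `kind` labels are assumptions.

```python
# Sketch of the orchestrator / execution / worker pattern described above.
from dataclasses import dataclass

@dataclass
class Task:
    description: str
    kind: str  # "plan", "subtask", or "bulk" (illustrative labels)

MODEL_TIERS = {
    "plan": "gpt-5.4",          # orchestrator: planning, judgment
    "subtask": "gpt-5.4-mini",  # execution: mid-complexity subtasks, fast
    "bulk": "gpt-5.4-nano",     # worker: classification, extraction, ranking
}

def route(task: Task) -> str:
    """Pick a model tier for a task; unknown kinds escalate to the orchestrator."""
    return MODEL_TIERS.get(task.kind, "gpt-5.4")

print(route(Task("draft a migration plan", "plan")))        # → gpt-5.4
print(route(Task("refactor one function", "subtask")))      # → gpt-5.4-mini
print(route(Task("classify 10k support tickets", "bulk")))  # → gpt-5.4-nano
```

The key design choice is the escalation default: when the orchestrator can't classify a task, it handles the task itself rather than risking a weaker model on something ambiguous.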
🎮 What Changes for Developers
The immediate impact will hit coding assistant developers first. Mini scores 42.9% on the Toolathlon benchmark (tool-calling accuracy) versus GPT-5 Mini's 26.9%, a roughly 60% relative improvement. In agentic pipelines, tool-calling errors compound exponentially: one wrong function call can derail an entire workflow. In the Codex environment, Mini uses only 30% of GPT-5.4's quota. That means roughly three times more tasks on the same budget allocation, a breakthrough for teams running automated code reviews or pull request generation.

Free Tier Users: The Instant Upgrade
Starting March 17, free ChatGPT users get access to GPT-5.4 Mini through "Thinking" mode. This means they're getting a model that outperforms GPT-5 Mini on all major benchmarks, without any change to their subscription cost.

For Coding
54.4% SWE-Bench Pro score — real upgrade for coding assistants and automated debugging
For Computer Use
72.1% OSWorld score — very close to flagship for UI automation and screenshot interpretation
For Reasoning
88% GPQA Diamond — doctoral-level reasoning at a price that makes sense for production
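One way to see why the Toolathlon jump matters more than it looks: in an agentic pipeline, per-call tool accuracy compounds across chained calls. The sketch below treats the article's Toolathlon scores as a rough proxy for per-call reliability, which is a simplifying assumption (the benchmark measures end-to-end task completion, not independent single calls).

```python
# If each of n chained tool calls succeeds independently with probability p,
# the whole chain succeeds with probability p**n. Small per-call gains
# translate into large end-to-end gains as chains get longer.

def chain_success(p: float, n: int) -> float:
    """Probability that all n independent calls succeed."""
    return p ** n

# ASSUMPTION: Toolathlon scores used as per-call success rates.
for label, p in [("GPT-5 Mini", 0.269), ("GPT-5.4 Mini", 0.429)]:
    print(label, round(chain_success(p, 5), 4))
```

For a five-call chain, Mini's higher per-call rate yields roughly an order of magnitude better odds of the whole workflow completing, which is the compounding effect the Codex quota math is banking on.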
