GPT-5.4 Launched: What’s New, What Changed for ChatGPT Users, and Why This Release Matters
OpenAI just shipped GPT-5.4, positioning it as its “most capable and efficient frontier model for professional work.” It’s rolling out across ChatGPT, Codex, and the API.
This isn’t a “small model refresh.” GPT-5.4 is a consolidation release: it combines the “frontier coding” strengths of GPT-5.3-Codex with stronger reasoning, tool use, long-horizon reliability, and (most importantly) native computer-use capabilities for agents.
1) What exactly launched (and where you’ll see it)
Inside ChatGPT: “GPT-5.4 Thinking” (and “GPT-5.4 Pro”)
GPT-5.4 Thinking is available to Plus, Team, and Pro users and replaces GPT-5.2 Thinking in the model picker.
GPT-5.2 Thinking remains in Legacy Models for three months, then retires on June 5, 2026.
GPT-5.4 Pro is available for Pro and Enterprise plans (maximum performance tier).
In the API and Codex: “gpt-5.4” and “gpt-5.4-pro”
GPT-5.4 is available as gpt-5.4
GPT-5.4 Pro is available as gpt-5.4-pro
2) The core theme: “Professional work” (not just smarter chat)
OpenAI is explicitly optimizing GPT-5.4 for real work products: spreadsheets, presentations, documents, and multi-step workflows that need tools.
It also introduced a new internal evaluation category that signals intent: GDPval, where models generate deliverables across many real occupations (sales decks, accounting spreadsheets, schedules, diagrams, etc.). GPT-5.4 posts an 83.0% win-or-tie rate there (vs 70.9% for GPT-5.2 and GPT-5.3-Codex).
Spreadsheet and presentation emphasis is real
OpenAI highlights:
A newly released ChatGPT for Excel add-in (targeted at Enterprise workflows)
Updated spreadsheet and presentation skills for Codex and API usage
And press coverage frames this as OpenAI pushing hard into “office work” and finance-style analysis (Excel/Sheets + market data workflows).
3) Fewer errors: OpenAI is bragging about factuality, not vibes
OpenAI claims GPT-5.4 is its “most factual model yet” on a set of de-identified prompts where users flagged factual errors:
Individual claims: 33% less likely to be false (vs GPT-5.2)
Full responses: 18% less likely to contain any errors (vs GPT-5.2)
This is important because “professional work” fails fast if the model confidently invents details. OpenAI is clearly aiming at trust in deliverables (reports, decks, spreadsheets) rather than just chat quality.
4) The headline feature for builders: native “computer use” (agents that drive UIs)
OpenAI calls GPT-5.4 its first general-purpose model with native computer-use capabilities, intended for agents that complete tasks across websites and software systems.
This is the capability the internet is latching onto because it pushes AI from “assistant” → “operator”:
The model can interpret screenshots
It can output mouse/keyboard actions
It can write automation (e.g., via Playwright)
It can coordinate multi-step flows across apps
The practical loop (how it works in real products)
OpenAI’s “computer tool” documentation describes a standard pattern:
You send a task with the computer tool enabled
The model returns a computer_call containing actions (click, type, scroll, screenshot request)
Your “harness” executes those actions in a browser/container
You send back an updated screenshot (computer_call_output)
Repeat until the model stops issuing computer_call
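The loop above can be sketched as a small harness. The model is stubbed out here so the control flow runs offline; in a real product you would call the API with the computer tool enabled, and the exact client calls are an assumption, not the official SDK surface. The `computer_call` / `computer_call_output` names come from the pattern described above.

```python
# Sketch of the computer-use loop: model proposes actions, the harness
# executes them and sends back a screenshot, until the model stops issuing
# computer_call items. fake_model and execute_action are stand-ins.

def fake_model(history):
    """Stand-in for the model: issues one click, then declares the task done."""
    if not any(item["type"] == "computer_call_output" for item in history):
        return {"type": "computer_call", "action": {"kind": "click", "x": 120, "y": 48}}
    return {"type": "message", "text": "Task complete."}

def execute_action(action):
    """Stand-in for the isolated browser/container that performs the action."""
    # e.g., drive Playwright here, then capture a fresh screenshot
    return {"type": "computer_call_output", "screenshot": f"<png after {action['kind']}>"}

def run_task(task):
    history = [{"type": "message", "text": task}]
    while True:
        reply = fake_model(history)
        if reply["type"] != "computer_call":
            return reply["text"]                         # model stopped issuing computer_call
        history.append(reply)
        history.append(execute_action(reply["action"]))  # send back updated screenshot

print(run_task("Open the dashboard and click Export"))
```

The key design point: the model never touches the machine directly; your harness owns execution, which is exactly where the safety controls below attach.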
Safety + engineering best practices (OpenAI is explicit here)
The docs strongly recommend:
Run the tool in an isolated browser/container
Keep an allow list of domains/actions
Keep a human in the loop for purchases, authenticated flows, destructive actions
Vision upgrades that make computer use viable
GPT-5.4 also improved visual perception and document parsing:
Better MMMU-Pro scores vs GPT-5.2 (visual reasoning)
Better OmniDocBench error rate vs GPT-5.2 (document extraction fidelity)
A new “original” image-input detail level supports high-fidelity perception up to 10.24M pixels (with updated limits for “high” as well)
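The 10.24M-pixel figure is easy to operationalize when preparing inputs. The ceiling comes from the post; the downscale-to-fit step is an assumption about how you might pre-process oversized screenshots, not documented behavior.

```python
# Check an image against the stated 10.24M-pixel budget for the "original"
# detail level, and downscale proportionally if it exceeds the budget.
import math

MAX_PIXELS = 10_240_000  # 10.24M pixels, per the launch post

def fits_original(width, height):
    return width * height <= MAX_PIXELS

def scale_to_fit(width, height):
    """Return dimensions downscaled (only if needed) to respect the pixel budget."""
    if fits_original(width, height):
        return width, height
    s = math.sqrt(MAX_PIXELS / (width * height))
    return int(width * s), int(height * s)

print(fits_original(3200, 3200))   # 3200 * 3200 = 10.24M exactly
print(scale_to_fit(6000, 4000))    # 24M pixels: needs downscaling
```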
“Step toward autonomous agents” is the mainstream framing
Multiple outlets framed GPT-5.4 as a big step toward autonomous workflows precisely because of computer control and improved tool calling.
5) Tool use got a real architecture upgrade: “tool search”
If you build agents with lots of tools/connectors, GPT-5.4 introduces tool search in the API.
Why it matters
Previously, all tool definitions had to be stuffed into the prompt up front: expensive, slow, and it pollutes the context window with tools the model may never use.
With tool search:
The model receives a lightweight tool list
When needed, it fetches the definition on demand and appends it at that moment
OpenAI reports that on 250 tasks from a benchmark with 36 MCP servers, tool search reduced total token usage by 47% with the same accuracy.
This is one of those “under the hood” changes that makes enterprise agents cheaper and more scalable.
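The economics are easy to illustrate. This sketch compares the two patterns with word counts as a crude proxy for tokens; the registry is hypothetical, and the 47% figure above came from OpenAI's own benchmark, not from this toy.

```python
# Why tool search saves tokens: a lightweight listing goes in up front, and
# full definitions are fetched only for tools the model actually uses.

TOOLS = {  # hypothetical registry of 36 tools (matching the benchmark's scale)
    f"tool_{i}": {
        "summary": f"tool_{i}: one-line description",
        "definition": f"tool_{i} full JSON schema " + "param " * 40,
    }
    for i in range(36)
}

def tokens(text):
    return len(text.split())  # crude proxy for tokenization

def upfront_cost():
    """Old pattern: every full definition is inlined into the prompt."""
    return sum(tokens(t["definition"]) for t in TOOLS.values())

def tool_search_cost(used):
    """New pattern: lightweight list, plus definitions fetched on demand."""
    listing = sum(tokens(t["summary"]) for t in TOOLS.values())
    fetched = sum(tokens(TOOLS[name]["definition"]) for name in used)
    return listing + fetched

print(upfront_cost(), tool_search_cost(["tool_0", "tool_7"]))
```

The savings grow with the size of the tool ecosystem and shrink with the fraction of tools a task actually touches, which is why the win shows up most in large MCP deployments.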
6) Long context: “up to 1M tokens,” with caveats
OpenAI states GPT-5.4 supports up to 1M tokens of context to plan/execute/verify long-horizon tasks.
But the details matter:
In Codex, 1M context is described as experimental
You can try it with model_context_window and model_auto_compact_token_limit
Requests beyond the standard 272K context window count against usage limits at 2× the normal rate
Translation: huge context exists, but you still need to engineer for it (compaction, budgets, and careful “what goes in the window”).
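The 2× rule is worth budgeting for explicitly. This helper is a direct reading of the usage rule quoted above (tokens beyond the standard 272K window count double against limits), not an official billing calculator.

```python
# Back-of-envelope for the Codex long-context usage rule: tokens beyond the
# standard 272K window count against usage limits at 2x the normal rate.

STANDARD_WINDOW = 272_000

def effective_usage(total_tokens):
    """Tokens counted against usage limits for one long-context request."""
    overage = max(0, total_tokens - STANDARD_WINDOW)
    return min(total_tokens, STANDARD_WINDOW) + 2 * overage

print(effective_usage(200_000))   # under the window: counts as-is
print(effective_usage(600_000))   # 272K + 2 * 328K = 928K
```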
7) Steerability: ChatGPT can show a plan first, and you can redirect mid-response
In ChatGPT, GPT-5.4 Thinking can provide an up-front plan/preamble for longer tasks so you can adjust direction before it commits to a full output.
This is a UX upgrade, not a “benchmark upgrade,” but it matters a lot in real work:
fewer wasted turns
fewer “I didn’t mean that” restarts
clearer control over the deliverable
8) Coding: GPT-5.4 is meant to simplify the “which model do I use?” problem
OpenAI explicitly says GPT-5.4 is its first mainline reasoning model that incorporates the frontier coding capability of GPT-5.3-Codex, and that the naming is meant to simplify choices across Codex.
On public coding evaluations, GPT-5.4 shows:
SWE-Bench Pro (Public): 57.7%
Terminal-Bench 2.0: 75.1%
The important bit is not “a point here or there,” but that OpenAI is blending “agentic coding” with “knowledge-work deliverables” into one default professional model.
9) Pricing and performance knobs (API)
OpenAI’s pricing table in the launch post lists:
gpt-5.4: $2.50 / 1M input, $0.25 / 1M cached input, $15 / 1M output
gpt-5.4-pro: $30 / 1M input, $180 / 1M output
OpenAI also notes:
GPT-5.4 is priced higher per token than GPT-5.2, but aims to reduce total tokens via efficiency
Batch/Flex are half the standard rate; Priority is 2×
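Putting the listed prices and tier multipliers together, a request cost estimator looks like this. The numbers are the ones quoted above; treating Batch/Flex and Priority as flat 0.5×/2× multipliers on the whole request is a simplifying assumption.

```python
# Cost sketch using the per-1M-token prices from the launch post's table.

PRICES = {  # USD per 1M tokens
    "gpt-5.4":     {"input": 2.50, "cached_input": 0.25, "output": 15.00},
    "gpt-5.4-pro": {"input": 30.00, "output": 180.00},
}

def cost(model, input_toks, output_toks, cached_toks=0, tier=1.0):
    """Estimated USD cost. tier: 0.5 for Batch/Flex, 2.0 for Priority."""
    p = PRICES[model]
    usd = (input_toks * p["input"]
           + cached_toks * p.get("cached_input", p["input"])
           + output_toks * p["output"]) / 1_000_000
    return usd * tier

# 100K input + 20K output on gpt-5.4 at the standard rate
print(round(cost("gpt-5.4", 100_000, 20_000), 4))  # 0.55
```

Note the asymmetry: output tokens cost 6× input tokens on gpt-5.4, so "efficiency" claims about shorter outputs translate directly into spend.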
10) Safety: “High cyber capability” and stronger mitigations
OpenAI says it treats GPT-5.4 as High cyber capability under its Preparedness Framework and deploys corresponding protections (monitoring, access controls, asynchronous blocking on higher-risk requests, especially on Zero Data Retention surfaces).
For normal product builders, the implication is:
some security-adjacent workflows may see stricter behavior
you should design “verification loops” and human approvals into high-impact actions anyway (which OpenAI also recommends for computer use)
11) What the internet is saying so far (themes you’ll keep seeing)
Here’s the consensus pattern across coverage today:
“This is OpenAI’s agent push”
Computer control + better tool calling = agents that can actually do work in the real world (apps, websites, workflows).
“This is an office-work / finance push”
Excel/Sheets integrations and “professional work” messaging are being interpreted as OpenAI targeting knowledge-work automation more directly than before.
“This is about fewer errors and fewer wasted turns”
Plan-first steerability and the 33%/18% factuality claim are being highlighted because they map to daily usability, not just raw intelligence.
12) Practical advice: how to actually get value from GPT-5.4 (especially if you build or ship things)
OpenAI’s prompt guidance for GPT-5.4 is blunt: you get the biggest gains when you define:
the output contract (format, structure, constraints)
tool-use expectations
completion criteria (“what done looks like”)
The highest-leverage patterns (copy these into your internal prompt templates)
A completeness contract: what must be included
A verification loop: “check your work” steps
Tool persistence rules: keep going until finished, retry intelligently
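The three patterns above can live in one reusable template. The wording below is illustrative, not OpenAI's official guidance text; adapt the sections to your deliverables.

```python
# A minimal prompt template encoding the three high-leverage patterns:
# output contract, verification loop, and tool persistence rules.

TEMPLATE = """\
Task: {task}

Output contract:
- Format: {fmt}
- Must include: {required_sections}

Verification loop:
- Before finishing, re-check every figure and claim against the source data.
- List anything you could not verify instead of guessing.

Tool persistence:
- Keep working until the task is complete; if a tool call fails, retry once
  with adjusted inputs before reporting the failure.
"""

prompt = TEMPLATE.format(
    task="Build the weekly revenue report",
    fmt="Markdown with one summary table",
    required_sections="revenue by channel, week-over-week delta, anomalies",
)
print(prompt)
```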
If you’re building computer-use agents
Treat “computer use” like you’d treat a junior operator:
sandbox the environment
allowlist domains/actions
require approval for irreversible steps
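Those three rules reduce to a small policy gate in the harness. The specific domains and action names here are hypothetical; the pattern (allowlist plus human-in-the-loop for irreversible steps) is what the docs recommend.

```python
# "Junior operator" guardrails: a domain allowlist plus a human-approval
# gate for irreversible actions, checked before the harness executes anything.

ALLOWED_DOMAINS = {"admin.example.com", "reports.example.com"}   # hypothetical
NEEDS_APPROVAL = {"purchase", "delete", "submit_payment"}        # hypothetical

def gate(action, domain, approved=False):
    """Return True only if the agent may execute this action right now."""
    if domain not in ALLOWED_DOMAINS:
        return False                      # outside the sandbox's allowlist
    if action in NEEDS_APPROVAL and not approved:
        return False                      # irreversible: wait for a human
    return True

print(gate("click", "admin.example.com"))            # routine action: allowed
print(gate("purchase", "admin.example.com"))         # blocked until approved
print(gate("purchase", "admin.example.com", True))   # human approved: allowed
```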
13) What this means (web dev / ecom / ops)
If you’re a developer, agency, or SaaS builder, GPT-5.4’s biggest practical impact is the combination of:
better deliverables (docs/sheets/decks)
computer-use agents (UI automation)
tool search (big tool ecosystems without massive prompt bloat)
Concrete examples:
Auto-building weekly client reports (Sheets → slides → email draft)
QA agents that click through storefront flows and log regressions
Back-office automation: data entry, portal workflows, reconciliation
Support ops: pull data, draft response, propose next actions, update systems
If your business already “runs on tabs,” GPT-5.4 is OpenAI’s clearest attempt to make the AI operate those tabs, not just explain them.