OpenAI Introduces GPT-5.5, a New Model Built for Real Work

OpenAI just introduced GPT-5.5, and the company is framing it as more than a routine model refresh.

According to OpenAI, GPT-5.5 is its smartest and most intuitive model yet, built to handle messy, multi-step work on a computer with less hand-holding from the user. The company says the model is especially strong in agentic coding, computer use, knowledge work, and early scientific research, while still matching GPT-5.4’s per-token latency in real-world serving. OpenAI also says GPT-5.5 uses significantly fewer tokens than GPT-5.4 on the same Codex tasks.

That is the real story here.

This is not just another model upgrade

The important part is not just that GPT-5.5 scores higher on benchmarks. It is that OpenAI is clearly pushing the model as something closer to a practical work engine. The pitch is about giving the model a broad task, letting it plan, use tools, navigate ambiguity, check its own work, and keep going until the task is done. That is a different product narrative from the older “ask a question, get an answer” framing.

For a while, a lot of AI product releases could still be understood through the chatbot lens. The model got smarter, faster, cheaper, or more multimodal, but the underlying experience still felt like a better assistant inside a message box. GPT-5.5 feels like part of a broader shift away from that. OpenAI is describing a system that can move through real workflows, not just respond well inside a conversation.

That distinction matters because real work is rarely clean.

In actual business environments, the task is usually vague at the start. The information is incomplete. The files are messy. The goals are changing. There are browser tabs, documents, spreadsheets, codebases, notes, emails, and half-finished thoughts involved. If a model can genuinely handle that kind of environment better, that is more meaningful than another incremental improvement on short benchmark questions.

OpenAI is making a serious coding claim

OpenAI is also making a strong claim on coding.

The company says GPT-5.5 is its strongest agentic coding model so far. On Terminal-Bench 2.0, it scored 82.7%, and on SWE-Bench Pro it reached 58.6%. OpenAI also reports that GPT-5.5 outperformed GPT-5.4 on internal Expert-SWE testing, scoring 73.1% versus 68.5%.

In plain terms, OpenAI is saying the model is better not only at generating code, but at handling the kind of multi-step engineering work that includes planning, debugging, testing, validation, and carrying changes across a larger codebase.

That part is important for developers because useful coding assistance is no longer just about whether a model can write a function. The more valuable question is whether it can stay coherent across a full task. Can it hold context inside a complicated system? Can it understand an ambiguous bug? Can it recognize that fixing one piece of the code may require touching several others? Can it test its own assumptions instead of blindly producing output? OpenAI is very clearly positioning GPT-5.5 as better at that entire process.

The company also leans into user stories that reinforce this narrative. The examples are designed to show a model that behaves less like a code autocomplete system and more like a high-level engineering collaborator. That is the category OpenAI seems to want to own here.

The bigger story may be knowledge work

The broader productivity angle may matter even more for business users.

OpenAI says GPT-5.5 performs at a state-of-the-art level on several benchmarks tied to professional work. It scored 84.9% on GDPval, 78.7% on OSWorld-Verified, and 98.0% on Tau2-bench Telecom without prompt tuning. The company also says GPT-5.5 improved on GPT-5.4 in FinanceAgent, internal investment banking modeling tasks, OfficeQA Pro, BrowseComp, Toolathlon, and several long-context evaluations.

That makes this launch feel less like a pure reasoning milestone and more like a push toward end-to-end execution across software, documents, spreadsheets, browsing, and tool use.

This is where the release becomes interesting even for people who do not write code.

A lot of professional work is really about gathering information, understanding what matters, converting that into some useful structure, and producing an output someone else can use. That could be a spreadsheet, a report, a plan, a slide deck, a cleaned-up brief, or an answer backed by research. OpenAI is basically saying GPT-5.5 is better at that full loop, not just the final wording step.

If that claim holds up in practice, it changes the way people think about AI tools inside companies. Instead of asking, “Can this help me write faster?” the more relevant question becomes, “Can this help me finish larger chunks of work with less supervision?”

That is a much bigger commercial opportunity.

OpenAI wants this to look useful inside real companies

OpenAI is also leaning hard into the idea that this is useful across real internal workflows, not just benchmark demos.

The company says more than 85% of OpenAI's staff use Codex every week, and it gave examples spanning communications work, business reporting, and finance review. In one case, the finance team used the workflow to review 24,771 K-1 tax forms totaling 71,637 pages, finishing the process two weeks faster than the previous year.

These examples are not just there to impress readers. They serve a strategic purpose.

OpenAI wants to show that GPT-5.5 is not limited to developer-first use cases. It wants finance teams, operations teams, product teams, analysts, marketers, and researchers to see themselves in the product story too. That is a very deliberate shift in positioning. The company is not selling only intelligence anymore. It is selling useful execution across departments.

There is also a strong “computer use” subtext throughout the release.

That may end up being one of the most important parts of the whole announcement. A model that can reason well is valuable. A model that can reason while navigating software, clicking through interfaces, handling documents, and moving between tools is much more disruptive. The more AI can operate in the environment where real work already happens, the less friction there is between intelligence and action.

The research angle is bigger than it looks

There is also a research angle that stands out.

OpenAI says GPT-5.5 shows gains in scientific and technical research workflows: it improved over GPT-5.4 on GeneBench, achieved leading performance among published scores on BixBench, and even helped discover a new proof related to off-diagonal Ramsey numbers that was later verified in Lean.

Whether or not every reader cares about those specifics, the message is clear: OpenAI wants GPT-5.5 to be seen as a model that can stay useful over longer, more technical problem-solving loops, not just one-shot queries.

This part of the announcement is especially notable because it points toward a future where frontier models are judged less by how clever they sound in a single response and more by how useful they are over an extended process. Research is messy. It involves testing assumptions, checking evidence, revising ideas, and staying with a problem through multiple passes. OpenAI is arguing that GPT-5.5 is better at persisting through that kind of loop.

Even if most businesses are not doing frontier biology or mathematics, the same principle applies in everyday work. Many high-value tasks are not solved in one step. They require iteration. They require memory across the process. They require staying aligned with the goal while the route changes. GPT-5.5 is being presented as a model built for that reality.

GPT-5.5 is rolling out inside ChatGPT and Codex first

On availability, OpenAI is rolling GPT-5.5 out now inside its own products first.

As of April 23, 2026, GPT-5.5 is rolling out to Plus, Pro, Business, and Enterprise users in ChatGPT and Codex. GPT-5.5 Pro is rolling out to Pro, Business, and Enterprise users in ChatGPT. In Codex, GPT-5.5 is also available for Edu and Go plans with a 400K context window, and Fast mode runs 1.5x faster at 2.5x the cost. OpenAI says API access for gpt-5.5 and gpt-5.5-pro is coming very soon rather than arriving on day one.

That rollout strategy also says something.

OpenAI is giving priority to its own surfaces, where it controls the user experience, the tooling environment, and the safety layer. That makes sense. A model like this is easier to demonstrate when it lives inside ChatGPT and Codex, where the workflow itself can be shaped around the model’s strengths. The API story matters too, but the company appears to be treating product integration as the first showcase.

Premium model, premium pricing

For developers, the API pricing is not cheap, but OpenAI is clearly positioning the model as premium infrastructure for high-value work.

OpenAI says gpt-5.5 will be priced at $5 per 1M input tokens and $30 per 1M output tokens, with a 1M context window. Batch and Flex pricing are set at half the standard rate, while Priority processing is 2.5x the standard rate. OpenAI also says gpt-5.5-pro will cost $30 per 1M input tokens and $180 per 1M output tokens.
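Taken at face value, those rates make per-task cost easy to estimate. Here is a minimal sketch using only the prices and tier multipliers stated above; the token counts in the example are hypothetical placeholders, not figures from the announcement:

```python
# Estimating per-task API cost from the stated gpt-5.5 pricing:
# $5 per 1M input tokens, $30 per 1M output tokens at the standard tier;
# Batch and Flex at half rate, Priority at 2.5x.
RATES = {"input": 5.00, "output": 30.00}  # USD per 1M tokens, standard tier
TIER_MULTIPLIER = {"standard": 1.0, "batch": 0.5, "flex": 0.5, "priority": 2.5}

def task_cost(input_tokens: int, output_tokens: int, tier: str = "standard") -> float:
    """Return the USD cost of one task at the given service tier."""
    m = TIER_MULTIPLIER[tier]
    return m * (input_tokens / 1e6 * RATES["input"]
                + output_tokens / 1e6 * RATES["output"])

# Hypothetical long agentic task: 200K input tokens, 20K output tokens.
standard = task_cost(200_000, 20_000)        # 0.2*$5 + 0.02*$30 = $1.60
batch = task_cost(200_000, 20_000, "batch")  # half rate: $0.80
```

The interesting variable in this arithmetic is not the rate but the output-token count: if GPT-5.5 really does use significantly fewer tokens than GPT-5.4 on the same tasks, as OpenAI claims, the effective per-task cost gap narrows even at higher list prices.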

The company argues that the higher pricing versus GPT-5.4 is offset by better intelligence and much stronger token efficiency.

This is a familiar pattern at the frontier end of AI.

The base headline price can look high, but the real business calculation is rarely about token price alone. It is about whether the model can complete more of the task, require fewer retries, reduce human correction, and save enough time to justify the cost. OpenAI is clearly betting that customers will accept higher pricing if the model genuinely moves the quality bar enough.

That will probably be the real test of GPT-5.5 in the market.

Not whether it is impressive in a demo, but whether teams feel the difference in output quality, speed, and reliability strongly enough to change usage behavior. Premium models win when they save enough time or create enough leverage that the higher price feels rational.

Safety is a bigger part of the launch now

OpenAI is also emphasizing safety more than usual in this launch.

The company says GPT-5.5 ships with its strongest safeguards so far, after evaluation under its Preparedness Framework, targeted testing for advanced cybersecurity and biology capabilities, internal and external red-teaming, and feedback from nearly 200 trusted early-access partners. OpenAI also says it is deploying stricter cyber-risk classifiers and tighter controls around higher-risk cyber activity, even if some users may initially find those controls annoying.

That emphasis is not accidental.

As models become more capable at coding, tool use, and operating software, the upside grows, but so does the risk. A model that is genuinely helpful for defensive security work can also raise harder questions around misuse. OpenAI is trying to show that it understands this tension and is actively building tighter controls as the capabilities improve.

This will likely become a bigger theme in every major model release from here on out. The more AI becomes useful for real operational work, the harder it is to separate performance conversations from safety conversations. Those two topics are now linked.

The real takeaway

From an abZ.Global perspective, the biggest takeaway is simple.

GPT-5.5 looks like another step away from AI as a chat interface and toward AI as an operational layer for real work. The benchmark gains matter, but the more important shift is product direction: better coding, better computer use, better document and spreadsheet work, better persistence across long tasks, and better efficiency at roughly the same latency as the previous generation.

If GPT-5 was about making advanced reasoning mainstream, GPT-5.5 looks more like OpenAI trying to make advanced execution feel normal.

That is why this release matters.

The most important AI products over the next few years may not be the ones that simply answer questions the best. They may be the ones that can actually carry work forward with the least friction. OpenAI is signaling that this is the path it wants to lead: models that do not just inform the user, but materially help finish the job.

GPT-5.5 looks like a continuation of that shift.

It is not just a smarter model. It is a more usable one, a more work-oriented one, and in OpenAI’s framing, a more operational one. For developers, researchers, analysts, and teams trying to understand where AI products are headed next, that may be the most important signal in the entire announcement.

https://openai.com/index/introducing-gpt-5-5/

Sorca Marian

Founder/CEO/CTO of SelfManager.ai & abZ.Global | Senior Software Engineer

https://SelfManager.ai