OpenAI brings GPT-4.1 and 4.1 mini to ChatGPT: what enterprises should know

May 15, 2025

OpenAI is rolling out GPT-4.1, its new non-reasoning large language model (LLM) that balances high performance with lower cost, to users of ChatGPT. The company is beginning with its paying subscribers on ChatGPT Plus, Pro, and Team, with Enterprise and Education user access expected in the coming weeks.

It's also adding GPT-4.1 mini, which replaces GPT-4o mini as the default for all ChatGPT users, including those on the free tier. The "mini" version is a smaller model with fewer parameters, and thus less powerful, but it is held to similar safety standards.

Both models are available via the "more models" dropdown in the top corner of the chat window within ChatGPT, giving users flexibility to choose between GPT-4.1, GPT-4.1 mini, and reasoning models such as o3, o4-mini, and o4-mini-high.

Initially intended for use only by third-party software and AI developers through OpenAI's application programming interface (API), GPT-4.1 was added to ChatGPT following strong user feedback.

OpenAI post-training research lead Michelle Pokrass confirmed on X that the shift was driven by demand, writing: "we were initially planning on keeping this model api only but you all wanted it in chatgpt 🙂 happy coding!"

OpenAI Chief Product Officer Kevin Weil posted on X saying: "We built it for developers, so it's very good at coding and instruction following - give it a try!"

An enterprise-focused model

GPT-4.1 was designed from the ground up for enterprise-grade practicality.

Launched in April 2025 alongside GPT-4.1 mini and nano, this model family prioritized developer needs and production use cases.

GPT-4.1 delivers a 21.4-point improvement over GPT-4o on the SWE-bench Verified software engineering benchmark, and a 10.5-point gain on instruction-following tasks in Scale's MultiChallenge benchmark. It also reduces verbosity by 50% compared to other models, a trait enterprise users praised during early testing.

Context, speed, and model access

GPT-4.1 supports the standard context windows for ChatGPT: 8,000 tokens for free users, 32,000 tokens for Plus users, and 128,000 tokens for Pro users.

According to developer Angel Bogado posting on X, these limits match those used by earlier ChatGPT models, though plans are underway to increase context size further.

While the API versions of GPT-4.1 can process up to one million tokens, this expanded capacity is not yet available in ChatGPT, though future support has been hinted at.

This extended context capability allows API users to feed entire codebases or large legal and financial documents into the model, which is useful for reviewing multi-document contracts or analyzing large log files.

OpenAI has acknowledged some performance degradation with extremely large inputs, but enterprise test cases suggest solid performance up to several hundred thousand tokens.
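To make the tier limits above concrete, here is a minimal Python sketch that checks whether a prompt of a given size fits each access tier. The token limits are the figures quoted in this article; the tier labels, constant names, and function names are illustrative assumptions, not part of any OpenAI library. The sketch also ignores that output tokens share the same window, and it assumes the token count has already been measured with a tokenizer.

```python
# Context-window limits for GPT-4.1, in tokens, as reported in this article.
# The tier labels ("free", "plus", "pro", "api") are illustrative names only.
CHATGPT_CONTEXT_LIMITS = {"free": 8_000, "plus": 32_000, "pro": 128_000}
API_CONTEXT_LIMIT = 1_000_000


def fits_in_context(token_count: int, tier: str) -> bool:
    """Return True if a prompt of token_count tokens fits the given tier."""
    if token_count < 0:
        raise ValueError("token_count must be non-negative")
    if tier == "api":
        limit = API_CONTEXT_LIMIT
    else:
        # Raises KeyError for an unknown tier label.
        limit = CHATGPT_CONTEXT_LIMITS[tier]
    return token_count <= limit


def tiers_that_fit(token_count: int) -> list[str]:
    """List every tier, smallest first, whose window can hold the prompt."""
    ordered = ["free", "plus", "pro", "api"]
    return [tier for tier in ordered if fits_in_context(token_count, tier)]


if __name__ == "__main__":
    # A hypothetical 200,000-token contract bundle exceeds every ChatGPT tier.
    print(tiers_that_fit(200_000))
    # A hypothetical 30,000-token log excerpt fits Plus and above.
    print(tiers_that_fit(30_000))
```

Under these figures, a document set of a few hundred thousand tokens is an API-only workload for now, which matches the multi-document contract and log-analysis use cases described above.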

Evaluations and safety

OpenAI has also launched a Safety Evaluations Hub website to give users access to key performance metrics across models.

GPT-4.1 shows solid results across these evaluations. In factual accuracy tests, it scored 0.40 on the SimpleQA benchmark and 0.63 on PersonQA, outperforming several predecessors.

It also scored 0.99 on OpenAI's "not unsafe" measure in standard refusal tests, and 0.86 on more challenging prompts.

However, in the StrongReject jailbreak test, an academic benchmark for safety under adversarial conditions, GPT-4.1 scored 0.23, behind models like GPT-4o-mini and o3.

That said, it scored a strong 0.96 on human-sourced jailbreak prompts, indicating more robust real-world safety under typical use.

In instruction adherence, GPT-4.1 follows OpenAI's defined hierarchy (system over developer, developer over user messages) with a score of 0.71 for resolving system vs. user message conflicts. It also performs well in safeguarding protected phrases and avoiding solution giveaways in tutoring scenarios.

Contextualizing GPT-4.1 against predecessors

The release of GPT-4.1 comes after scrutiny around GPT-4.5, which debuted in February 2025 as a research preview. That model emphasized better unsupervised learning, a richer knowledge base, and reduced hallucinations, with the hallucination rate falling from 61.8% in GPT-4o to 37.1%. It also showcased improvements in emotional nuance and long-form writing, but many users found the improvements subtle.

Despite these gains, GPT-4.5 drew criticism for its high price (up to $180 per million output tokens via API) and for underwhelming performance in math and coding benchmarks relative to OpenAI's o-series models. Industry figures noted that while GPT-4.5 was stronger in general conversation and content generation, it underperformed in developer-specific applications.

By contrast, GPT-4.1 is intended as a faster, more focused alternative. While it lacks GPT-4.5's breadth of knowledge and extensive emotional modeling, it's better tuned for practical coding assistance and adheres more reliably to user instructions.

On OpenAI's API, GPT-4.1 is currently priced at $2.00 per million input tokens, $0.50 per million cached input tokens, and $8.00 per million output tokens.

For those seeking a balance between speed and intelligence at a lower cost, GPT-4.1 mini is available at $0.40 per million input tokens, $0.10 per million cached input tokens, and $1.60 per million output tokens.
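The per-million-token rates above translate into a per-request cost with simple arithmetic. The following Python sketch is one way to estimate it; the rates are the ones quoted in this article and may change, while the dictionary layout, the function name, and the example token counts are assumptions made for illustration.

```python
# API prices in USD per one million tokens, as quoted in this article.
# The dictionary keys are used here only as labels for the lookup.
PRICES_PER_MILLION = {
    "gpt-4.1": {"input": 2.00, "cached_input": 0.50, "output": 8.00},
    "gpt-4.1-mini": {"input": 0.40, "cached_input": 0.10, "output": 1.60},
}


def estimate_cost(
    model: str,
    input_tokens: int,
    output_tokens: int,
    cached_input_tokens: int = 0,
) -> float:
    """Estimate the USD cost of one request.

    cached_input_tokens is the portion of input_tokens billed at the
    cheaper cached rate; the remainder is billed at the full input rate.
    """
    if min(input_tokens, output_tokens, cached_input_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_input_tokens > input_tokens:
        raise ValueError("cached_input_tokens cannot exceed input_tokens")

    rates = PRICES_PER_MILLION[model]
    uncached_tokens = input_tokens - cached_input_tokens
    total = (
        uncached_tokens * rates["input"]
        + cached_input_tokens * rates["cached_input"]
        + output_tokens * rates["output"]
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 100,000 input tokens (20,000 of them cached)
    # and 5,000 output tokens.
    for model in PRICES_PER_MILLION:
        cost = estimate_cost(model, 100_000, 5_000, cached_input_tokens=20_000)
        print(f"{model}: ${cost:.3f}")
```

For that hypothetical request, the estimate comes to about $0.21 on GPT-4.1 and about $0.04 on GPT-4.1 mini, a fivefold difference that follows directly from the rate tables.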

Google's Gemini Flash-Lite and Flash models are available starting at $0.075–$0.10 per million input tokens and $0.30–$0.40 per million output tokens, less than a tenth the cost of GPT-4.1's base rates.

But while GPT-4.1 is priced higher, it posts stronger software engineering benchmark results and follows instructions more precisely, which may be critical for enterprise deployment scenarios that put reliability ahead of cost. Ultimately, OpenAI's GPT-4.1 delivers a premium experience for precision and development performance, while Google's Gemini models appeal to cost-conscious enterprises needing flexible model tiers and multimodal capabilities.

What it means for enterprise decision makers

The introduction of GPT-4.1 brings specific benefits to enterprise teams managing LLM deployment, orchestration, and data operations:

AI engineers overseeing LLM deployment can expect improved speed and instruction adherence. For teams managing the full LLM lifecycle, from model fine-tuning to troubleshooting, GPT-4.1 offers a more responsive and efficient toolset. It's particularly suitable for lean teams under pressure to ship high-performing models quickly without compromising safety or compliance.
AI orchestration leads focused on scalable pipeline design will appreciate GPT-4.1's robustness against most user-induced failures and its strong performance in message hierarchy tests. This makes it easier to integrate into orchestration systems that prioritize consistency, model validation, and operational reliability.
Data engineers responsible for maintaining high data quality and integrating new tools will benefit from GPT-4.1's lower hallucination rate and higher factual accuracy. Its more predictable output behavior aids in building dependable data workflows, even when team resources are constrained.
IT security professionals tasked with embedding security across DevOps pipelines may find value in GPT-4.1's resistance to common jailbreaks and its controlled output behavior. While its academic jailbreak resistance score leaves room for improvement, the model's high performance against human-sourced exploits helps support safe integration into internal tools.

Across these roles, GPT-4.1's positioning as a model optimized for clarity, compliance, and deployment efficiency makes it a compelling option for mid-sized enterprises looking to balance performance with operational demands.

A new step forward

While GPT-4.5 represented a scaling milestone in model development, GPT-4.1 centers on utility. It's not the most expensive or the most multimodal, but it delivers meaningful gains in areas that matter to enterprises: accuracy, deployment efficiency, and cost.

This repositioning reflects a broader industry trend: away from building the biggest models at any cost, and toward making capable models more accessible and adaptable. GPT-4.1 meets that need, offering a flexible, production-ready tool for teams trying to embed AI deeper into their business operations.

As OpenAI continues to evolve its model offerings, GPT-4.1 represents a step forward in democratizing advanced AI for enterprise environments. For decision-makers balancing capability with ROI, it offers a clearer path to deployment without sacrificing performance or safety.
