It was a big week for AI announcements following events from Microsoft, Google, and Anthropic. But OpenAI is finishing things out with news of its own. And no, we’re not just talking about its $6.5 billion acquisition of Jony Ive’s design team to lead a new hardware effort, “io” at OpenAI.
Today, the company upgraded Operator, its autonomous web-browsing and cursor-controlling agent inside ChatGPT, from the prior GPT-4o multimodal large language model to the newer and more powerful o3 reasoning model.
The update, launched globally today, May 23, 2025, is available as a “research preview” to paying subscribers of OpenAI’s $200-per-month ChatGPT Pro plan.
Basically, that’s OpenAI’s way of saying it’s not a fully “sanded down” or perfected product yet; it may still have kinks and issues.
But with rival Google offering its own top-tier AI subscription bundle for nearly $250 per month (currently discounted to $125 for the first three months) to access its latest Gemini multimodal, Imagen image generation, and Veo video generation models, OpenAI’s ChatGPT Pro plan suddenly looks more affordable by comparison.
What is OpenAI’s Operator and what is it for?
Operator first debuted in January 2025 as OpenAI’s initial step into semi-autonomous agents, specifically Computer-Using Agents (CUAs). The idea is to go beyond the chatbot interface of ChatGPT and allow OpenAI’s powerful AI models to start taking more actions on behalf of the user.
Thus, Operator was designed to autonomously point, click, scroll, and type to complete web-based tasks such as booking dinner reservations, compiling shopping lists, or ordering event tickets. This agentic capability lets it carry out user tasks directly through a browser interface, including gathering online data.
For safety, privacy, and security purposes, Operator did not use any existing web browser on a user’s PC or Mac. Instead, it ran in a cloud-hosted virtual browser accessible via a standalone site, operator.chatgpt.com, where users could enter requests and watch the agent perform tasks in real time.
It combined vision, reasoning, and interaction capabilities based on GPT-4o, marking a new direction for OpenAI in agentic AI.
The product was launched as a research preview for ChatGPT Pro subscribers and featured built-in safety measures like user confirmations, Watch Mode, and restrictions on high-risk web platforms.
It was also being tested in enterprise contexts, including travel planning and civic services, demonstrating its potential across both consumer and enterprise environments.
o3 offers improved accuracy, structure, and success rates
With this update, OpenAI aims to boost performance across several key dimensions. The new o3-based Operator demonstrates improved persistence and accuracy across browser interactions.
In practical terms, this means it is more likely to complete user tasks successfully and with less need for correction or repetition. Moreover, users can expect responses that are clearer, more structured, and more comprehensive.
In comparative evaluations, the new model shows a distinct preference advantage over its predecessor. Human preference studies reveal that users favor the o3 model for its style, comprehensiveness, and clarity. It also performs strongly in instruction following and efficiency, though results for factual correctness are more balanced between versions.
Performance on third-party evaluation benchmarks reflects these improvements. On the OSWorld benchmark, which measures completion of browser-based tasks, the o3 model scores 42.9 compared to 38.1 for the previous version.
However, OpenAI notes that due to limitations in the automated grading system, the actual performance gain may be closer to 20 percentage points.
On WebArena, the new model achieved a score of 62.9, up from 48.1. The most dramatic improvement appears on the GAIA benchmark, where the o3 model scores 62.2, vastly surpassing the prior model’s 12.3.
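To put the three reported benchmark results on a common footing, the short Python sketch below computes the absolute gain (in points) and the relative gain (in percent) for each. The scores are the ones quoted above; the `scores` dictionary and the `gains` helper are illustrative names, and the sketch assumes each benchmark reports both versions on the same scale. It does not account for OpenAI's caveat about OSWorld's automated grading.

```python
# Scores as quoted in the article:
# (previous GPT-4o-based Operator, new o3-based Operator)
scores = {
    "OSWorld": (38.1, 42.9),
    "WebArena": (48.1, 62.9),
    "GAIA": (12.3, 62.2),
}


def gains(prev: float, new: float) -> tuple[float, float]:
    """Return (absolute gain in points, relative gain in percent)."""
    absolute = new - prev
    relative = absolute / prev * 100
    return round(absolute, 1), round(relative, 1)


# Hypothetical summary structure: benchmark name -> (absolute, relative)
summary = {name: gains(prev, new) for name, (prev, new) in scores.items()}

for name, (absolute, relative) in summary.items():
    print(f"{name}: +{absolute} points ({relative}% relative)")
```

Seen this way, the GAIA jump is large partly because the earlier score was so low: a gain of roughly 50 points on a starting score of 12.3 is about a fivefold increase, while the OSWorld gain is under 5 points as automatically graded.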
Side-by-side task comparisons further illustrate these gains. In one example involving a restaurant booking request, the new model provided a clearer and more detailed list of available reservations, including locations, Michelin ratings, and seating notes, presented in a well-formatted table. The previous version, while functional, delivered less information in a less organized manner, according to an image included with the new o3 Operator release notes.
Safeguards remain, as do standard cautionary notes about use with sensitive financial transactions and account access
The o3 model also inherits the safety measures introduced with earlier versions, with additional fine-tuning for its role as an agentic system.
OpenAI has integrated enhanced training against harmful task execution, prompt injection vulnerabilities, and errors involving user intent.
Evaluations show that the model now confirms 94% of sensitive actions before executing them, with 100% confirmation in financial transactions. Prompt injection susceptibility has also decreased from 23% to 20%.
Notably, the o3 Operator maintains a cautious boundary on certain high-risk web interactions, such as email or financial platforms, where it may require user supervision via Watch Mode or explicitly refuse to proceed. These measures are part of a layered approach to safety that combines model-level robustness with real-time monitoring.
While the upgrade to Operator marks a technical improvement, it also reflects OpenAI’s ongoing commitment to responsible AI deployment.
The system’s ability to take real-world actions introduces new risks, and the development team continues to refine its safety protocols accordingly.
According to OpenAI’s updated o3 system card documentation, the model remains below high-risk capability thresholds in categories such as biological and chemical misuse and has no native coding environment or terminal access, further reducing potential misuse vectors.
Operator remains a research preview and is accessible only to ChatGPT Pro users. The Responses API version of Operator will continue to be based on the GPT-4o model, at least for now.
Implications for enterprise technical decision-makers
The upgraded Operator stands to significantly enhance the workflows of professionals in AI engineering, orchestration, data management, and IT security.
For those building or maintaining machine learning models, the model’s improved accuracy and structured outputs reduce the overhead of test validation and troubleshooting.
In orchestration contexts, it offers a practical, reliable tool for automating browser-based components of complex pipelines.
Data engineers can delegate manual web interactions, such as data verification and scraping, with more confidence, freeing time for higher-level optimization work.
Security professionals, meanwhile, gain a safer way to simulate user behavior in audits and incident response exercises, thanks to the model’s layered safety mechanisms.
Across these disciplines, the o3-based Operator introduces both a capability upgrade and a risk mitigation framework, making it a practical addition to the modern technical toolkit.