Inside OpenAI’s quest to make AI do anything for you

Shortly after Hunter Lightman joined OpenAI as a researcher in 2022, he watched his colleagues launch ChatGPT, one of the fastest-growing products ever. Meanwhile, Lightman quietly worked on a team teaching OpenAI’s models to solve high school math competitions.

Today that team, known as MathGen, is considered instrumental to OpenAI’s industry-leading effort to create AI reasoning models: the core technology behind AI agents that can do tasks on a computer like a human would.

“We were trying to make the models better at mathematical reasoning, which at the time they weren’t very good at,” Lightman told TechCrunch, describing MathGen’s early work.

OpenAI’s models are far from perfect today: the company’s latest AI systems still hallucinate, and its agents struggle with complex tasks.

But its state-of-the-art models have improved significantly on mathematical reasoning. One of OpenAI’s models recently won a gold medal at the International Math Olympiad, a math competition for the world’s brightest high school students. OpenAI believes these reasoning capabilities will translate to other subjects, and ultimately power the general-purpose agents that the company has always dreamed of building.

ChatGPT was a happy accident, a low-key research preview turned viral consumer business, but OpenAI’s agents are the product of a years-long, deliberate effort within the company.

“Eventually, you’ll just ask the computer for what you need and it’ll do all of these tasks for you,” said OpenAI CEO Sam Altman at the company’s first developer conference in 2023. “These capabilities are often talked about in the AI field as agents. The upsides of this are going to be tremendous.”

Whether agents will meet Altman’s vision remains to be seen, but OpenAI shocked the world with the release of its first AI reasoning model, o1, in the fall of 2024. Less than a year later, the 21 foundational researchers behind that breakthrough are the most highly sought-after talent in Silicon Valley.

Mark Zuckerberg recruited five of the o1 researchers to work on Meta’s new superintelligence-focused unit, offering some compensation packages north of $100 million. One of them, Shengjia Zhao, was recently named chief scientist of Meta Superintelligence Labs.

The reinforcement learning renaissance

The rise of OpenAI’s reasoning models and agents is tied to a machine learning training technique known as reinforcement learning (RL). RL provides feedback to an AI model on whether its choices were correct or not in simulated environments.
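To make that feedback loop concrete, here is a minimal sketch of learning from right-or-wrong signals alone. It is a toy multi-armed bandit with an epsilon-greedy learner, not OpenAI’s training method; the function name `train_policy` and every parameter value are assumptions chosen for illustration.

```python
import random

def train_policy(correct_action, n_actions=4, episodes=500, lr=0.1, epsilon=0.2, seed=0):
    """Learn which choice is right from reward feedback only (toy example)."""
    rng = random.Random(seed)
    values = [0.0] * n_actions  # estimated reward for each possible choice
    for _ in range(episodes):
        # Explore a random choice occasionally; otherwise take the best-looking one.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: values[a])
        # The simulated environment only reports whether the choice was correct.
        reward = 1.0 if action == correct_action else 0.0
        # Nudge the estimate for that choice toward the observed reward.
        values[action] += lr * (reward - values[action])
    return values

values = train_policy(correct_action=2)
best = max(range(len(values)), key=lambda a: values[a])
print(best, [round(v, 2) for v in values])
```

The learner is never told the answer directly. It only sees a reward after each attempt, and over many attempts its estimate for the correct choice rises above the others.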

RL has been used for decades. For instance, in 2016, about a year after OpenAI was founded, AlphaGo, an AI system created by Google DeepMind using RL, gained global attention after beating a world champion at the board game Go.

Around that time, one of OpenAI’s first employees, Andrej Karpathy, began pondering how to leverage RL to create an AI agent that could use a computer. But it would take years for OpenAI to develop the necessary models and training techniques.

By 2018, OpenAI pioneered its first large language model in the GPT series, pretrained on massive amounts of internet data and large clusters of GPUs. GPT models excelled at text processing, eventually leading to ChatGPT, but struggled with basic math.

It took until 2023 for OpenAI to achieve a breakthrough, initially dubbed “Q*” and then “Strawberry,” by combining LLMs, RL, and a technique called test-time computation. The latter gave the models extra time and computing power to plan and work through problems, verifying their steps, before providing an answer.

This allowed OpenAI to introduce a new approach called “chain-of-thought” (CoT), which improved AI’s performance on math questions the models hadn’t seen before.

“I could see the model starting to reason,” said El Kishky. “It would notice mistakes and backtrack, it would get frustrated. It really felt like reading the thoughts of a person.”

Though individually these techniques weren’t novel, OpenAI uniquely combined them to create Strawberry, which directly led to the development of o1. OpenAI quickly identified that the planning and fact-checking abilities of AI reasoning models could be useful to power AI agents.

“We had solved a problem that I had been banging my head against for a few years,” said Lightman. “It was one of the most exciting moments of my research career.”

Scaling reasoning

With AI reasoning models, OpenAI determined it had two new axes that would allow it to improve AI models: using more computational power during the post-training of AI models, and giving AI models more time and processing power while answering a question.

“OpenAI, as a company, thinks a lot about not just the way things are, but the way things are going to scale,” said Lightman.

Shortly after the 2023 Strawberry breakthrough, OpenAI spun up an “Agents” team led by OpenAI researcher Daniel Selsam to make further progress on this new paradigm, two sources told TechCrunch. Although the team was called “Agents,” OpenAI didn’t initially differentiate between reasoning models and agents as we think of them today. The company just wanted to make AI systems capable of completing complex tasks.

Eventually, the work of Selsam’s Agents team became part of a larger project to develop the o1 reasoning model, with leaders including OpenAI co-founder Ilya Sutskever, chief research officer Mark Chen, and chief scientist Jakub Pachocki.

OpenAI would have to divert precious resources, mainly talent and GPUs, to create o1. Throughout OpenAI’s history, researchers have had to negotiate with company leaders to obtain resources; demonstrating breakthroughs was a surefire way to secure them.

“One of the core components of OpenAI is that everything in research is bottom up,” said Lightman. “When we showed the evidence (for o1), the company was like, ‘This makes sense, let’s push on it.’”

Some former employees say that the startup’s mission to develop AGI was the key factor in achieving breakthroughs around AI reasoning models. By focusing on developing the smartest-possible AI models, rather than products, OpenAI was able to prioritize o1 above other efforts. That type of large investment in ideas wasn’t always possible at competing AI labs.

The decision to try new training methods proved prescient. By late 2024, several leading AI labs started seeing diminishing returns on models created through traditional pretraining scaling. Today, much of the AI field’s momentum comes from advances in reasoning models.

What does it mean for an AI to “reason?”

In many ways, the goal of AI research is to recreate human intelligence with computers. Since the launch of o1, ChatGPT’s UX has been filled with more human-sounding features such as “thinking” and “reasoning.”

When asked whether OpenAI’s models were truly reasoning, El Kishky hedged, saying he thinks about the concept in terms of computer science.

“We’re teaching the model how to efficiently expend compute to get an answer. So if you define it that way, yes, it’s reasoning,” said El Kishky.

Lightman takes the approach of focusing on the model’s results and not as much on the means or their relation to human brains.

“If the model is doing hard things, then it’s doing whatever necessary approximation of reasoning it needs in order to do that,” said Lightman. “We can call it reasoning, because it looks like these reasoning traces, but it’s all just a proxy for trying to make AI tools that are really powerful and useful to a lot of people.”

OpenAI’s researchers note that people may disagree with their nomenclature or definitions of reasoning, and indeed critics have emerged, but they argue it’s less important than the capabilities of their models. Other AI researchers tend to agree.

Nathan Lambert, an AI researcher with the non-profit AI2, compares AI reasoning models to airplanes in a blog post. Both, he says, are artificial systems inspired by nature (human reasoning and bird flight, respectively), but they operate through entirely different mechanisms. That doesn’t make them any less useful, or any less capable of achieving similar outcomes.

A group of AI researchers from OpenAI, Anthropic, and Google DeepMind agreed in a recent position paper that AI reasoning models are not well understood today, and more research is needed. It may be too early to confidently claim what exactly is going on inside them.

The next frontier: AI agents for subjective tasks

The AI agents on the market today work best for well-defined, verifiable domains such as coding. OpenAI’s Codex agent aims to help software engineers offload simple coding tasks. Meanwhile, Anthropic’s models have become particularly popular in AI coding tools like Cursor and Claude Code; these are some of the first AI agents that people are willing to pay up for.

However, general-purpose AI agents like OpenAI’s ChatGPT Agent and Perplexity’s Comet struggle with many of the complex, subjective tasks people want to automate. When trying to use these tools for online shopping or finding a long-term parking spot, I’ve found the agents take longer than I’d like and make silly mistakes.

Agents are, of course, early systems that will undoubtedly improve. But researchers must first figure out how to better train the underlying models to complete tasks that are more subjective.

“Like many problems in machine learning, it’s a data problem,” said Lightman, when asked about the limitations of agents on subjective tasks. “Some of the research I’m really excited about right now is figuring out how to train on less verifiable tasks. We have some leads on how to do these things.”

Noam Brown, an OpenAI researcher who helped create the IMO model and o1, told TechCrunch that OpenAI has new general-purpose RL techniques that allow it to teach AI models skills that aren’t easily verified. This was how the company built the model that achieved a gold medal at IMO, he said.

OpenAI’s IMO model was a newer AI system that spawns multiple agents, which then simultaneously explore several ideas, and then choose the best possible answer. These types of AI models are becoming more popular; Google and xAI have recently released state-of-the-art models using this technique.
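The general pattern of launching several explorers at once and keeping the best result can be sketched in a few lines. This is a toy illustration under stated assumptions, not a description of OpenAI’s system: the “agents” here are simple hill-climbers searching for an integer whose square equals a target, and the names `explore` and `solve_in_parallel`, the step budget, and the scoring rule are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def explore(start, target, steps=50):
    """One hypothetical agent: hill-climb from its own starting guess toward x*x == target."""
    x = start
    for _ in range(steps):
        # Look at the neighbouring guesses and keep whichever has the smallest error.
        best = min((x + 1, x, x - 1), key=lambda c: abs(c * c - target))
        if best == x:
            break  # no neighbour improves on the current guess
        x = best
    return x

def solve_in_parallel(target, starts):
    # Launch every agent at once, each exploring from a different starting idea.
    with ThreadPoolExecutor(max_workers=len(starts)) as pool:
        candidates = list(pool.map(lambda s: explore(s, target), starts))
    # Keep the candidate that a simple scoring rule rates best (lowest error).
    return min(candidates, key=lambda c: abs(c * c - target)), candidates

answer, candidates = solve_in_parallel(1369, starts=[0, 30, 100, 500])
print(answer, candidates)
```

Two of the four explorers run out of their step budget before reaching a solution, which is why the final selection step matters: the system’s answer is only as good as its ability to score the candidates. In this toy the score is trivially checkable, which is exactly what makes harder, less verifiable tasks more difficult.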

“I think these models will become more capable at math, and I think they’ll get more capable in other reasoning areas as well,” said Brown. “The progress has been incredibly fast. I don’t see any reason to think it will slow down.”

These techniques may help OpenAI’s models become more performant, gains that could show up in the company’s upcoming GPT-5 model. OpenAI hopes to assert its dominance over competitors with the launch of GPT-5, ideally offering the best AI model to power agents for developers and consumers.

But the company also wants to make its products simpler to use. El Kishky says OpenAI wants to develop AI agents that intuitively understand what users want, without requiring them to select specific settings. He says OpenAI aims to build AI systems that understand when to call up certain tools, and how long to reason for.

These ideas paint a picture of an ultimate version of ChatGPT: an agent that can do anything on the internet for you, and understand how you want it to be done. That’s a much different product than what ChatGPT is today, but the company’s research is squarely headed in this direction.

While OpenAI undoubtedly led the AI industry a few years ago, the company now faces a tranche of worthy opponents. The question is no longer just whether OpenAI can deliver its agentic future, but whether the company can do so before Google, Anthropic, xAI, or Meta beat it to it.
