As demand for large-scale AI deployment skyrockets, the lesser-known private chip startup Positron is positioning itself as a direct challenger to market leader Nvidia by offering dedicated, energy-efficient, memory-optimized inference chips aimed at relieving the industry’s mounting cost, power, and availability bottlenecks.
“A key differentiator is our ability to run frontier AI models with better efficiency, achieving 2x to 5x performance per watt and per dollar compared to Nvidia,” said Thomas Sohmers, Positron co-founder and CTO, in a recent video call interview with VentureBeat.
That is clearly good news for big AI model providers, but Positron’s leadership contends the chips are beneficial for many more enterprises beyond them, including companies that use AI models in their own workflows rather than offering them as services to customers.
“We build chips that can be deployed in hundreds of existing data centers because they don’t require liquid cooling or extreme power densities,” pointed out Mitesh Agrawal, Positron’s CEO and the former chief operating officer of AI cloud inference provider Lambda, in the same video call interview with VentureBeat.
Venture capitalists and early customers seem to agree.
Positron yesterday announced an oversubscribed $51.6 million Series A funding round led by Valor Equity Partners, Atreides Management and DFJ Growth, with support from Flume Ventures, Resilience Reserve, 1517 Fund and Until.
As for Positron’s early customer base, it includes both name-brand enterprises and companies operating in inference-heavy sectors. Confirmed deployments include the major security and cloud content networking provider Cloudflare, which uses Positron’s Atlas hardware in its globally distributed, power-constrained data centers, and Parasail, via its AI-native data infrastructure platform SnapServe.
Beyond these, Positron reports adoption across several key verticals where efficient inference is critical, such as networking, gaming, content moderation, content delivery networks (CDNs), and Token-as-a-Service providers.
These early users are reportedly drawn in by Atlas’s ability to deliver high throughput at lower power consumption without requiring specialized cooling or reworked infrastructure, making it an attractive drop-in option for AI workloads across enterprise environments.
Entering a challenging market of shrinking AI models and rising efficiency
But Positron is also entering a challenging market. The Information just reported that rival buzzy AI inference chip startup Groq, where Sohmers previously worked as Director of Technology Strategy, has lowered its 2025 revenue projection from $2 billion-plus to $500 million, highlighting just how volatile the AI hardware space can be.
Even well-funded companies face headwinds as they compete for data center capacity and enterprise mindshare against entrenched GPU providers like Nvidia, not to mention the elephant in the room: the rise of more efficient, smaller large language models (LLMs) and specialized small language models (SLMs) that can run on devices as small and low-powered as smartphones.
Yet Positron’s leadership is, for now, embracing the trend and shrugging off any potential impact on its growth trajectory.
“There’s always been this duality: lightweight applications on local devices and heavyweight processing in centralized infrastructure,” said Agrawal. “We believe both will keep growing.”
Sohmers agreed, stating: “We see a future where every person might have a capable model on their phone, but those will still rely on large models in data centers to generate deeper insights.”
Atlas is an inference-first AI chip
While Nvidia GPUs helped catalyze the deep learning boom by accelerating model training, Positron argues that inference, the stage where models generate output in production, is now the true bottleneck.
Its founders call it the most under-optimized part of the “AI stack,” especially for generative AI workloads that depend on fast, efficient model serving.
Positron’s answer is Atlas, its first-generation inference accelerator built specifically to handle large transformer models.
Unlike general-purpose GPUs, Atlas is optimized for the distinct memory and throughput demands of modern inference tasks.
The company claims Atlas delivers 3.5x better performance per dollar and up to 66% lower power usage than Nvidia’s H100, while also achieving 93% memory bandwidth utilization, far above the typical 10–30% range seen in GPUs.
From Atlas to Titan, supporting multi-trillion-parameter models
Launched just 15 months after the company’s founding, and with only $12.5 million in seed capital, Atlas is already shipping and in production.
The system supports models of up to 0.5 trillion parameters in a single 2kW server and is compatible with Hugging Face transformer models via an OpenAI API-compatible endpoint.
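To illustrate what “OpenAI API-compatible” means in practice, here is a minimal sketch of how a client might send a request to such an endpoint using the standard openai Python library; the base URL, API key, and model name below are placeholders for illustration, not documented Positron values.

```python
# Minimal sketch: calling an OpenAI API-compatible inference endpoint,
# such as the one Positron describes for Atlas. The base_url, api_key,
# and model name are hypothetical placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://atlas.example.com/v1",  # hypothetical Atlas endpoint
    api_key="YOUR_API_KEY",                   # placeholder credential
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-70B-Instruct",  # example Hugging Face model ID
    messages=[{"role": "user", "content": "Explain why inference is memory-bound."}],
    max_tokens=128,
)

print(response.choices[0].message.content)
```

Because the interface mirrors the OpenAI API, existing client code can, in principle, be repointed at the new endpoint without rewrites, which is the drop-in behavior Positron emphasizes.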
Positron is now preparing to launch its next-generation platform, Titan, in 2026.
Built on custom-designed “Asimov” silicon, Titan will feature up to two terabytes of high-speed memory per accelerator and support models of up to 16 trillion parameters.
Today’s frontier models range from the hundreds of billions to single-digit trillions of parameters, but newer models like OpenAI’s GPT-5 are presumed to be in the multi-trillions, and still larger models are currently thought to be required to reach artificial general intelligence (AGI), AI that outperforms humans at most economically valuable work, and superintelligence, AI that exceeds humans’ ability to understand and control it.
Crucially, Titan is designed to operate with standard air cooling in conventional data center environments, avoiding the high-density, liquid-cooled configurations that next-gen GPUs increasingly require.
Engineering for efficiency and compatibility
From the start, Positron designed its system to be a drop-in replacement, allowing customers to use existing model binaries without code rewrites.
“If a customer had to change their behavior or their actions in any way, shape or form, that was a barrier,” said Sohmers.
Sohmers explained that instead of building a complex compiler stack or rearchitecting software ecosystems, Positron focused narrowly on inference, designing hardware that ingests Nvidia-trained models directly.
“The CUDA moat isn’t something to fight,” said Agrawal. “It’s an ecosystem to participate in.”
This pragmatic approach helped the company ship its first product quickly, validate performance with real enterprise users, and secure significant follow-on funding. In addition, its focus on air cooling rather than liquid cooling makes its Atlas chips the only option for some deployments.
“We’re focused entirely on purely air-cooled deployments… all these Nvidia Hopper- and Blackwell-based solutions going forward require liquid cooling… The only place you can put those racks is in data centers that are being newly built now in the middle of nowhere,” said Sohmers.
All told, Positron’s ability to execute quickly and capital-efficiently has helped distinguish it in a crowded AI hardware market.
Memory is what you need
Sohmers and Agrawal point to a fundamental shift in AI workloads: from compute-bound convolutional neural networks to memory-bound transformer architectures.
Whereas older models demanded high FLOPs (floating-point operations), modern transformers require massive memory capacity and bandwidth to run efficiently.
While Nvidia and others continue to focus on compute scaling, Positron is betting on memory-first design.
Sohmers noted that with transformer inference, the ratio of compute to memory operations flips to near 1:1, meaning that boosting memory utilization has a direct and dramatic impact on performance and power efficiency.
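A rough back-of-envelope calculation illustrates the point: during low-batch autoregressive decoding, generating each token requires streaming roughly all of the model’s weights from memory, so throughput is capped by effective memory bandwidth rather than raw FLOPs. The sketch below uses illustrative figures chosen for this article, not Positron or Nvidia specifications.

```python
# Back-of-envelope sketch of memory-bound decode throughput for a dense
# transformer: tokens/second is roughly effective bandwidth divided by the
# bytes of weights streamed per token. All numbers are illustrative.

def decode_tokens_per_second(params_billion: float,
                             bytes_per_param: float,
                             peak_bandwidth_tb_s: float,
                             utilization: float) -> float:
    """Estimate the memory-bandwidth ceiling on decode throughput."""
    model_bytes = params_billion * 1e9 * bytes_per_param
    effective_bw = peak_bandwidth_tb_s * 1e12 * utilization
    return effective_bw / model_bytes

# Example: a 70B-parameter model stored at 1 byte per parameter on hardware
# with 3.35 TB/s of peak bandwidth, comparing 30% vs. 93% bandwidth utilization.
low = decode_tokens_per_second(70, 1.0, 3.35, 0.30)
high = decode_tokens_per_second(70, 1.0, 3.35, 0.93)
print(f"~{low:.0f} tok/s at 30% utilization vs ~{high:.0f} tok/s at 93%")
```

Under these assumptions, tripling bandwidth utilization roughly triples achievable tokens per second without adding any compute, which is why Positron treats memory utilization as the lever that matters most.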
With Atlas already outperforming contemporary GPUs on key efficiency metrics, Titan aims to take this further by offering the highest memory capacity per chip in the industry.
At launch, Titan is expected to offer an order-of-magnitude increase over typical GPU memory configurations, without demanding specialized cooling or boutique networking setups.
U.S.-built chips
Positron’s manufacturing pipeline is proudly domestic. The company’s first-generation chips were fabricated in the U.S. at Intel facilities, with final server assembly and integration also based domestically.
For the Asimov chip, fabrication will shift to TSMC, though the team is aiming to keep as much of the rest of the manufacturing chain in the U.S. as possible, depending on foundry capacity.
Geopolitical resilience and supply chain stability are becoming key purchasing criteria for many customers, another reason Positron believes its U.S.-made hardware offers a compelling alternative.
What’s next?
Agrawal noted that Positron’s silicon targets not just broad compatibility but maximum utility for enterprises, clouds, and research labs alike.
While the company has not yet named any frontier model providers as customers, he confirmed that outreach and conversations are underway.
Agrawal emphasized that selling physical infrastructure based on economics and performance, rather than bundling it with proprietary APIs or business models, is part of what gives Positron credibility in a skeptical market.
“If you can’t convince a customer to deploy your hardware based on its economics, you’re not going to be successful,” he said.