Can Nvidia’s dominance survive the sea change under way in AI computing?

Each spring, thousands of software engineers gather in San Jose, Calif., to ogle the latest superfast computer processors and take coding workshops at Nvidia’s annual developers’ confab. The event is known as GTC, which stands for GPU Technology Conference. It might soon need a new name.

This year, for the first time, the focus of the event that starts Monday won’t be squarely on GPUs—or graphics processing units—the uniquely fast and powerful chips upon which Nvidia built its massive computing empire and became the world’s largest publicly traded company.

Instead, much more of the talk will be about inference, the type of computing required to run models and allow them to respond to user queries. That is because the artificial-intelligence industry has definitively moved into a new phase. Nvidia’s customers are less concerned today with training large AI models—what GPUs are best at—and more preoccupied with running them and generating big profits from end-users.

Inference requires a different suite of hardware than training does: chips with greater power efficiency, faster interconnections and more high-bandwidth memory.

Nvidia Chief Executive Jensen Huang has for a while now been touting 2026 as the year that inference eats AI. At a March 4 investor conference, he acknowledged that “the inflection that we’re seeing here also sat in plain sight for quite a long time, and it’s basically the ability for AI to use files, access files and use tools.”

These functions, broadly known as agentic AI, rely almost entirely on inference computing, and they are central to the belief that AI will utterly transform the world’s economy. Rapid acceleration in the capability of agents is driving a boom in demand for computing power. Companies like OpenAI and Anthropic, which run the popular coding agents Codex and Claude Code respectively, produce thousands of times as many inference “tokens,” the basic unit of data output in generative AI, as they did before, Huang said.

The Age of Inference is the one tech companies of all sizes have been waiting for, when the economics of AI computing potentially flip from red to green, as long as the cost of providing that computing can be kept low enough. AI companies are moving out of their growth phase, which involved investing enormous sums in the infrastructure required for model training, including buying millions of Nvidia’s latest GPUs from its Hopper and Blackwell generations, and attracting hundreds of millions of regular users. Now they are trying to monetize their products through subscription fees or by metering the consumption of intelligence.

“It’s really important to realize that inference equals revenues now for our customers, because agents are generating so many tokens and the results are so effective,” Huang said during Nvidia’s most recent earnings call. “We need to inference at a much higher speed, and when you’re inferencing at a much higher speed, and each one of those tokens are dollarized, it directly translates into revenues.”

The challenge for Nvidia now is that its bestselling products are less attractive for inference computing than for training. Its Grace Blackwell servers, users say, consume huge amounts of energy and don’t come with enough memory to allow AI models to quickly and efficiently spit out answers to user queries.

“Nvidia is in a weird moment,” said Paul Kedrosky, a venture investor and research fellow at the Massachusetts Institute of Technology’s Initiative on the Digital Economy. “For a long time, Jensen was saying, ‘We don’t need to have dedicated, stand-alone inference chips, you can just throw a Blackwell at it.’ But that ship has sailed, and there’s a host of new competitors.”

Kedrosky argued that Nvidia’s gross margins, which stood at 73% in the most recent quarter, will by necessity have to compress, for two reasons. First, the business model around inference computing puts a premium on efficiency and reducing the cost of producing the final product, which for consumers means AI tools. The hardware behind it can’t be too pricey or the companies selling it, directly or as middlemen, won’t make money.

Second, there is more competition to supply customers with inference computing because more chip companies have figured out ways to provide it with chips that are cheaper to buy and operate. Nvidia became the first $4 trillion company by selling the silicon equivalent of fast, powerful and expensive Ferrari sports cars, but now the world wants Priuses and Model Ys.

“All this inference stuff is incredibly threatening to Jensen, because it’s all efficiency-driven,” Kedrosky said. “He’s desperately trying to find a way to extend the franchise into inference.”

In December, Nvidia paid $20 billion to license the chip technology and hire the top talent away from Groq, a startup that designs a new type of chip, called a language processing unit, that is especially suited to running models. This week at GTC, Nvidia plans to roll out its first computing platform using Groq’s chips, a server that combines a modified version of its new Rubin GPU with a Groq processor that is specifically tailored to inference computing, The Wall Street Journal has reported.

There are other signs, as well, that Nvidia is shifting its focus away from just GPUs and more toward becoming a provider of inference computing. In February, Meta Platforms said it would install thousands of Nvidia’s Vera CPUs—or central processing units, the main processing brains behind most computers—in its AI data centers, the first significant deployment of Nvidia’s systems for AI that didn’t include GPUs. There is a growing recognition that inference computing can be handled using CPUs and doesn’t require Nvidia’s flagship chips.

Nvidia is also planning to unveil new computing solutions that combine multiple CPUs without attached GPUs, as Meta plans to do, the Journal has reported. And Intel, the struggling chip maker that has largely missed the boat on AI computing in recent years, is teasing a major partnership announcement with Nvidia as part of the event. Intel has long been one of the biggest producers of CPUs but doesn’t have a major foothold in GPUs.

“The best quality models are increasingly not viable with the existing infrastructure,” said Shahriar Rabii, a former Google and Meta executive and co-founder of Majestic Labs, a chip startup that is focused on energy efficiency and solving memory shortages for inference.

Nvidia’s massive licensing deal with Groq was fast-tracked after one of its biggest customers, ChatGPT-maker OpenAI, struck a $10 billion pact with the chip startup Cerebras, which designs expensive chips that it says are the fastest inference processors on the market. Last week, Cerebras announced it had signed up Amazon Web Services, the largest cloud provider, as its newest customer, further encroaching on Nvidia’s business.

Andrew Feldman, CEO of Cerebras, has been taking aim at Nvidia and Huang in blog posts for months now, writing on LinkedIn that Nvidia is bound to fall behind rivals in the race to supply the world with inference computing, in part because Nvidia’s proprietary programming platform, known as CUDA, is generally only needed for training models, not running them.

“There is no CUDA moat in inference,” Feldman said in an interview. “Obviously, they didn’t want to lose the fast inference business at OpenAI, and we took that from them.”

Tom Burke, chief revenue officer of Nscale, a U.K.-based cloud-services provider that currently uses only Nvidia chips, said that the rise of inference is reshaping the landscape for selling computing power. He expects that more AI companies will seek to diversify their chip suppliers in the near future.

“If you were to look at this market 12 months ago, it was probably 90-10 training versus inference in terms of the compute people needed. I think by the end of this year, it will have swung,” Burke said. “We’ve got an obligation to rethink the map for our customers, to be as agile as possible.”

How far ahead Nvidia remains in the AI-infrastructure race depends largely on how effectively it can pivot its product road map from training to inference. If the new chips it is building with Groq prove fast, efficient and affordable enough to dominate the competition, Nvidia will likely remain top dog. The company is banking on it.

Colette Kress, the company’s chief financial officer, said in a recent interview that agentic AI workloads are starting to become a major driver of revenue growth for Nvidia and that she foresees its chips dominating for the foreseeable future.

“Right now, we’re the king of inference,” Kress said.

Write to Robbie Whelan at robbie.whelan@wsj.com
