"Nvidia set to unveil inference-only chip at GTC," FT reports, challenging Google TPU and others
- Published: 2026-03-15 05:49:09
- Updated: 2026-03-15 05:49:09

Nvidia is expected to unveil an inference-only chip at this week’s annual GPU Technology Conference (GTC), the Financial Times (FT) reported on the 14th (local time), citing people familiar with the matter.
GTC will take place in San Jose, California, opening on the 16th and running for four days, through the 19th.
Nvidia’s general-purpose graphics processing units (GPUs), such as those built on the Blackwell architecture, can handle a wide range of tasks, but their extremely high prices have become a major barrier for buyers.
Alphabet Inc.’s Google has moved into this gap with its Tensor Processing Unit (TPU) and similar chips: processors optimized for a specific workload, chiefly inference, and offered at a comparatively low cost.
According to the sources, CEO Jensen Huang will introduce an inference chip at this GTC that will be cheaper than Nvidia’s existing GPUs designed for AI training.
This chip is expected to be the first product to draw on technology from Groq’s Language Processing Unit (LPU). Nvidia acquired Groq, a U.S. chip startup, for 20 billion dollars last December. The LPU is a processor built to run artificial intelligence (AI) inference, especially large language models, at very high speed.
Amid concerns that Nvidia could cede the inference market to custom in-house silicon such as Google’s TPU, the company appears to have shifted away from its earlier strategy centered on general-purpose chips.
Nvidia has dominated the AI training market over the past three years with its general-purpose GPUs, which serve as the backbone of generative AI. Its market capitalization has surged to 4.5 trillion dollars, making it the world’s most valuable listed company.
Huang had long maintained that Nvidia’s general-purpose chips were suitable for both training and inference, and he intended to stick with that strategy. However, the rise of AI agents, which chain many model calls per task and thus multiply inference volume, has changed the landscape and prompted him to rethink his approach.
Nvidia’s "all-purpose" AI chips also require High Bandwidth Memory (HBM) from suppliers such as SK hynix and Micron Technology, which are facing severe shortages and soaring prices. This has made it difficult to secure a stable supply.
According to the sources, Nvidia’s new inference chip incorporating Groq’s technology will use static random-access memory (SRAM) instead of the stacked DRAM that makes up HBM.
SRAM is mainly used as cache memory inside CPUs and GPUs, holding the most frequently accessed data next to the compute units and serving it at extremely high speed. Until now it has played only that auxiliary caching role, but Groq and others have set out to build AI chips that rely solely on SRAM, with no dynamic random-access memory (DRAM) at all.
Compared with DRAM, SRAM offers much faster access and far lower latency, a clear advantage when AI systems must respond nearly instantaneously. The trade-off is density: SRAM stores far fewer bits per chip, so SRAM-only designs typically spread a model’s weights across many chips.
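Why the memory choice matters can be seen with a rough back-of-envelope calculation. In small-batch LLM inference, generating each token requires streaming essentially all model weights from memory, so memory bandwidth, not raw compute, caps decode speed. The sketch below illustrates that relationship; every figure in it is an illustrative assumption, not a vendor specification, and it deliberately ignores KV-cache traffic and compute limits.

```python
# Back-of-envelope: memory bandwidth as the ceiling on small-batch decode speed.
# All numbers are illustrative assumptions, not vendor specifications.

def max_tokens_per_second(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on decode speed when each generated token must stream
    the full set of model weights from memory (batch size 1; KV-cache
    traffic and compute time are ignored for simplicity)."""
    return bandwidth_gb_s / model_gb

MODEL_GB = 70.0  # assumed: a 70B-parameter model stored at 8-bit precision

# Illustrative, order-of-magnitude bandwidth figures:
hbm_gb_s = 8_000.0              # a single GPU with stacked HBM (~8 TB/s)
sram_aggregate_gb_s = 80_000.0  # many SRAM-only chips pooled (~80 TB/s)

print(f"HBM-bound decode:  ~{max_tokens_per_second(hbm_gb_s, MODEL_GB):,.0f} tokens/s")
print(f"SRAM-bound decode: ~{max_tokens_per_second(sram_aggregate_gb_s, MODEL_GB):,.0f} tokens/s")
```

Under these assumed numbers, the SRAM-pooled system decodes roughly an order of magnitude faster, which is the effect behind the instant-response claim above; the cost is that the model must be partitioned across many devices.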
If Nvidia brings an inference chip to market that uses SRAM to deliver both lower cost and higher efficiency, it could extend its dominant position from the training market into the inference market as well.
According to Bank of America (BofA), the AI data center market will grow to 1.2 trillion dollars by 2030, with inference accounting for 75 percent of that total. Last year, inference represented about 50 percent.
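Spelled out, those projections imply the following dollar split (a simple calculation from BofA's figures above; the inputs are forecasts, not measurements):

```python
# Implied dollar figures from the BofA projection cited above.
market_2030 = 1.2e12           # projected AI data center market in 2030 (USD)
inference_share_2030 = 0.75    # projected inference share of that market
training_share_2030 = 1.0 - inference_share_2030

inference_2030 = market_2030 * inference_share_2030
training_2030 = market_2030 * training_share_2030

print(f"Implied inference market, 2030: ${inference_2030 / 1e9:,.0f}B")  # ~$900B
print(f"Implied training market, 2030:  ${training_2030 / 1e9:,.0f}B")  # ~$300B
print(f"Inference / training ratio:     {inference_2030 / training_2030:.0f}x")
```

On those numbers, inference would be roughly a 900-billion-dollar market by 2030, three times the size of training, which is why control of inference matters so much to Nvidia's position.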
In a research note last week, BofA predicted that Nvidia would use GTC to unveil an SRAM-based inference chip built on Groq’s technology and thereby expand its AI portfolio.
Nvidia is also expected to finalize at this GTC the mass-production schedule for its next-generation "Vera Rubin" AI GPU, slated for the second half of this year, and to provide details on its "Feynman" architecture, targeted for launch in 2028.
Song Kyung-jae, Reporter (dympna@fnnews.com)