Podcast Episode
Nvidia Bets Big on Inference with Groq-Powered Chip and Vera Rubin Platform at GTC 2026
March 15, 2026
Nvidia is set to unveil a dedicated inference chip built on technology from its twenty billion dollar Groq acquisition at GTC 2026. The company will also launch the Vera Rubin computing platform and NemoClaw, an open-source AI agent framework for enterprises.
Nvidia Takes Aim at the Inference Gap
Nvidia is preparing to make its boldest move yet into the AI inference market, with CEO Jensen Huang expected to unveil a dedicated inference processor at the company's annual GPU Technology Conference in San Jose this week.
The chip incorporates technology from Groq, the inference-focused startup Nvidia acquired for twenty billion dollars in late 2025. Groq's Language Processing Units are purpose-built for the rapid token generation that powers AI coding assistants and autonomous agents, an area where Nvidia's training-optimised GPUs have been less competitive.
The Vera Rubin Era Begins
Alongside the inference announcement, Nvidia will formally launch its Vera Rubin platform, the successor to the Blackwell architecture. The platform pairs a new Rubin GPU featuring three hundred and thirty-six billion transistors and HBM4 memory with a custom ARM-based Vera CPU built on eighty-eight Olympus cores.
Nvidia claims Rubin delivers up to a ten times reduction in inference token cost and a four times reduction in the number of GPUs needed to train mixture-of-experts models compared to Blackwell. Multiple rack configurations are expected, including NVL72, NVL144, and NVL576.
Software and Geopolitics
Nvidia is also set to announce NemoClaw, an open-source AI agent platform for enterprises. The Apache 2.0-licensed framework will allow companies to deploy autonomous AI agents with built-in security features and will notably be hardware-agnostic, running on competitors' chips as well as Nvidia's own.
The conference arrives amid continued uncertainty over US chip export controls toward China, with shipments capped at fifty percent of domestic sales. Nvidia has said it still receives no data centre revenue from China.
Rubin-based products are expected to ship in the second half of 2026.
Published March 15, 2026 at 9:11am