Podcast Episode
Nvidia Unveils Groq-Powered LPX Inference Rack at GTC 2026
March 16, 2026
Nvidia has officially launched the Groq 3 LPX rack at GTC 2026, a new inference system packing 256 language processing units designed to work alongside its Vera Rubin GPU racks. The liquid-cooled system promises up to 35 times higher inference throughput per megawatt for trillion-parameter AI models.
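The headline capacity figure is internally consistent; a quick back-of-envelope sketch using the per-chip numbers quoted later in the piece (256 LPUs per rack, 500 MB of on-chip SRAM per LPU):

```python
# Back-of-envelope check of the rack-scale SRAM figure, using the
# per-chip numbers quoted in the article (assumed exact here).
lpus_per_rack = 256
sram_per_lpu_mb = 500

total_sram_gb = lpus_per_rack * sram_per_lpu_mb / 1000
print(total_sram_gb)  # 128.0 -- matches the 128 GB rack-scale figure
```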
Nvidia Bets Big on Dedicated Inference Hardware
Nvidia has taken its boldest step yet beyond GPUs, unveiling the Groq 3 LPX rack at its GTC 2026 keynote. The system marks the first time the chipmaker has integrated another company's AI silicon into its data centre platform, following its $20 billion deal with AI chip startup Groq in December 2025.

How the LPX Rack Works
The liquid-cooled rack houses 256 of Nvidia's new Groq 3 language processing units (LPUs), each containing 500 MB of on-chip SRAM with 150 TB/s of bandwidth. At rack scale, the system delivers 128 GB of SRAM and 640 TB/s of chip-to-chip interconnect bandwidth. Rather than using traditional switches, the architecture relies on direct links between processors, enabling deterministic execution in which the compiler schedules all computation at compile time.

A Complementary Approach
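Inference splits into a compute-heavy prefill pass over the whole prompt and a strictly sequential decode loop that emits one token at a time. A minimal toy sketch of that division of labour (all function names and the toy next-token rule are illustrative, not Nvidia or Groq APIs):

```python
# Toy illustration of the prefill/decode split in hybrid serving.
# Everything here is a hypothetical stand-in, not a real model or API.

def prefill(prompt_tokens):
    """Compute-heavy pass over the whole prompt at once
    (the role the article assigns to the Rubin GPUs)."""
    return sum(prompt_tokens)  # toy stand-in for a full-prompt pass

def decode_step(state, last_token):
    """One sequential decode step (the role assigned to the LPUs):
    produce the next token from the current state."""
    next_token = (state + last_token) % 50  # toy next-token rule
    return state + next_token, next_token

def generate(prompt_tokens, n_new_tokens):
    state = prefill(prompt_tokens)       # parallel phase, runs once
    tokens = list(prompt_tokens)
    for _ in range(n_new_tokens):        # sequential phase: each token
        state, tok = decode_step(state, tokens[-1])  # depends on the last
        tokens.append(tok)
    return tokens[len(prompt_tokens):]   # only the generated tokens

print(generate([1, 2, 3], 4))
```

Because each decode step depends on the previous token, the loop cannot be parallelised the way prefill can, which is why a low-latency, high-bandwidth part is attractive for that phase.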
The LPX rack is designed to work alongside Nvidia's Vera Rubin NVL72 GPU racks, not replace them. In practice, the Rubin GPUs handle the compute-intensive attention and prefill operations, while the LPUs accelerate the decode phase, the sequential process of generating output tokens. Together, Nvidia claims, the combined system supports trillion-parameter models with million-token context windows and delivers up to 35 times higher inference throughput per megawatt.

The Bigger Picture
CEO Jensen Huang framed the announcement around what he called an inference inflection, noting that agentic AI systems consume up to 15 times more tokens than traditional applications. The LPX rack is one of seven configurations announced as part of the broader Vera Rubin platform. Initial racks will use Intel processors for inter-chip communication, with Groq chips manufactured at Samsung before transitioning to TSMC. Nvidia is also exploring fusing the LPU with its next-generation Feynman GPU architecture, expected in 2028. Shipments are planned for the second half of 2026.

Published March 16, 2026 at 11:17pm