Podcast Episode
AWS and Cerebras Join Forces to Build the Fastest AI Inference in the Cloud
March 13, 2026
Amazon Web Services and Cerebras Systems have announced a collaboration to deploy Cerebras CS-3 systems inside AWS data centres, using a technique called inference disaggregation to split AI workloads between AWS Trainium and the CS-3 chip. The service will launch on Amazon Bedrock in the coming months and promises speeds thousands of times faster than traditional GPU-based alternatives.
A New Approach to AI Speed
Amazon Web Services and Cerebras Systems have announced a landmark collaboration that could reshape how artificial intelligence runs in the cloud. The partnership will see Cerebras CS-3 systems deployed directly inside AWS data centres, paired with Amazon's custom Trainium chips to deliver what both companies describe as the fastest AI inference available.
How Inference Disaggregation Works
At the heart of the partnership is a technique called inference disaggregation, which splits AI workloads into two distinct phases handled by purpose-built processors. AWS Trainium tackles the computationally heavy prefill phase, processing the user's input prompt, before handing off to the Cerebras CS-3 for the decode phase, where output tokens are generated at extraordinary speed. The two systems communicate via Amazon's high-speed Elastic Fabric Adapter networking.
The arrangement promises a fivefold increase in high-speed token capacity within the same hardware footprint, and the service will be available exclusively through Amazon Bedrock.
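To make the division of labour concrete, here is a minimal, purely illustrative Python sketch of a disaggregated serving loop. Everything in it, including the worker classes and the toy "KV cache" hand-off, is a simplification invented for this example; the actual AWS/Cerebras pipeline, its internal APIs, and its Elastic Fabric Adapter transport have not been published.

    # Illustrative only: a toy disaggregated inference loop in which
    # "prefill" (prompt processing) and "decode" (token generation)
    # run on separate workers, mimicking the Trainium -> CS-3 split.
    from dataclasses import dataclass

    @dataclass
    class KVCache:
        """Stands in for the attention key/value state that the prefill
        stage would ship to the decode stage over the network."""
        prompt_tokens: list[str]

    class PrefillWorker:
        """Compute-bound stage: ingests the whole prompt at once."""
        def run(self, prompt: str) -> KVCache:
            return KVCache(prompt_tokens=prompt.split())

    class DecodeWorker:
        """Latency-bound stage: emits output tokens one at a time."""
        def run(self, cache: KVCache, max_tokens: int):
            for i in range(max_tokens):
                # A real decoder samples from a model; we just echo state.
                yield f"token_{i}_from_{len(cache.prompt_tokens)}_prompt_tokens"

    def serve(prompt: str, max_tokens: int = 3) -> list[str]:
        cache = PrefillWorker().run(prompt)   # phase 1: Trainium's role
        decoder = DecodeWorker()              # phase 2: the CS-3's role
        return list(decoder.run(cache, max_tokens))

    if __name__ == "__main__":
        print(serve("Explain inference disaggregation"))

The point of the split is that the two phases have opposite performance profiles: prefill is a large parallel batch computation, while decode is a sequential, latency-sensitive loop, so each can be mapped to hardware built for that shape of work.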
A Milestone for Cerebras
The deal represents a major distribution win for Cerebras, whose Wafer-Scale Engine processors contain nine hundred thousand AI cores and four trillion transistors on a single dinner-plate-sized chip. The Sunnyvale startup already powers inference for OpenAI, Cognition, and Meta, and signed a ten billion dollar agreement with OpenAI in January. Cerebras closed a one billion dollar funding round in February at a valuation exceeding twenty-two billion dollars and is preparing for an IPO as soon as April, with Morgan Stanley leading the offering.
Why Inference Speed Matters Now
The collaboration reflects a broader industry shift as AI workloads move from training to real-time inference. Agentic coding tools now generate roughly fifteen times more tokens per query than conversational chat, driving urgent demand for faster output. AWS will support both the new disaggregated configuration and traditional setups, giving customers flexibility to route workloads as needed.
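For developers, the relevant surface is Amazon Bedrock rather than the hardware itself. The sketch below uses the existing boto3 Converse API to call a Bedrock-hosted model; the model identifier is a placeholder, and how the Cerebras-backed configuration will be selected, whether automatically, by model choice, or via an option like Bedrock's current latency-optimized inference setting, has not been announced, so treat the performanceConfig line as an assumption.

    # Hedged sketch: calling a model on Amazon Bedrock with boto3.
    # The model ID is a placeholder; routing to the new disaggregated
    # Trainium/CS-3 path is assumed here, not documented anywhere.
    import boto3

    client = boto3.client("bedrock-runtime", region_name="us-east-1")

    response = client.converse(
        modelId="example.placeholder-model-v1",  # hypothetical model ID
        messages=[
            {"role": "user",
             "content": [{"text": "Summarize inference disaggregation."}]}
        ],
        inferenceConfig={"maxTokens": 256, "temperature": 0.2},
        # Bedrock's existing knob for faster decoding; whether the
        # Cerebras-backed service hangs off this option is an assumption.
        performanceConfig={"latency": "optimized"},
    )

    print(response["output"]["message"]["content"][0]["text"])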