Podcast Episode
Nvidia Launches Cosmos 3, the World's First Fully Open Omnimodel for Physical AI
June 1, 2026
Nvidia has unveiled Cosmos 3 at GTC Taipei, calling it the world's first fully open omnimodel for physical AI. The single system combines vision reasoning, world generation, and action prediction to help robots and autonomous vehicles perceive and act in the real world. It ships in 8-billion and 32-billion-parameter sizes, with a Cosmos Coalition of industry partners forming around it.
A Single Brain for Machines That Think and Move
Nvidia has taken the wraps off Cosmos 3, describing it as the world's first fully open omnimodel for physical AI. Announced by founder and CEO Jensen Huang during his GTC Taipei keynote, the model fuses three capabilities that developers have traditionally had to stitch together from separate systems: vision reasoning, world generation, and action prediction. The result is a single foundation designed to let robots and autonomous vehicles perceive their surroundings, reason about them, plan a response, and act.
"The big bang of physical AI is just around the corner thanks to breakthroughs in multimodal reasoning language, vision and world models," Huang said, framing Cosmos 3 as a generational leap for anyone building robots, autonomous vehicles, and vision AI.
How the Architecture Works
Cosmos 3 is built on a mixture-of-transformers architecture that pairs a reasoning transformer with an expert generation transformer. That split lets the model understand object interactions, motion, and spatial-temporal relationships before it generates video and action trajectories. Crucially, it can natively process and produce text, images, video, ambient sound, and actions, removing the need for developers to juggle a different model for each modality.
Three Sizes for Three Jobs
The release arrives in two model sizes at launch, with a third on the way. Cosmos 3 Nano is an 8-billion-parameter version tuned to run on workstation-grade hardware such as the RTX PRO 6000 GPU. Cosmos 3 Super is a 32-billion-parameter model built for large-scale synthetic data generation on Hopper and Blackwell GPUs. A forthcoming variant, Cosmos 3 Edge, is aimed at real-time inference at the edge.
Open Models and a New Coalition
In a notable move, Nvidia is open-sourcing the models, the post-training scripts, and the synthetic data generation datasets, distributing them through Hugging Face and GitHub. Developers can also run the models as Nvidia NIM microservices or reach them through cloud partners including Microsoft Azure, CoreWeave, and Nebius.
Alongside the launch, Nvidia announced the Cosmos Coalition, a collaboration with Agile Robots, Black Forest Labs, Generalist, LTX, Runway, and Skild AI to advance open world models. Companies already building on the platform include Samsung, LG Electronics, Doosan Robotics, and Li Auto.
Topping the Benchmarks
Among open models, Nvidia says Cosmos 3 ranks first across multiple physical AI benchmarks, spanning world generation accuracy, action policy, and vision understanding. The company claims the model can shrink physical AI training cycles from months to days by offering a pretrained foundation that requires less data and costs less to train, a pitch aimed squarely at lowering the barrier to building capable machines.
Published June 1, 2026 at 8:32pm