Robot world models draw $6 billion but lack a shared foundation

22 June 2026, 16:58 · 1 min read

World model companies drew approximately $6 billion in funding in Q1 2026 alone, fueled by the idea that physical AI can follow the path transformers created for language models. Fusion Fund investors Charlotte Xia and Matt Wong argue the analogy is incomplete because robots lack the equivalent of a universal text token: joint angles, point clouds, force readings, video streams, and action spaces vary across bodies and sensors.

NVIDIA’s Cosmos 3 launch and DreamZero results gave the thesis a concrete boost. Cosmos 3 is described as an open omnimodel trained on 20 trillion tokens of multimodal data. A 14-billion-parameter DreamZero variant reached 62% average task progress on zero-shot generalization, versus 27% for pretrained vision-language-action (VLA) baselines, an improvement of more than 2x.

The field remains split among pixel-based, explicit 3D geometric, and latent representations, with no clear winner. World Labs has raised $1 billion around explicit 3D work, AMI Labs raised $1.03 billion around latent models, and Physical Intelligence raised $600 million around VLA approaches. Funding at that scale for each camp reinforces the architectural divergence.

The analysis points to shared evaluation, data curation and full-stack vertical deployment as durable opportunities. Until robotics finds a common measurement layer or a physical equivalent of tokens, world models may advance quickly without converging around a single architecture.

Originally reported by techtimes.com
