Vision-language-action models

Models that connect visual observations and language instructions to robot actions.

Tags: vla, policy, foundation-model

What it is

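A vision-language-action (VLA) model takes visual observations and a language instruction as input and outputs robot actions. RT-2, Gemini Robotics, PI0, and GR00T-style systems all sit near this node, though their architectures and release boundaries differ.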
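
The mapping from visual observation and language instruction to robot action can be sketched as a small interface. This is an illustrative sketch only, not the API of any system named above: `Observation`, `VLAPolicy`, `ZeroPolicy`, `run_step`, and the `action_dim` field are all hypothetical names, and the stand-in policy does no learning.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass(frozen=True)
class Observation:
    """One policy input: a camera frame plus a natural-language instruction."""
    image: Sequence[Sequence[Sequence[int]]]  # hypothetical H x W x 3 RGB pixels
    instruction: str


class VLAPolicy(Protocol):
    """Hypothetical interface: observation in, action vector out."""
    action_dim: int

    def act(self, obs: Observation) -> list[float]: ...


class ZeroPolicy:
    """Stand-in policy that ignores its input and outputs a no-op action."""

    def __init__(self, action_dim: int) -> None:
        self.action_dim = action_dim

    def act(self, obs: Observation) -> list[float]:
        return [0.0] * self.action_dim


def run_step(policy: VLAPolicy, obs: Observation) -> list[float]:
    """Query the policy once and check the action matches the declared size."""
    action = policy.act(obs)
    if len(action) != policy.action_dim:
        raise ValueError("action length does not match policy.action_dim")
    return action
```

A real model would replace `ZeroPolicy` with a learned network. The interface only makes explicit that the output is an action vector whose size and meaning depend on the robot it drives.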

Why it matters

VLA models make robot policy learning look more like multimodal foundation modeling, but they still depend on robot data and action interfaces.
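
One way to picture an action interface, assumed here purely for illustration, is uniform binning: each continuous action dimension is clipped to per-robot bounds and mapped to an integer token that a language-model-style backbone could emit. The function names, the bounds, and the default of 256 bins are hypothetical choices for this sketch, not a description of how any particular system works.

```python
def encode_action(action, low, high, num_bins=256):
    """Map each continuous action value to an integer bin index."""
    tokens = []
    for a, lo, hi in zip(action, low, high):
        clipped = min(max(a, lo), hi)      # keep the value inside robot limits
        frac = (clipped - lo) / (hi - lo)  # position within [lo, hi], in [0, 1]
        tokens.append(min(int(frac * num_bins), num_bins - 1))
    return tokens


def decode_action(tokens, low, high, num_bins=256):
    """Map bin indices back to the center value of each bin."""
    return [
        lo + (t + 0.5) / num_bins * (hi - lo)
        for t, lo, hi in zip(tokens, low, high)
    ]
```

The bounds `low` and `high` come from the specific robot, so the same token decodes to different physical motions on different embodiments. That is one reason robot action data and a defined action interface remain necessary alongside web-scale pretraining.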

How not to overread it

A VLA model is not a full robot product or safety case.

Related edges

contains: Physical AI (Physical AI model taxonomy). VLA models are components, not whole robots.

instantiates: Robot policy (Robot action generation). Policy behavior is embodiment-specific.

trained by: Robot data (Robot learning). Web pretraining does not replace robot action data.

overlaps with: Embodied reasoning (Task interpretation). Reasoning and action output may be separated in some systems.

evaluates: Benchmarks (Model comparison). Benchmark scores are not deployment readiness.