pi0 VLM Backbone — PaliGemma
pi0 Architecture: Step 2 of 7
VLM Parameters
Model: PaliGemma-3B
LLM Backbone: Gemma-2B
Vision Encoder: SigLIP-So400M
Embedding Dimension: 2048
Decoder Layers: 18
Attention Heads: 8 query heads sharing 1 key/value head (multi-query attention)
Weights: initialized from pretrained PaliGemma and fine-tuned jointly with the rest of pi0 (not frozen)
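The dimensions in the parameter table can be sanity-checked with a small config sketch. The class and field names below are hypothetical (they are not the PaliGemma or openpi API), and the SigLIP values (224x224 input, 14-pixel patches, 1152-wide features) are assumptions taken from the 224-resolution PaliGemma checkpoint rather than from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VLMBackboneConfig:
    # Hypothetical config mirroring the parameter table.
    llm_width: int = 2048   # Gemma-2B embedding dimension
    llm_depth: int = 18     # Gemma-2B decoder layers
    # Assumed SigLIP So400m/14 settings for a 224x224 input image.
    image_size: int = 224
    patch_size: int = 14
    vision_width: int = 1152

    @property
    def num_image_tokens(self) -> int:
        # One visual token per non-overlapping patch: (224 // 14) ** 2 = 16 * 16.
        side = self.image_size // self.patch_size
        return side * side

    @property
    def projector_shape(self) -> tuple:
        # A linear projection maps each SigLIP feature into the Gemma embedding space.
        return (self.vision_width, self.llm_width)


cfg = VLMBackboneConfig()
print(cfg.num_image_tokens)  # 256
print(cfg.projector_shape)   # (1152, 2048)
```

Under these assumptions, each camera image contributes 256 tokens of width 2048 to the decoder's input sequence.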
[Visualization: SigLIP visual tokens and text-prompt tokens pass through the PaliGemma decoder to produce fused embeddings]
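PaliGemma combines visual tokens from SigLIP with language tokens to create fused vision-language embeddings. SigLIP encodes each image into a sequence of visual tokens; PaliGemma projects them into the Gemma embedding space, places them ahead of the language-prompt tokens, and runs the combined sequence through the Gemma decoder. This is Step 2 of the pi0 architecture.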
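A minimal sketch of this fusion step, assuming PaliGemma's published design: each SigLIP patch feature goes through a linear projection into the decoder width, the projected image tokens are placed in front of the text-prompt embeddings, and the decoder's own self-attention mixes the two, with no separate cross-attention layers. Image and prompt tokens form a prefix with full bidirectional attention; any tokens after the prefix attend causally. The helper names and toy dimensions are hypothetical, and plain Python lists stand in for learned weights and framework tensors.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(x: Vector, weight: Matrix) -> Vector:
    # y = x @ W, where weight has shape (in_dim, out_dim).
    out_dim = len(weight[0])
    return [sum(x[i] * weight[i][j] for i in range(len(x))) for j in range(out_dim)]


def build_prefix(image_feats: List[Vector], proj: Matrix,
                 text_embeds: List[Vector]) -> List[Vector]:
    # Project each SigLIP patch feature to the decoder width, then place
    # the image tokens before the text-prompt tokens in one sequence.
    image_tokens = [linear(f, proj) for f in image_feats]
    return image_tokens + text_embeds


def prefix_lm_mask(prefix_len: int, total_len: int) -> List[List[bool]]:
    # mask[i][j] is True when token i may attend to token j.
    # Prefix tokens see the whole prefix (bidirectional) but nothing after it;
    # later tokens see the prefix plus earlier-or-equal positions (causal).
    return [[j < prefix_len or j <= i for j in range(total_len)]
            for i in range(total_len)]


# Toy sizes: vision width 3, decoder width 2, 2 image patches, 2 prompt tokens.
image_feats = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
proj = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # shape (3, 2)
text_embeds = [[0.5, 0.5], [0.1, 0.9]]

prefix = build_prefix(image_feats, proj, text_embeds)
print(prefix)                # [[3.0, 2.0], [1.0, 2.0], [0.5, 0.5], [0.1, 0.9]]
print(prefix_lm_mask(2, 3))  # [[True, True, False], [True, True, False], [True, True, True]]
```

Because fusion happens inside ordinary self-attention, every decoder layer can mix visual and language information, so the output embeddings at both image and prompt positions are vision-language features.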