Attention

π Series (π₀, π₀.₅)

Physical Intelligence is a fast-rising company focused on bringing general-purpose AI into the physical world. In under two years since introducing their first VLA prototype model, π₀, they’ve made a huge impact on the embodied intelligence community. In this post, I’ll walk through the three main VLA models they’ve released so far, based on my reading of their blogs and papers.

π₀

π₀ is a vision-language-action (VLA) model built on top of a pre-trained vision–language model (VLM) backbone. It is then robot-pretrained on a large mixture of open-source and in-house manipulation datasets to learn broad, general skills, and can be further post-trained on smaller, task-specific data to specialize for downstream applications. ...

Large Concept Models: Language Modeling in a Sentence Representation Space

Paper-reading notes: Large Concept Models: Language Modeling in a Sentence Representation Space

Synthesizer: Rethinking Self-Attention for Transformer Models

Paper-reading notes: Synthesizer

Learning Transformer Programs

Paper-reading notes: Learning Transformer Programs

Reformer: The Efficient Transformer

Paper-reading notes: Reformer

ALTA: Compiler-Based Analysis of Transformers

Paper-reading notes: ALTA

Tracr: Compiled Transformers as a Laboratory for Interpretability

Paper-reading notes: Tracr

Thinking Like Transformers

Paper-reading notes: RASP

FNet: Mixing Tokens with Fourier Transforms

Paper-reading notes: FNet

Linformer: Self-Attention with Linear Complexity

Paper-reading notes: Linformer