π Series (π₀, π₀.₅)

Physical Intelligence is a fast-rising company focused on bringing general-purpose AI into the physical world. In under two years since introducing their first VLA prototype model π₀, they've made a huge impact in the embodied intelligence community. In this post, I'll walk through the three main VLA models they've released so far, based on my reading of their blogs and papers. π₀ is a vision-language-action (VLA) model built on top of a pre-trained vision–language model (VLM) backbone. It is then robot-pretrained on a large mixture of open-source and in-house manipulation datasets to learn broad, general skills, and can be further post-trained on smaller, task-specific data to specialize for downstream applications. ...

March 1, 2026 | 2621 words | Author: Tan Ke

GPU and CUDA

In this post, I'll walk through GPUs and CUDA. Hope it helps with my final exam and AI learning… GPU stands for Graphics Processing Unit. Looking back at its history, the GPU first appeared as fixed-function hardware to speed up parallel work in real-time 3D graphics. Over time, GPUs became more programmable: by 2003, parts of the graphics pipeline were fully programmable, running custom code in parallel for many elements of a 3D scene or an image. ...

February 22, 2026 | 2607 words | Author: Tan Ke

Optimization in Machine Learning

The summary of the seminar “Optimization in Machine Learning”, covering Bayesian Optimization, multi-fidelity methods, handling discrete search spaces, and the BANANAS method for NAS.
February 10, 2026 | 2443 words | Author: Tan Ke

BANANAS: Bayesian Optimization with Neural Architectures for Neural Architecture Search

Paper-reading notes: BANANAS
February 5, 2026 | 329 words | Author: Tan Ke

UrbanLF: A Comprehensive Light Field Dataset for Semantic Segmentation of Urban Scenes

Paper-reading notes: UrbanLF
January 17, 2026 | 432 words | Author: Tan Ke

Large Concept Models: Language Modeling in a Sentence Representation Space

Paper-reading notes: Large Concept Models: Language Modeling in a Sentence Representation Space
January 15, 2026 | 3217 words | Author: Tan Ke

From Tokens To Thoughts: How LLMs And Humans Trade Compression For Meaning

Paper-reading notes: From Tokens To Thoughts: How LLMs And Humans Trade Compression For Meaning
January 12, 2026 | 913 words | Author: Tan Ke

RT Series (RT-1, RT-2)

Paper-reading notes: RT-1 and RT-2
January 9, 2026 | 1740 words | Author: Tan Ke

Learning Transferable Visual Models From Natural Language Supervision

Paper-reading notes: CLIP
January 1, 2026 | 888 words | Author: Tan Ke

Diffusion Policy: Visuomotor Policy Learning via Action Diffusion

Paper-reading notes: Diffusion Policy
December 28, 2025 | 672 words | Author: Tan Ke