π Series (π₀, π₀.₅)

Physical Intelligence is a fast-rising company focused on bringing general-purpose AI into the physical world. In under two years since introducing their first VLA prototype model π₀, they've made a huge impact in the embodied intelligence community. In this post, I'll walk through the three main VLA models they've released so far, based on my reading of their blogs and papers. π₀ is a vision-language-action (VLA) model built on top of a pre-trained vision–language model (VLM) backbone. It is then robot-pretrained on a large mixture of open-source and in-house manipulation datasets to learn broad, general skills, and can be further post-trained on smaller, task-specific data to specialize for downstream applications. ...

March 1, 2026 | 2621 words | Author: Tan Ke

GPU and CUDA

In this post, I'll walk through GPUs and CUDA. Hope it helps with my final exam and AI learning… GPU stands for Graphics Processing Unit. Looking back at its history, the GPU first appeared as fixed-function hardware to speed up parallel work in real-time 3D graphics. Over time, GPUs became more programmable: by 2003, parts of the graphics pipeline were fully programmable, running custom code in parallel for many elements of a 3D scene or an image. ...

February 22, 2026 | 2607 words | Author: Tan Ke

Optimization in Machine Learning

The summary of the seminar “Optimization in Machine Learning”, covering Bayesian Optimization, multi-fidelity methods, handling discrete search spaces, and the BANANAS method for NAS.
February 10, 2026 | 2443 words | Author: Tan Ke

BANANAS: Bayesian Optimization with Neural Architectures for Neural Architecture Search

Paper-reading notes: BANANAS
February 5, 2026 | 329 words | Author: Tan Ke

UrbanLF: A Comprehensive Light Field Dataset for Semantic Segmentation of Urban Scenes

Paper-reading notes: UrbanLF
January 17, 2026 | 432 words | Author: Tan Ke

Large Concept Models: Language Modeling in a Sentence Representation Space

Paper-reading notes: Large Concept Models: Language Modeling in a Sentence Representation Space
January 15, 2026 | 3217 words | Author: Tan Ke

From Tokens To Thoughts: How LLMs And Humans Trade Compression For Meaning

Paper-reading notes: From Tokens To Thoughts: How LLMs And Humans Trade Compression For Meaning
January 12, 2026 | 913 words | Author: Tan Ke

RT Series (RT-1, RT-2)

Paper-reading notes: RT-1 and RT-2
January 9, 2026 | 1740 words | Author: Tan Ke

Learning Transferable Visual Models From Natural Language Supervision

Paper-reading notes: CLIP
January 1, 2026 | 888 words | Author: Tan Ke

Diffusion Policy: Visuomotor Policy Learning via Action Diffusion

Paper-reading notes: Diffusion Policy
December 28, 2025 | 672 words | Author: Tan Ke