Reinforcement Learning

In machine learning, reinforcement learning (RL) is concerned with how an intelligent agent should take actions in a dynamic environment in order to maximize a reward signal. Reinforcement learning is one of the three basic machine learning paradigms, alongside supervised learning and unsupervised learning. (Wikipedia)

Notation

| Symbol | Name | Description |
|---|---|---|
| $s_t$ | State | The observation/input from the environment at time $t$. |
| $a_t$ | Action | The decision made by the agent at time $t$. |
| $r_t$ | Reward | The feedback signal received after taking an action. |
| $\pi$ | Policy | The agent’s action-selection strategy (a mapping from states to actions). |
| $\gamma$ | Discount Factor | A value in $[0, 1]$ that determines how much the agent values future rewards relative to immediate ones. |
| $T$ | Number of Steps | The length of one trajectory. |
| $G_t$ | Return | The total accumulated (and usually discounted) reward from time $t$ onwards. |
| $V(s)$ | Value Function | The expected return starting from state $s$. |
| $Q(s, a)$ | Q-Value | The expected return starting from state $s$ and taking action $a$. |
| $\theta$ | Parameters | The weights of the neural network representing the policy or value function. |
| $\alpha$ | Learning Rate | The step size used when updating the agent’s knowledge (parameters). |
| $\tau$ | Trajectory | A sequence of states, actions, and rewards $(s_0, a_0, r_0, s_1, \dots)$. |
| $J(\theta)$ | Objective Function | A measure of how good the current policy is (usually the expected total reward). |
| $\nabla_\theta$ | Gradient | The direction and magnitude of the change needed in $\theta$ to increase $J$. |
| $D$ | Dataset | The training dataset for supervised and unsupervised learning. |

Basics

In supervised and unsupervised learning, the model is trained on a static dataset to identify underlying patterns. The update signal is derived entirely from the fixed, provided data, and there is no interaction with an external system… ...
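The return $G_t$ in the table above can be computed in a single backward pass over a trajectory's rewards, since $G_t = r_t + \gamma G_{t+1}$. A minimal sketch (my own illustration, not code from the post; the function name and reward values are made up):

```python
def discounted_returns(rewards, gamma=0.99):
    """Return [G_0, G_1, ..., G_{T-1}] for a list of per-step rewards,
    computed backwards via the recursion G_t = r_t + gamma * G_{t+1}."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g  # accumulate from the end of the trajectory
        returns[t] = g
    return returns

print(discounted_returns([1.0, 0.0, 2.0], gamma=0.5))  # [1.5, 1.0, 2.0]
```

The backward pass makes the computation O(T) instead of the O(T²) a naive per-step summation would cost.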

April 19, 2026 | 2076 words | Author: Tan Ke

GPU

In this post, I’ll walk through GPUs and CUDA. Hope it helps with my final exam and AI learning… GPU stands for Graphics Processing Unit. Looking back at its history, the GPU first appeared as fixed-function hardware to speed up the parallel work in real-time 3D graphics. Over time, GPUs became more programmable. By 2003, parts of the graphics pipeline were fully programmable, running custom code in parallel across many elements of a 3D scene or an image. ...

April 16, 2026 | 5144 words | Author: Tan Ke

Scenestreamer: Continuous Scenario Generation As Next Token Group Prediction

Paper-reading notes: Scenestreamer Continuous Scenario Generation As Next Token Group Prediction
March 31, 2026 | 2234 words | Author: Tan Ke

InSpatio-WorldFM: An Open-Source Real-Time Generative Frame Model

Paper-reading notes: InSpatio-WorldFM
March 24, 2026 | 775 words | Author: Tan Ke

Evolution and Ablation of Robotic World Models

World Models Paper | Homepage There is an interactive loop between the agent and the environment: the agent observes the environment, takes an action in response, and the environment changes accordingly. The agent model can be viewed as the brain of the agent: it is the overall decision-making system that enables the agent to perceive the environment, maintain temporal context, and choose actions. A typical agent model has three components: ...

March 22, 2026 | 5346 words | Author: Tan Ke

A Review of Robbyant’s Early-2026 Work

Robbyant is a company under Ant Group, dedicated to building the foundational platform for Embodied AI and bridging the gap between digital intelligence and the physical world. Since the company is still relatively new, I want to quickly review its recent work. In particular, I will study four embodied intelligence models: the spatial perception model, the VLA model, the world model, and the video action model. The diagram on Robbyant’s homepage reflects its vision for embodied intelligence: starting from sensory input, the system first builds spatial intelligence to understand the physical world, then relies on an action model to make decisions and interact with the environment, and finally improves through environmental reward. ...

March 16, 2026 | 3594 words | Author: Tan Ke

π Series (π₀, π₀.₅)

Physical Intelligence is a fast-rising company focused on bringing general-purpose AI into the physical world. In under two years since introducing their first VLA prototype model π₀, they’ve made a huge impact in the embodied intelligence community. In this post, I’ll walk through the three main VLA models they’ve released so far, based on my reading of their blogs and papers. π₀ π₀ is a vision-language-action (VLA) model built on top of a pre-trained vision–language model (VLM) backbone. It is then robot-pretrained on a large mixture of open-source and in-house manipulation datasets to learn broad, general skills, and can be further post-trained on smaller, task-specific data to specialize for downstream applications. ...

March 1, 2026 | 2611 words | Author: Tan Ke

Optimization in Machine Learning

The summary of the seminar “Optimization in Machine Learning”, covering Bayesian Optimization, multi-fidelity methods, handling discrete search spaces, and the BANANAS method for NAS.
February 10, 2026 | 2443 words | Author: Tan Ke

BANANAS: Bayesian Optimization with Neural Architectures for Neural Architecture Search

Paper-reading notes: BANANAS
February 5, 2026 | 329 words | Author: Tan Ke

UrbanLF: A Comprehensive Light Field Dataset for Semantic Segmentation of Urban Scenes

Paper-reading notes: UrbanLF
January 17, 2026 | 432 words | Author: Tan Ke