<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Posts on Tan Ke</title>
    <link>https://mrtanke.github.io/tags/posts/</link>
    <description>Recent content in Posts on Tan Ke</description>
    <generator>Hugo -- 0.160.1</generator>
    <language>en-us</language>
    <lastBuildDate>Sun, 19 Apr 2026 21:47:41 +0000</lastBuildDate>
    <atom:link href="https://mrtanke.github.io/tags/posts/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Reinforcement Learning</title>
      <link>https://mrtanke.github.io/posts/2026-04-19-reinforcement-learning/</link>
      <pubDate>Sun, 19 Apr 2026 21:47:41 +0000</pubDate>
      <guid>https://mrtanke.github.io/posts/2026-04-19-reinforcement-learning/</guid>
      <description>&lt;p&gt;In &lt;a href=&#34;https://en.wikipedia.org/wiki/Machine_learning&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;machine learning&lt;/a&gt;, &lt;strong&gt;reinforcement learning&lt;/strong&gt; (&lt;strong&gt;RL&lt;/strong&gt;) is concerned with how an &lt;a href=&#34;https://en.wikipedia.org/wiki/Intelligent_agent&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;intelligent agent&lt;/a&gt; should &lt;a href=&#34;https://en.wikipedia.org/wiki/Action_selection&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;take actions&lt;/a&gt; in a dynamic environment in order to &lt;a href=&#34;https://en.wikipedia.org/wiki/Reward-based_selection&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;maximize a reward&lt;/a&gt; signal. Reinforcement learning is one of the &lt;a href=&#34;https://en.wikipedia.org/wiki/Machine_learning#Approaches&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;three basic machine learning paradigms&lt;/a&gt;, alongside &lt;a href=&#34;https://en.wikipedia.org/wiki/Supervised_learning&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;supervised learning&lt;/a&gt; and &lt;a href=&#34;https://en.wikipedia.org/wiki/Unsupervised_learning&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;unsupervised learning&lt;/a&gt;. (&lt;a href=&#34;https://en.wikipedia.org/wiki/Reinforcement_learning&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Wikipedia&lt;/a&gt;)&lt;/p&gt;
&lt;h1 id=&#34;notation&#34;&gt;Notation&lt;/h1&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;&lt;strong&gt;Symbol&lt;/strong&gt;&lt;/th&gt;
          &lt;th&gt;&lt;strong&gt;Name&lt;/strong&gt;&lt;/th&gt;
          &lt;th&gt;&lt;strong&gt;Description&lt;/strong&gt;&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;$s_t$&lt;/td&gt;
          &lt;td&gt;&lt;strong&gt;State&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;The observation/input from the environment at time $t$.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;$a_t$&lt;/td&gt;
          &lt;td&gt;&lt;strong&gt;Action&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;The decision made by the agent at time $t$.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;$r_t$&lt;/td&gt;
          &lt;td&gt;&lt;strong&gt;Reward&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;The feedback signal received after taking an action.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;$\pi$&lt;/td&gt;
          &lt;td&gt;&lt;strong&gt;Policy&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;The agent&amp;rsquo;s action selection strategy (a mapping from states to actions).&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;$\gamma$&lt;/td&gt;
          &lt;td&gt;&lt;strong&gt;Discount Factor&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;A value in $[0, 1]$ that determines how much the agent values future rewards relative to immediate ones.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;$T$&lt;/td&gt;
          &lt;td&gt;&lt;strong&gt;Number of Steps&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;The length of one trajectory.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;$G_t$&lt;/td&gt;
          &lt;td&gt;&lt;strong&gt;Return&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;The total accumulated (and usually discounted) reward from time $t$ onwards.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;$V(s)$&lt;/td&gt;
          &lt;td&gt;&lt;strong&gt;Value Function&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;The expected return starting from state $s$.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;$Q(s, a)$&lt;/td&gt;
          &lt;td&gt;&lt;strong&gt;Q-Value&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;The expected return starting from state $s$ and taking action $a$.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;$\theta$&lt;/td&gt;
          &lt;td&gt;&lt;strong&gt;Parameters&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;The weights of the neural network representing the policy or value function.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;$\alpha$&lt;/td&gt;
          &lt;td&gt;&lt;strong&gt;Learning Rate&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;The step size used when updating the agent&amp;rsquo;s knowledge (parameters).&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;$\tau$&lt;/td&gt;
          &lt;td&gt;&lt;strong&gt;Trajectory&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;A sequence of states, actions, and rewards $(s_0, a_0, r_0, s_1, \dots)$.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;$J(\theta)$&lt;/td&gt;
          &lt;td&gt;&lt;strong&gt;Objective Function&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;A measure of how good the current policy is (usually the expected total reward).&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;$\nabla_\theta$&lt;/td&gt;
          &lt;td&gt;&lt;strong&gt;Gradient&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;The gradient with respect to $\theta$; $\nabla_\theta J$ gives the direction and magnitude of the parameter change that increases $J$ fastest.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;$D$&lt;/td&gt;
          &lt;td&gt;&lt;strong&gt;Dataset&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;The training dataset used in supervised and unsupervised learning.&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
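&lt;p&gt;To tie these symbols together, the return, the value functions, and the objective are conventionally related as follows (standard definitions, stated here for reference in the table&amp;rsquo;s notation):&lt;/p&gt;
&lt;p&gt;$$G_t = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \dots = \sum_{k=0}^{T-t-1} \gamma^k r_{t+k}$$&lt;/p&gt;
&lt;p&gt;$$V(s) = \mathbb{E}_\pi\left[G_t \mid s_t = s\right], \qquad Q(s, a) = \mathbb{E}_\pi\left[G_t \mid s_t = s, a_t = a\right], \qquad J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[G_0\right]$$&lt;/p&gt;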
&lt;h1 id=&#34;basics&#34;&gt;Basics&lt;/h1&gt;
&lt;p&gt;In supervised and unsupervised learning, the model is trained on a static dataset $D$ to identify underlying patterns. The update signal is derived entirely from this fixed, provided data; there is no interaction with an external system…&lt;/p&gt;
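&lt;p&gt;To make the contrast with reinforcement learning concrete, here is a minimal sketch of the agent-environment interaction loop, assuming a Gymnasium-style API (&lt;code&gt;env.reset&lt;/code&gt;/&lt;code&gt;env.step&lt;/code&gt;) and a placeholder random policy; the environment name and loop structure are illustrative assumptions, not taken from the post:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;import gymnasium as gym  # assumed dependency: any Gymnasium-style environment works

env = gym.make(&#34;CartPole-v1&#34;)     # hypothetical example environment

gamma = 0.99                      # discount factor gamma
state, info = env.reset(seed=0)   # s_0: initial observation
trajectory = []                   # tau: the (s_t, a_t, r_t) sequence
G = 0.0                           # discounted return G_0
discount = 1.0

done = False
while not done:
    action = env.action_space.sample()  # placeholder policy pi: act at random
    next_state, reward, terminated, truncated, info = env.step(action)
    trajectory.append((state, action, reward))
    G += discount * reward              # accumulate gamma^k * r_k into G_0
    discount *= gamma
    state = next_state
    done = terminated or truncated      # T = episode length, set by the env

print(f&#34;T = {len(trajectory)} steps, discounted return G_0 = {G:.2f}&#34;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Unlike a pass over a fixed dataset $D$, each step here both produces the next training signal ($r_t$) and changes what data the agent will see next.&lt;/p&gt;</description>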
    </item>
  </channel>
</rss>
