In machine learning, reinforcement learning (RL) is concerned with how an intelligent agent should take actions in a dynamic environment in order to maximize a reward signal. Reinforcement learning is one of the three basic machine learning paradigms, alongside supervised learning and unsupervised learning. (Wikipedia)
## Notation

| Symbol | Name | Description |
| --- | --- | --- |
| $s_t$ | State | The observation/input from the environment at time $t$. |
| $a_t$ | Action | The decision made by the agent at time $t$. |
| $r_t$ | Reward | The feedback signal received after taking an action. |
| $\pi$ | Policy | The agent's action-selection strategy (a mapping from states to actions). |
| $\gamma$ | Discount factor | A value in $[0, 1]$ that determines how much the agent values future rewards relative to immediate ones. |
| $T$ | Number of steps | The length of one trajectory. |
| $G_t$ | Return | The total accumulated (and usually discounted) reward from time $t$ onwards. |
| $V(s)$ | Value function | The expected return starting from state $s$. |
| $Q(s, a)$ | Q-value | The expected return starting from state $s$ and taking action $a$. |
| $\theta$ | Parameters | The weights of the neural network representing the policy or value function. |
| $\alpha$ | Learning rate | The step size used when updating the agent's parameters. |
| $\tau$ | Trajectory | A sequence of states, actions, and rewards $(s_0, a_0, r_0, s_1, \dots)$. |
| $J(\theta)$ | Objective function | A measure of how good the current policy is (usually the expected total reward). |
| $\nabla_\theta$ | Gradient | The direction and magnitude of the change in $\theta$ needed to increase $J$. |
| $D$ | Dataset | Training dataset for supervised and unsupervised learning. |

## Basics

In supervised and unsupervised learning, the model is trained on a static dataset to identify underlying patterns: the update signal is derived entirely from the fixed, provided data, and there is no interaction with an external system…
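To make the return $G_t$ from the notation table concrete, here is a minimal sketch of how it can be computed for a finished trajectory. The function name `discounted_returns` and the example rewards are illustrative, not from the original text; it simply applies the recursion $G_t = r_t + \gamma \, G_{t+1}$ backwards over the reward sequence.

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ...
    for every time step t, by sweeping the trajectory backwards."""
    returns = [0.0] * len(rewards)
    g = 0.0  # return accumulated from the end of the trajectory
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g  # G_t = r_t + gamma * G_{t+1}
        returns[t] = g
    return returns

# Example: three steps of reward 1.0 with gamma = 0.5
# G_2 = 1.0, G_1 = 1.0 + 0.5*1.0 = 1.5, G_0 = 1.0 + 0.5*1.5 = 1.75
print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))
```

Sweeping backwards reuses $G_{t+1}$ when computing $G_t$, so the whole trajectory is processed in a single pass instead of re-summing discounted rewards for every $t$.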
...