[RL basics] Week 1. RL Intro
This page presents the basics of RL. It may be useful as a refresher or as an introduction for newcomers.
RL framework
RL paradigm
An agent learns from the environment by interacting with it through trial and error, receiving rewards as feedback.
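The interaction loop above can be sketched in a few lines. This is a minimal illustration with a hypothetical toy environment (a 1-D corridor invented for this example), not any specific library's API:

```python
import random

class ToyEnv:
    """Toy 1-D corridor: the agent starts at position 0 and must reach 3."""
    def __init__(self):
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos            # initial state

    def step(self, action):
        # action is -1 (left) or +1 (right); position cannot go below 0
        self.pos = max(0, self.pos + action)
        done = self.pos == 3       # terminal state reached
        reward = 1.0 if done else 0.0  # reward as feedback
        return self.pos, reward, done

# The agent-environment loop: observe state, act, receive reward.
env = ToyEnv()
state = env.reset()
done = False
total_reward = 0.0
while not done:
    action = random.choice([-1, 1])      # trial and error: random action
    state, reward, done = env.step(action)
    total_reward += reward
print(total_reward)  # 1.0, earned on reaching the terminal state
```

A real agent would replace the random choice with a learned policy; the loop structure stays the same.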
Reward hypothesis
To obtain the best behavior, the agent must maximize the expected cumulative reward.
Discounted cumulative reward
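Future rewards are discounted by a factor gamma in [0, 1), giving the return G = r_0 + γ·r_1 + γ²·r_2 + …, so that nearer rewards count more than distant ones. A small sketch of computing this for a finite episode (the reward values are illustrative):

```python
def discounted_return(rewards, gamma=0.99):
    """G = r_0 + gamma*r_1 + gamma^2*r_2 + ... computed back to front."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Example: three rewards of 1.0 with gamma = 0.5
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1 + 0.5 + 0.25 = 1.75
```

Iterating backwards avoids recomputing powers of gamma at every step.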
Markov property
The agent takes decisions using only the current state (no memory of past states is needed).
Observation space
Complete (full info)
Partial
Tasks
Episodic (terminal state)
Continuous (no end)
Action Space
Discrete (finite)
Continuous
Exploration/exploitation tradeoff
Tradeoff between exploiting known, rewarding actions and exploring new, unknown ones.
👌 Use a stochastic policy (a probability distribution over actions)
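A stochastic policy can be sketched as a per-state probability distribution over actions; sampling from it naturally mixes exploitation (high-probability actions) with exploration (low-probability ones). The states, actions, and probabilities below are illustrative assumptions:

```python
import random

# Hypothetical stochastic policy: for each state, a distribution over actions.
policy = {
    "s0": {"left": 0.8, "right": 0.2},
    "s1": {"left": 0.1, "right": 0.9},
}

def sample_action(state):
    """Sample an action according to the policy's distribution for this state."""
    actions = list(policy[state].keys())
    probs = list(policy[state].values())
    return random.choices(actions, weights=probs)[0]

# In "s0" this returns "left" about 80% of the time, "right" about 20%.
action = sample_action("s0")
```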
Policy
Which action should the agent take given the current state?
Value
Which state has the highest value, i.e., the highest expected return when starting from it?
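A value function can be sketched as a table mapping each state to its expected return under some policy; acting greedily means moving toward the reachable state with the highest value. The states and numbers here are illustrative assumptions:

```python
# Hypothetical tabular state-value function V(s): expected return from each state.
V = {"s0": 0.2, "s1": 0.5, "s2": 0.9}

# Greedy behavior with respect to V: prefer the highest-valued state.
best_state = max(V, key=V.get)
print(best_state)  # "s2"
```

In practice V is learned (e.g., from sampled returns) rather than given, but the "compare values, prefer the highest" step is the same.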