[RL basics] Week 1. RL Intro

This page presents the basics of RL. It can be useful as a refresher or to explain RL to newcomers.

RL framework

RL paradigm

An agent learns from the environment by interacting with it through trial and error and receives rewards as feedback.

Reward hypothesis

To obtain the best behavior, we want to maximize the expected cumulative reward.

Discounted cumulative reward

G_t = r_{t+1} + γ·r_{t+2} + γ²·r_{t+3} + … with discount factor γ in [0, 1): nearer rewards count more than distant ones.
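The discounted return can be computed by accumulating rewards from the end of the trajectory backwards. A minimal sketch (the reward trace and γ value are just illustrative):

```python
def discounted_return(rewards, gamma=0.99):
    """Discounted cumulative reward G = r_1 + gamma*r_2 + gamma^2*r_3 + ..."""
    g = 0.0
    for r in reversed(rewards):  # accumulate from the last reward backwards
        g = r + gamma * g
    return g

print(discounted_return([1.0, 0.0, 2.0], gamma=0.5))  # 1 + 0.5*0 + 0.25*2 = 1.5
```

Iterating backwards avoids recomputing powers of γ at every step.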

Markov property

The agent takes decisions using only the current state (no memory of the past is needed)

Observation space

  • Complete (full info)

  • Partial

Tasks

  • Episodic (terminal state)

  • Continuous (no end)

Action Space

  • Discrete (finite)

  • Continuous

Exploration/exploitation tradeoff

Tradeoff between exploiting actions known to work and exploring new, unknown ones.

👌 Use a stochastic policy (a probability distribution over actions)
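Sampling from such a distribution naturally mixes exploitation (likely actions) with exploration (unlikely ones). A minimal sketch; the states, action names, and probabilities below are hypothetical:

```python
import random

# Stochastic policy: for each state, a probability distribution over actions.
policy = {"s0": {"left": 0.2, "right": 0.8}}

def sample_action(policy, state):
    """Draw one action according to the policy's probabilities for this state."""
    actions, probs = zip(*policy[state].items())
    return random.choices(actions, weights=probs, k=1)[0]

action = sample_action(policy, "s0")  # "right" about 80% of the time
```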

Policy

Which action should the agent take given the current state?

Value

How good is each state, i.e. what expected return can the agent get starting from it? Which state has the highest value?
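One simple way to see this: the value of a state is the expected (here, average) discounted return observed when starting from it. A toy sketch with hand-made hypothetical returns:

```python
def estimate_value(returns_per_state):
    """V(s) ~ average of the returns observed when starting from state s."""
    return {s: sum(gs) / len(gs) for s, gs in returns_per_state.items()}

# Hypothetical returns collected from two states over a few episodes.
values = estimate_value({"s0": [1.0, 3.0], "s1": [0.0, 2.0]})
best = max(values, key=values.get)  # the state with the highest value
print(best, values[best])  # s0 2.0
```

Averaging observed returns is the idea behind Monte Carlo value estimation, covered later.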