[RL basics] Week 1. RL Intro

This page presents the basics of RL. It can be useful as a refresher or to explain RL to newcomers.

RL framework

RL paradigm

An agent learns from the environment by interacting with it through trial and error and receives rewards as feedback.

Reward hypothesis

To obtain the best behavior, we want to maximize the expected cumulative reward.

Discounted cumulative reward

G_t = r_{t+1} + γ·r_{t+2} + γ²·r_{t+3} + … with discount factor γ in [0, 1): nearer rewards count more than distant ones.
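The discounted return can be computed by accumulating rewards from the end of the trajectory backwards. A minimal sketch (the reward trace and γ value are just illustrative):

```python
def discounted_return(rewards, gamma=0.99):
    """Discounted cumulative reward G = r_1 + gamma*r_2 + gamma^2*r_3 + ..."""
    g = 0.0
    for r in reversed(rewards):  # accumulate from the last reward backwards
        g = r + gamma * g
    return g

print(discounted_return([1.0, 0.0, 2.0], gamma=0.5))  # 1 + 0.5*0 + 0.25*2 = 1.5
```

Iterating backwards avoids recomputing powers of γ at every step.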

Markov property

The agent takes decisions using only the current state (no memory of the past is needed)

Observation space

  • Complete (full info)

  • Partial

Tasks

  • Episodic (terminal state)

  • Continuous (no end)

Action Space

  • Discrete (finite)

  • Continuous

Exploration/exploitation tradeoff

Tradeoff between exploiting actions known to work and exploring new, unknown ones.

👌 Use a stochastic policy (a probability distribution over actions)
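Sampling from such a distribution naturally mixes exploitation (likely actions) with exploration (unlikely ones). A minimal sketch; the states, action names, and probabilities below are hypothetical:

```python
import random

# Stochastic policy: for each state, a probability distribution over actions.
policy = {"s0": {"left": 0.2, "right": 0.8}}

def sample_action(policy, state):
    """Draw one action according to the policy's probabilities for this state."""
    actions, probs = zip(*policy[state].items())
    return random.choices(actions, weights=probs, k=1)[0]

action = sample_action(policy, "s0")  # "right" about 80% of the time
```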

Policy

Which action should the agent take given the current state?

Value

How good is each state, i.e. what expected return can the agent get starting from it? Which state has the highest value?
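One simple way to see this: the value of a state is the expected (here, average) discounted return observed when starting from it. A toy sketch with hand-made hypothetical returns:

```python
def estimate_value(returns_per_state):
    """V(s) ~ average of the returns observed when starting from state s."""
    return {s: sum(gs) / len(gs) for s, gs in returns_per_state.items()}

# Hypothetical returns collected from two states over a few episodes.
values = estimate_value({"s0": [1.0, 3.0], "s1": [0.0, 2.0]})
best = max(values, key=values.get)  # the state with the highest value
print(best, values[best])  # s0 2.0
```

Averaging observed returns is the idea behind Monte Carlo value estimation, covered later.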